Hadoop

Wikipedia Dump Dataset https://dumps.wikimedia.org/enwiki/
Airline on-time Performance Dataset http://stat-computing.org/dataexpo/2009/the-data.html
Freebase Triples Dataset https://developers.google.com/freebase/
AWS Public Datasets (download requires an Amazon account) https://aws.amazon.com/public-datasets/
Sample Datasets for Hadoop Testing and Evaluation https://streever.atlassian.net/wiki/pages/viewpage.action?pageId=491580
Hadoop-bigdata Datasets https://github.com/algorithmica-repository/hadoop-bigdata/tree/master/datasets
PUMA Benchmarks Dataset https://engineering.purdue.edu/~puma/datasets.htm
Google Books Ngrams http://books.google.com/ngrams/
1000 Genomes 200 TB Dataset ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes/ftp/
The ClueWeb09 Dataset http://lemurproject.org/clueweb09/
Collections of Datasets Weka http://www.cs.waikato.ac.nz/~ml/weka/datasets.html
NOAA 27 GB Dataset ftp://ftp.ncdc.noaa.gov/pub/data/noaa/
Cornell Movie–Dialogs Corpus https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html
AREALM Dataset https://drive.google.com/file/d/0B1jY75xGiy7eZV93eGxlZ2YwSFE/view
AREAWATER Dataset https://drive.google.com/file/d/0B1jY75xGiy7eR3VpNC1XMzB5cWs/view
EDGES SpatialHadoop Dataset https://drive.google.com/file/d/0B1jY75xGiy7eOG85SHM3TzFVd2c/view
ZCTA5 Dataset https://drive.google.com/file/d/0B1jY75xGiy7eLWhNUll0ZWFRT0U/view
OpenStreetMap Datasets https://drive.google.com/file/d/0B1jY75xGiy7eNjJuRy1KWjRieVU/view
Machine Learning Datasets https://blog.bigml.com/2013/02/28/data-data-data-thousands-of-public-data-sources/
Hackspark Dataset http://hackspark.github.io/environment/download-sample-data/
The USC-SIPI Image Database http://sipi.usc.edu/database/
Criteo Labs Terabyte Dataset http://labs.criteo.com/2013/12/download-terabyte-click-logs/
Data Science Datasets http://blog.mortardata.com/post/67652898761/6-dataset-lists-curated-by-data-scientists?goback=%2Egde_4989164_member_5820574831720022020#%21