Wikipedia Dump Dataset |
https://dumps.wikimedia.org/enwiki/ |
Airline on-time Performance Dataset |
http://stat-computing.org/dataexpo/2009/the-data.html |
Freebase Triples Dataset |
https://developers.google.com/freebase/ |
AWS Public Datasets (downloading data requires an Amazon account) |
https://aws.amazon.com/public-datasets/ |
Sample Datasets for Hadoop Testing and Eval |
https://streever.atlassian.net/wiki/pages/viewpage.action?pageId=491580 |
Hadoop-bigdata Datasets |
https://github.com/algorithmica-repository/hadoop-bigdata/tree/master/datasets |
PUMA Benchmarks Dataset |
https://engineering.purdue.edu/~puma/datasets.htm |
Google Books Ngrams |
http://books.google.com/ngrams/ |
1000 Genomes Dataset (200TB) |
ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes/ftp/ |
The ClueWeb09 Dataset |
http://lemurproject.org/clueweb09/ |
Weka Dataset Collections |
http://www.cs.waikato.ac.nz/~ml/weka/datasets.html |
NOAA Dataset (27GB) |
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/ |
Cornell Movie–Dialogs Corpus |
https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html |
AREALM Dataset |
https://drive.google.com/file/d/0B1jY75xGiy7eZV93eGxlZ2YwSFE/view |
AREAWATER Dataset |
https://drive.google.com/file/d/0B1jY75xGiy7eR3VpNC1XMzB5cWs/view |
EDGES SpatialHadoop Dataset |
https://drive.google.com/file/d/0B1jY75xGiy7eOG85SHM3TzFVd2c/view |
ZCTA5 Dataset |
https://drive.google.com/file/d/0B1jY75xGiy7eLWhNUll0ZWFRT0U/view |
OpenStreetMap Datasets |
https://drive.google.com/file/d/0B1jY75xGiy7eNjJuRy1KWjRieVU/view |
Machine Learning Datasets |
https://blog.bigml.com/2013/02/28/data-data-data-thousands-of-public-data-sources/ |
Hackspark Dataset |
http://hackspark.github.io/environment/download-sample-data/ |
The USC-SIPI Image Database |
http://sipi.usc.edu/database/ |
Criteo Labs Terabyte Dataset |
http://labs.criteo.com/2013/12/download-terabyte-click-logs/ |
Data Science Datasets |
http://blog.mortardata.com/post/67652898761/6-dataset-lists-curated-by-data-scientists?goback=%2Egde_4989164_member_5820574831720022020#%21 |