| Stanford Text2Scene Spatial Learning Dataset/Scenes and Descriptions for Text to Scene Generation | https://nlp.stanford.edu/data/text2scene.shtml |
| MSMARCO-Microsoft Machine Reading Comprehension Dataset | http://www.msmarco.org/ |
| NewsQA Dataset | https://github.com/Maluuba/newsqa |
| WikiQA Corpus | https://www.microsoft.com/en-us/download/details.aspx?id=52419&from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fdownloads%2F4495da01-db8c-4041-a7f6-7984a4f6a905%2Fdefault.aspx |
| The Blog Authorship Corpus | http://u.cs.biu.ac.il/%7Ekoppel/BlogCorpus.htm |
| Amazon Fine Food Reviews | https://www.kaggle.com/snap/amazon-fine-food-reviews |
| ClueWeb09 FACC Dataset | http://lemurproject.org/clueweb09/FACC1/ |
| Google Books Ngram Viewer Dataset | http://storage.googleapis.com/books/ngrams/books/datasetsv2.html |
| Reuters Corpora (RCV1, RCV2, TRC2) | http://trec.nist.gov/data/reuters/reuters.html |
| SouthParkData Dataset | https://github.com/BobAdamsEE/SouthParkData |
| DBpedia Dataset | http://wiki.dbpedia.org/Datasets/NLP |
| i2b2 NLP Research Data Sets | https://www.i2b2.org/NLP/DataSets/ |
| Lexical Inference Datasets | http://u.cs.biu.ac.il/~nlp/resources/downloads/lexical-inference-datasets/ |
| DeepDive Open Datasets | http://deepdive.stanford.edu/opendata/ |
| Stanford Datasets from arXiv | http://snap.stanford.edu/data/index.html#citnets |
| CrisisNLP Dataset | http://crisisnlp.qcri.org/ |
| Enron Email Dataset | https://www.cs.cmu.edu/~./enron/ |
| Marcusyyy/NewYorkTimes_word2vec Dataset | https://data.world/marcusyyy/newyorktimes-word-2-vec |
| Crowdflower/Airline Twitter Sentiment Dataset | https://data.world/crowdflower/airline-twitter-sentiment |
| Fivethirtyeight/Presidential Commencement Speeches Dataset | https://github.com/fivethirtyeight/data |
| Annotated Datasets | http://clair.si.umich.edu/iopener/dataset.html |
| Congressional Speech Dataset | http://www.cs.cornell.edu/home/llee/data/convote.html |
| MIMIC-III Dataset | https://mimic.physionet.org/ |
| CLEF eHealth Dataset | https://sites.google.com/site/clefehealth/ |
| MedNLPDoc Dataset | https://sites.google.com/site/mednlpdoc/ |
| ITU Turkish Natural Language Processing Datasets | http://tools.nlp.itu.edu.tr/Datasets |
| Clinical Natural Language Processing Dataset | http://faculty.washington.edu/melihay/LING575/Ling575_ClinicalNLP.html |
| Niderhoff/nlp-datasets | https://libraries.io/github/niderhoff/nlp-datasets |