| Stanford Text2Scene Spatial Learning Dataset/Scenes and Descriptions for Text to Scene Generation | https://nlp.stanford.edu/data/text2scene.shtml |
| MSMARCO-Microsoft Machine Reading Comprehension Dataset | http://www.msmarco.org/ |
| NewsQA Dataset | https://github.com/Maluuba/newsqa |
| WikiQA Corpus | https://www.microsoft.com/en-us/download/details.aspx?id=52419&from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fdownloads%2F4495da01-db8c-4041-a7f6-7984a4f6a905%2Fdefault.aspx |
| The Blog Authorship Corpus | http://u.cs.biu.ac.il/%7Ekoppel/BlogCorpus.htm |
| Amazon Fine Food Reviews | https://www.kaggle.com/snap/amazon-fine-food-reviews |
| ClueWeb09 FACC Dataset | http://lemurproject.org/clueweb09/FACC1/ |
| Google Books Ngram Viewer Dataset | http://storage.googleapis.com/books/ngrams/books/datasetsv2.html |
| Reuters Corpora (RCV1, RCV2, TRC2) | http://trec.nist.gov/data/reuters/reuters.html |
| SouthParkData Dataset | https://github.com/BobAdamsEE/SouthParkData |
| DBpedia Dataset | http://wiki.dbpedia.org/Datasets/NLP |
| i2b2 NLP Research Data Sets | https://www.i2b2.org/NLP/DataSets/ |
| Lexical Inference Datasets | http://u.cs.biu.ac.il/~nlp/resources/downloads/lexical-inference-datasets/ |
| DeepDive Open Datasets | http://deepdive.stanford.edu/opendata/ |
| Stanford Datasets from arXiv | http://snap.stanford.edu/data/index.html#citnets |
| CrisisNLP Dataset | http://crisisnlp.qcri.org/ |
| Enron Email Dataset | https://www.cs.cmu.edu/~./enron/ |
| Marcusyyy/NewYorkTimes_word2vec Dataset | https://data.world/marcusyyy/newyorktimes-word-2-vec |
| Crowdflower/Airline Twitter Sentiment Dataset | https://data.world/crowdflower/airline-twitter-sentiment |
| Fivethirtyeight/Presidential Commencement Speeches Dataset | https://github.com/fivethirtyeight/data |
| Annotated Datasets | http://clair.si.umich.edu/iopener/dataset.html |
| Congressional Speech Dataset | http://www.cs.cornell.edu/home/llee/data/convote.html |
| MIMIC-III Dataset | https://mimic.physionet.org/ |
| CLEF eHealth Dataset | https://sites.google.com/site/clefehealth/ |
| MedNLPDoc Dataset | https://sites.google.com/site/mednlpdoc/ |
| ITU Turkish Natural Language Processing Datasets | http://tools.nlp.itu.edu.tr/Datasets |
| Clinical Natural Language Processing Dataset | http://faculty.washington.edu/melihay/LING575/Ling575_ClinicalNLP.html |
| Niderhoff/nlp-datasets | https://libraries.io/github/niderhoff/nlp-datasets |