Reuters-21578 Text Categorization |
http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html |
Reuters Transcribed Subset Data Set |
http://archive.ics.uci.edu/ml/datasets/reuters+transcribed+subset |
NYSK Data Set |
https://archive.ics.uci.edu/ml/datasets/NYSK |
SMS Spam Collection Data Set |
https://archive.ics.uci.edu/ml/datasets/sms+spam+collection |
Text Classification Data Sets |
http://sci2s.ugr.es/keel/textClassification.php#sub2 |
Hansards Dataset |
http://www.isi.edu/natural-language/download/hansard/ |
WebKB Dataset |
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/webkb-data.gtar.gz |
Twenty Newsgroups Data Set |
https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups |
Movie Review Data Set |
http://www.cs.cornell.edu/People/pabo/movie-review-data/ |
Multi-Domain Sentiment Dataset |
http://www.cs.jhu.edu/~mdredze/datasets/sentiment/ |
Latent Aspect Rating Analysis/Online Forum Mining and Analysis Datasets |
http://sifaka.cs.uiuc.edu/~wang296/Data/index.html |
Opinosis Dataset |
http://kavita-ganesan.com/opinosis-opinion-dataset |
OpinRank Dataset |
http://kavita-ganesan.com/entity-ranking-data |
Restaurant Reviews Dataset |
http://www.cs.cmu.edu/~mehrbod/RR/ |
MovieLens Dataset |
https://grouplens.org/datasets/movielens/ |
Micropinion Generation Dataset |
http://kavita-ganesan.com/content/micropinion-generation-dataset |
Corpora Dataset |
http://www.mad.disco.unimib.it/doku.php/research/corpora |
Wikipedia XML Corpus Dataset (Download Data: Login Required) |
http://www-connex.lip6.fr/~denoyer/wikipediaXML/ |
Extended Epinions Dataset |
http://www.trustlet.org/datasets/extended_epinions/ |
Text Categorization Corpora |
http://disi.unitn.it/moschitti/corpora.htm |
MLComp Dataset |
http://scikit-learn.org/stable/auto_examples/text/mlcomp_sparse_document_classification.html |
TechTC – Technion Repository of Text Categorization Datasets |
http://techtc.cs.technion.ac.il/ |
Weka Collections of Datasets |
http://www.cs.waikato.ac.nz/ml/weka/datasets.html |
Text Classification Datasets #35 |
https://drive.google.com/drive/u/0/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M |
20ng-Dataset |
http://ana.cachopo.org/datasets-for-single-label-text-categorization |
COCO-Text: Dataset for Text Detection and Recognition |
https://vision.cornell.edu/se3/coco-text-2/ |