Text Mining

Reuters-21578 Text Categorization http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html
Reuters Transcribed Subset Data Set http://archive.ics.uci.edu/ml/datasets/reuters+transcribed+subset
NYSK Data Set https://archive.ics.uci.edu/ml/datasets/NYSK
SMS Spam Collection Data Set https://archive.ics.uci.edu/ml/datasets/sms+spam+collection
Text Classification Data Sets http://sci2s.ugr.es/keel/textClassification.php#sub2
Hansards Dataset http://www.isi.edu/natural-language/download/hansard/
WebKB Dataset http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/webkb-data.gtar.gz
Twenty Newsgroups Data Set https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups
Movie Review Data Set http://www.cs.cornell.edu/People/pabo/movie-review-data/
Multi-Domain Sentiment Dataset http://www.cs.jhu.edu/~mdredze/datasets/sentiment/
Latent Aspect Rating Analysis/Online Forum Mining and Analysis Datasets http://sifaka.cs.uiuc.edu/~wang296/Data/index.html
Opinosis Dataset http://kavita-ganesan.com/opinosis-opinion-dataset
OpinRank Dataset http://kavita-ganesan.com/entity-ranking-data
Restaurant Reviews Dataset http://www.cs.cmu.edu/~mehrbod/RR/
MovieLens Dataset https://grouplens.org/datasets/movielens/
Micropinion Generation Dataset http://kavita-ganesan.com/content/micropinion-generation-dataset
Corpora Dataset http://www.mad.disco.unimib.it/doku.php/research/corpora
Wikipedia XML Corpus Dataset (Download Data: Login Required) http://www-connex.lip6.fr/~denoyer/wikipediaXML/
Extended Epinions Dataset http://www.trustlet.org/datasets/extended_epinions/
Text Categorization Corpora http://disi.unitn.it/moschitti/corpora.htm
MLComp Dataset http://scikit-learn.org/stable/auto_examples/text/mlcomp_sparse_document_classification.html
TechTC – Technion Repository of Text Categorization Datasets http://techtc.cs.technion.ac.il/
Weka Collections of Datasets http://www.cs.waikato.ac.nz/ml/weka/datasets.html
Text Classification Datasets #35 https://drive.google.com/drive/u/0/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M
20ng-Dataset http://ana.cachopo.org/datasets-for-single-label-text-categorization
COCO-Text: Dataset for Text Detection and Recognition https://vision.cornell.edu/se3/coco-text-2/