CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
Given a label ontology, and textual descriptions of those labels, Dataless-Classifier is capable of classifying arbitrary text into that ontology.
It is particularly useful in those scenarios where it is difficult/expensive to gather enough training data to train a supervised text classifier. Dataless-Classifier utilizes the semantic meaning of the labels to bypass the need for explicit supervision. For more information, please visit our main project page.
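The core idea, comparing a document against label descriptions in a shared representation instead of training on labeled examples, can be illustrated with a toy sketch. This is not the project's implementation: the bag-of-words vectors below are a deliberately simple stand-in for the ESA and Word2Vec representations, and the function names, labels, and descriptions are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Toy stand-in for a semantic representation: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, label_descriptions):
    """Return the label whose description is most similar to the text."""
    doc = bag_of_words(text)
    scores = {
        label: cosine(doc, bag_of_words(desc))
        for label, desc in label_descriptions.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical labels and descriptions; no training data is involved.
label_descriptions = {
    "sports": "sports game team player baseball hockey score",
    "space": "space nasa orbit rocket launch moon satellite",
}

best, scores = classify("the rocket launch put a satellite in orbit",
                        label_descriptions)
print(best)  # prints: space
```

Exact word overlap is a weak signal; a richer semantic representation lets a document match a label description even when the two share no surface words.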
Some key points:
The annotators operate on a TextAnnotation and require the presence of a TOKENS view with the end-user’s desired tokenization.
Dataless Classification requires the end-user to specify a label hierarchy (with label descriptions) to classify into. The label hierarchy must be provided in a specific tab-separated format:
Label names in labelID \t labelName format. The label ID can be any ID specific to your system; our sample hierarchy uses the label name itself as the ID for readability.
Parent-child relations in parentLabelID \t childLabelID1 \t childLabelID2 \t ... format.
Label descriptions in labelID \t labelDescription format.
We provide a sample 20newsgroups hierarchy with label descriptions inside data/hierarchy/20newsgroups.
We also provide improved 20newsgroups label descriptions in labelDesc_Kws_embellished.txt, which correspond to the label descriptions used in [2], whereas labelDesc_Kws_simple.txt corresponds to the label descriptions used in [1].
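To make the tab-separated hierarchy format concrete, here is a minimal sketch that parses the three mappings and checks that they agree with each other. It does not use the project's own loader; the helper names and the tiny three-label hierarchy are hypothetical, chosen only to mirror the labelID \t labelName, parentLabelID \t childLabelID1 \t ..., and labelID \t labelDescription layouts.

```python
def parse_pairs(text):
    """Parse two-column tab-separated lines into a dict, skipping blank lines."""
    pairs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split("\t", 1)
        pairs[key] = value
    return pairs


def parse_parent_children(text):
    """Parse parent-then-children tab-separated lines into a dict of lists."""
    children = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        children[fields[0]] = fields[1:]
    return children


def unknown_ids(names, children, descriptions):
    """Return IDs used in the hierarchy or descriptions but missing from names."""
    used = set(descriptions) | set(children)
    for child_ids in children.values():
        used.update(child_ids)
    return sorted(used - set(names))


# Hypothetical sample data; the label name doubles as the label ID.
id_name_text = "sci\tsci\nsci.space\tsci.space\nsci.med\tsci.med\n"
parent_child_text = "sci\tsci.space\tsci.med\n"
description_text = (
    "sci\tscience\n"
    "sci.space\tspace nasa orbit\n"
    "sci.med\tmedicine doctor disease\n"
)

names = parse_pairs(id_name_text)
children = parse_parent_children(parent_child_text)
descriptions = parse_pairs(description_text)

missing = unknown_ids(names, children, descriptions)
leaves = sorted(label for label in names if label not in children)
print(missing, leaves)
```

A consistency check like this is cheap to run before classification and catches a label ID that is misspelled in one mapping but not the others.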
ESA and Word2Vec Embeddings are fetched from the DataStore on demand.
A sample config file with the default values is provided in the config folder: config/project.properties
To check whether your setup is working, run:
mvn -Dtest=ESADatalessTest#testPredictions test
to test the ESADatalessAnnotator, and
mvn -Dtest=W2VDatalessTest#testPredictions test
to test the W2VDatalessAnnotator.
If you use this software for research, please cite the following papers:
[1] Chang, Ming-Wei, et al. “Importance of Semantic Representation: Dataless Classification.” AAAI. Vol. 2. 2008.
[2] Song, Yangqiu, and Dan Roth. “On Dataless Hierarchical Text Classification.” AAAI. Vol. 7. 2014.