CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
This software extracts the relations that commas participate in, expanding on previous work in this area. Commas and the surrounding sentence structure often express relations that are essential to understanding the meaning of the sentence.
There are 2 models:
There are 2 sources for annotation data:
data/corpus: A set of around 1,000 sentences from section 00 of the PTB in which all the commas have been labeled with their roles. The commas that were labeled as Other have been refined, and those annotations are in data/otherFile.txt.
data/Bayraktar-SyntaxToLabel
Execute ./scripts/annotate.sh in the project directory to annotate the commas in data/infile.txt and write the output to data/outfile.txt.
You can edit the infile to add more sentences. Each sentence must be on its own line.
NB: This script requires Maven to be installed.
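Because the input file expects one sentence per line, text pasted from other sources may need its internal line breaks removed first. The following is a minimal sketch of such a preparation step, not part of this project: the class InfileWriter and its methods are hypothetical names, and only standard JDK calls are used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the project.
class InfileWriter {

    /** Collapses whitespace runs (including line breaks) so each sentence fits on one line. */
    static List<String> toLines(List<String> sentences) {
        List<String> lines = new ArrayList<>();
        for (String s : sentences) {
            String oneLine = s.replaceAll("\\s+", " ").trim();
            if (!oneLine.isEmpty()) {
                lines.add(oneLine); // blank entries are dropped
            }
        }
        return lines;
    }

    /** Appends the sentences to the given file, one per line, creating the file if needed. */
    static void append(Path infile, List<String> sentences) throws IOException {
        Files.write(infile, toLines(sentences), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) {
        // Prints the normalized lines; call append(...) with the path to the
        // input file to write them instead.
        List<String> raw = Arrays.asList(
                "Mary, the new manager, arrived early.",
                "He bought apples,\noranges, and pears.");
        for (String line : toLines(raw)) {
            System.out.println(line);
        }
    }
}
```

If the existing file does not end with a newline, the first appended sentence would be joined to its last line, so it is worth checking the file before appending.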
Run ClassifierComparison to get the performance of the different models, evaluated using 5-fold cross-validation.
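For readers unfamiliar with 5-fold cross-validation, the sketch below shows the general procedure: the examples are split into 5 folds, and each fold is held out once for testing while the other 4 are used for training. It is an illustration only and does not reproduce how ClassifierComparison partitions the data. The class name KFoldSketch, the shuffling with a fixed seed, and the example count of 1000 are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of k-fold splitting; not the project's code.
class KFoldSketch {

    /** Shuffles indices 0..n-1 with a fixed seed and deals them into k folds round-robin. */
    static List<List<Integer>> split(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    /** Training indices for one round: every index outside the held-out fold. */
    static List<Integer> trainingIndices(List<List<Integer>> folds, int heldOut) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != heldOut) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        // 1000 is an illustrative example count; 5 matches the fold count in the text.
        List<List<Integer>> folds = split(1000, 5, 42L);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("round " + f + ": test=" + folds.get(f).size()
                    + " train=" + trainingIndices(folds, f).size());
        }
    }
}
```

With 1000 examples and 5 folds, every round tests on 200 examples and trains on the remaining 800. The reported performance of a model is then typically the average of its scores over the 5 rounds.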
Use CommaLabeler to obtain a comma View for a sentence represented as a TextAnnotation (which must already have the views required to extract features for the classifier).
If you use this software, please cite our work:
@inproceedings{arivazhagan2016labeling,
title={Labeling the Semantic Roles of Commas.},
author={Arivazhagan, Naveen and Christodoulopoulos, Christos and Roth, Dan},
booktitle={AAAI},
pages={2885--2891},
year={2016}
}