Collocations are sequences of words whose meaning as a unit differs from the meanings of the individual words taken separately. Examples include "a little bit", "United States of America", and "school bus": each string represents a single concept, while its individual components represent different concepts. The goal of this work is to develop methods that identify collocations in raw text. In this work, we explored measures of association, which score how strongly the words in a sequence are associated with one another.
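To make the idea of a measure of association concrete, the sketch below computes one common measure, the log-likelihood ratio (G²), for a two-word sequence "w1 w2" from a 2x2 contingency table of bigram counts. It is a minimal Python illustration, not the NSP implementation. The function name `log_likelihood_bigram` is hypothetical, and its parameters follow the usual contingency-table notation (n11 for the joint count, n1p and np1 for the marginals, npp for the total).

```python
import math


def log_likelihood_bigram(n11, n1p, np1, npp):
    """Log-likelihood ratio (G^2) for the bigram "w1 w2" (illustrative sketch).

    n11: number of times the bigram w1 w2 occurs
    n1p: number of bigrams whose first word is w1
    np1: number of bigrams whose second word is w2
    npp: total number of bigrams in the corpus
    """
    # Derive the remaining cells of the 2x2 table from the marginals.
    n12 = n1p - n11        # w1 followed by a word other than w2
    n21 = np1 - n11        # a word other than w1 followed by w2
    n22 = npp - n1p - n21  # neither w1 in first position nor w2 in second
    observed = [[n11, n12], [n21, n22]]
    row_totals = [n1p, npp - n1p]
    col_totals = [np1, npp - np1]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o == 0:
                continue  # by convention 0 * log(0) contributes nothing
            # Expected count if the two positions were independent.
            expected = row_totals[i] * col_totals[j] / npp
            g2 += o * math.log(o / expected)
    return 2.0 * g2
```

With hypothetical counts where w1 and w2 each appear 50 times in 100 bigrams and always together, `log_likelihood_bigram(50, 50, 50, 100)` returns 200 ln 2 (about 138.6); if they co-occurred exactly as often as independence predicts, as in `log_likelihood_bigram(25, 50, 50, 100)`, the score is 0. Ranking candidate bigrams by this score places strongly associated pairs at the top.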
Ngram Statistics Package (NSP)
LogLikelihood Module for 3-grams
LogLikelihood Module for 4-grams
LogLikelihood Module for 5-grams
LogLikelihood Modeling Module for 3-grams
LogLikelihood Modeling Module for 4-grams
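The modules listed above extend the log-likelihood measure from pairs of words to longer n-grams. A minimal sketch of that generalization is shown below, assuming the full independence model (every word position is independent of the others) and a complete 2^n contingency table supplied as a Python dict. The function `log_likelihood_ngram` and its 0/1 cell encoding are hypothetical illustrations, and the sketch does not attempt to reproduce what the Modeling modules do.

```python
import math
from itertools import product


def log_likelihood_ngram(table, n):
    """G^2 for an n-gram under the full independence model (illustrative sketch).

    table: dict mapping a tuple of n flags (0 or 1) to a count. Flag 1 at
           position k means slot k holds the k-th word of the target n-gram;
           flag 0 means some other word is there. All 2**n cells must be present.
    """
    total = sum(table.values())

    # marginal[k][v] = number of n-grams whose slot k carries flag v
    marginal = [[0, 0] for _ in range(n)]
    for cell, count in table.items():
        for k, v in enumerate(cell):
            marginal[k][v] += count

    g2 = 0.0
    for cell in product((0, 1), repeat=n):
        observed = table[cell]
        if observed == 0:
            continue  # 0 * log(0) is taken as 0
        # Expected count if all slots were independent:
        # total * prod_k (marginal[k][cell[k]] / total)
        expected = total
        for k, v in enumerate(cell):
            expected *= marginal[k][v] / total
        g2 += observed * math.log(observed / expected)
    return 2.0 * g2
```

For n = 2 this reduces to the familiar bigram log-likelihood ratio, and a table in which every cell holds the same count scores 0, since the observed counts then match the independence model exactly.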
The Ngram Statistics Package (Text::NSP): A Flexible Tool for Identifying Ngrams, Collocations, and Word Associations.
Ted Pedersen, Satanjeev Banerjee, Bridget T. McInnes, Saiyam Kohli, Mahesh Joshi, and Ying Liu.
Appears in the Proceedings of Multiword Expressions: from Parsing and Generation to the Real World (MWE), an ACL HLT 2011 Workshop, June 23, 2011, pp. 131-133, Portland, Oregon. (Demonstration System)
Extending the Log Likelihood Measure to Improve Collocation Identification.
Bridget Thomson McInnes. Master of Science Thesis, Department of Computer Science, University of Minnesota, Duluth, December 2004.
Last modified 25/08/2014