Collocations

About

Collocations are sequences of words whose combined meaning differs from the meanings of the individual words taken separately. Examples include "a little bit", "United States of America", and "school bus". Each of these strings represents a single concept, while its individual components represent different concepts. The goal of this work is to develop methods for identifying collocations in raw text. In this work, we explored statistical measures of association, in particular the log-likelihood ratio.
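As an illustration of how a measure of association scores a candidate collocation, the sketch below computes the log-likelihood ratio (G²) for a two-word sequence from a 2x2 contingency table. It assumes NSP-style marginal counts (n11 for the pair, n1p and np1 for each word in its position, npp for the total number of bigrams) and compares observed counts against the counts expected if the two words occurred independently. The function name is hypothetical and is not the NSP API; it is a minimal sketch, not the package's implementation.

```python
import math


def log_likelihood_bigram(n11: int, n1p: int, np1: int, npp: int) -> float:
    """Hypothetical helper: G^2 score for a bigram (w1, w2).

    n11: count of the bigram "w1 w2"
    n1p: count of bigrams with w1 in the first position
    np1: count of bigrams with w2 in the second position
    npp: total number of bigrams in the sample
    Assumes the counts are consistent (all derived cells are >= 0).
    """
    # Derive the remaining cells of the 2x2 table of observed counts.
    n12 = n1p - n11
    n21 = np1 - n11
    n22 = npp - n1p - np1 + n11
    observed = [[n11, n12], [n21, n22]]
    row_totals = [n1p, npp - n1p]
    col_totals = [np1, npp - np1]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            obs = observed[i][j]
            if obs == 0:
                continue  # 0 * log(0) is treated as 0
            # Expected count under the hypothesis that w1 and w2 are independent.
            expected = row_totals[i] * col_totals[j] / npp
            g2 += obs * math.log(obs / expected)
    return 2.0 * g2
```

A score near zero means the pair occurs about as often as chance predicts; larger scores suggest a stronger association, so candidate collocations can be ranked by this value, e.g. `log_likelihood_bigram(30, 40, 35, 10000)`.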

Software

Ngram Statistics Package (NSP)

LogLikelihood Module for 3-grams

LogLikelihood Module for 4-grams

LogLikelihood Module for 5-grams

LogLikelihood Modeling Module for 3-grams

LogLikelihood Modeling Module for 4-grams
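The modules above extend the log-likelihood measure to sequences longer than two words. As a rough sketch of what that extension involves, the function below scores a 3-gram from its 2x2x2 table of observed counts against a full-independence model, where each word's position is assumed independent of the others. This is only one possible hypothesized model; the NSP modules and the modeling modules may compare against other models, and the function name and input layout here are hypothetical, not the NSP API.

```python
import math
from itertools import product


def log_likelihood_trigram(observed: list[list[list[int]]]) -> float:
    """Hypothetical helper: G^2 for a trigram (w1, w2, w3).

    observed[i][j][k] is a count where index 0 means "the word in that
    position is w1 / w2 / w3" and index 1 means "some other word".
    Expected counts come from the full-independence model (an assumption).
    """
    cells = list(product(range(2), repeat=3))
    n = sum(observed[i][j][k] for i, j, k in cells)

    # One-dimensional marginal totals for each of the three positions.
    m1 = [sum(observed[i][j][k] for j, k in product(range(2), repeat=2)) for i in range(2)]
    m2 = [sum(observed[i][j][k] for i, k in product(range(2), repeat=2)) for j in range(2)]
    m3 = [sum(observed[i][j][k] for i, j in product(range(2), repeat=2)) for k in range(2)]

    g2 = 0.0
    for i, j, k in cells:
        obs = observed[i][j][k]
        if obs == 0:
            continue  # 0 * log(0) is treated as 0
        # Expected count if all three positions were mutually independent.
        expected = m1[i] * m2[j] * m3[k] / n**2
        g2 += obs * math.log(obs / expected)
    return 2.0 * g2
```

The same pattern carries over to 4-grams and 5-grams by adding one binary dimension per word, although the table grows to 2^n cells and the choice of hypothesized model matters more as n increases.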

Publications

The Ngram Statistics Package (Text::NSP) - A Flexible Tool for Identifying Ngrams, Collocations, and Word Associations. Ted Pedersen, Satanjeev Banerjee, Bridget T. McInnes, Saiyam Kohli, Mahesh Joshi, and Ying Liu. Appears in the Proceedings of Multiword Expressions: from Parsing and Generation to the Real World (MWE), an ACL HLT 2011 Workshop. June 23, 2011, pp. 131-133, Portland, Oregon. (Demonstration System).

Extending the Log Likelihood Measure to Improve Collocation Identification. Bridget Thomson McInnes. Master of Science Thesis. Department of Computer Science, University of Minnesota, Duluth, December 2004.

Last modified 25/08/2014