Porter Stemmer

Torbjörn Lager

provides
x-ozlib://lager/porter-stemmer/EnglishStemmer.so{native}
x-ozlib://lager/porter-stemmer/stem.exe
 

Purpose

This native functor creates a module that exports a function which performs stemming by means of the Porter stemming algorithm. Quoting Martin Porter himself:

The Porter stemming algorithm (or 'Porter stemmer') is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems.

The algorithm was originally described in Porter, M.F., 1980, An algorithm for suffix stripping, Program, 14(3):130-137. It has since been reprinted in Sparck Jones, Karen, and Peter Willett, 1997, Readings in Information Retrieval, San Francisco: Morgan Kaufmann, ISBN 1-55860-454-4.

A warning is in order: it is hard to do stemming well without a lexicon. If you are a linguist, the output of a Porter stemmer could easily upset you. A stemmer such as this can still be useful, however.

This package simply links a C implementation (written by Martin Porter) into Oz.

Installation

Download the package, and invoke ozmake in a shell as follows:

ozmake --install --package=lager-porter-stemmer.pkg

By default, all files of the package are installed in the user's ~/.oz directory tree. In particular, all modules are installed in the user's private cache.

Usage

import Porter at 'x-ozlib://lager/porter-stemmer/EnglishStemmer.so{native}'      
 ... 
{Porter.stem +S1 ?S2} 

Example

For example,

{Porter.stem "programming" S}

removes the suffix "ing" from "programming", reduces the resulting double "m" to a single "m", and binds S to the remaining stem (i.e. to "program").
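
To try the call above outside the interactive environment, it can be wrapped in a small application functor. The following is only a minimal sketch: the word list (apart from "programming") is illustrative, and it assumes that the value returned by Porter.stem is a string or other virtual string, so that System.showInfo can print it.

```oz
functor
import
   Application
   System
   Porter at 'x-ozlib://lager/porter-stemmer/EnglishStemmer.so{native}'
define
   %% Illustrative input; only "programming" is taken from the example above.
   Words = ["programming" "stemming" "installed"]

   %% Stem each word and print "word -> stem" on its own line.
   %% Assumption: the stem is a (virtual) string.
   {ForAll Words
    proc {$ W}
       S = {Porter.stem W}
    in
       {System.showInfo W#" -> "#S}
    end}

   {Application.exit 0}
end
```

Assuming the functor is saved under the hypothetical file name Demo.oz, it should be possible to compile it with ozc -c Demo.oz and run it with ozengine Demo.ozf, provided the package has been installed as described under Installation.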

Example Application

The distribution also includes stem, a stand-alone demo application which reads and tokenizes a text file and prints the result of stemming each word to a file, or to stdout by default. It can be invoked in the following way:

stem -i test.txt

Invoke stem --help to see what options are available.

Related Information

Here is a link to the ‘official’ home page for distribution of the Porter Stemming Algorithm, written and maintained by its author, Martin Porter:

http://www.tartarus.org/~martin/PorterStemmer/

Among other things, you'll find a pointer to resources for creating stemmers for other languages.

