Brill Tagger

Torbjörn Lager



This is an implementation in pure Oz of a Brill-style rule-based tagger (Brill 1995). The tagger is an abstract class (in the sense that it does not define all the methods that it calls), and you will need to subclass it in order to do something useful. In particular, a derived class is expected to contain (or import) the rules by means of which the tagger will operate, and thus it encapsulates everything that is specific to a particular language and application. The package includes two examples which show how to build a part-of-speech tagger for English, as well as a combined part-of-speech tagger and noun-phrase chunker, also for English. The accuracy of the part-of-speech tagger should be around 95-97% (i.e. 95-97% of the word tokens in arbitrary English text receive the correct tag). An overview of the tagset is available. The accuracy of the chunker is probably around 91-92%. With more rules injected, it can be expected to land just above 93%; at least, this is the result that Ramshaw and Marcus (1995) claim for this kind of chunker.

This package does not (yet) include a transformation-based learner. To train your own taggers, see for example my µ-TBL system or the fnTBL toolkit. These systems generate rules with a different syntax, but the conversions are straightforward and can be done automatically.

A short description of how a Brill tagger/chunker works follows. A reader only interested in using a part-of-speech tagger or noun-phrase parser could probably skip this part. On the other hand, a reader wanting to know even more should read (Brill 1995) and (Ramshaw and Marcus 1995). A reader wanting to really explore what this paradigm has to offer should consult my Transformation-Based Learning bibliography. It turns out that wonderful things can be achieved: not only part-of-speech tagging and chunking, but also word sense disambiguation, dialogue act tagging, morpheme-phoneme conversion, etc.

How the Part-of-Speech Tagger Works

In the first step, a lexical lookup module assigns exactly one tag to each occurrence of a word (usually the most frequent tag for that wordform), disregarding context. Here is the single rule responsible for this:

init(pos wd @lex.lookup)

The lexicon has been generated from tagged corpora and contains more than 93,000 entries.
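As an illustrative sketch (in Python; the toy lexicon and function names here are mine, not the package's API), the lookup step amounts to a plain dictionary lookup with a fallback tag for unknown words:

```python
# Toy lexicon: wordform -> its most frequent tag.
# (The real lexicon has more than 93,000 entries.)
LEXICON = {"the": "DT", "dog": "NN", "barks": "VBZ"}

def initial_tags(words):
    """Assign each word its most frequent tag, disregarding context;
    words not in the lexicon get the placeholder tag 'unknown'."""
    return [LEXICON.get(w.lower(), "unknown") for w in words]

print(initial_tags(["The", "dog", "barks"]))  # ['DT', 'NN', 'VBZ']
```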

In the second step, words not in the lexicon are handled separately, by the guesser. The guesser starts by assigning the tag NNP to unknown capitalized words, and NN to others. This is the relevant rule:

replace(pos 'unknown' 'NNP' 'NN') # [isCap#[0]]

Then replacement rules are applied that may change these tags on the basis of a simple suffix analysis. Here is a guessing rule:

replace(pos 'NN' 'JJ') # [suffix#less#[0]]  

The rule means "replace tag NN with JJ if the word in question ends in 'less'".
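The guessing step can be sketched as follows (Python; a toy subset of the rules, with names of my own choosing rather than the package's):

```python
def guess_tags(words, tags):
    """Guess tags for words not in the lexicon: NNP if capitalized,
    NN otherwise, then refine by simple suffix rules (toy subset)."""
    out = list(tags)
    for i, w in enumerate(words):
        if tags[i] != "unknown":
            continue  # guessing rules only touch unknown words
        out[i] = "NNP" if w[:1].isupper() else "NN"
        # replace(pos 'NN' 'JJ') # [suffix#less#[0]]
        if out[i] == "NN" and w.endswith("less"):
            out[i] = "JJ"
    return out

print(guess_tags(["Kafka", "hopeless"], ["unknown", "unknown"]))
# ['NNP', 'JJ']
```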

In the fourth step, the rule application module proceeds to replace some of the tags with other tags, on the basis of what appears in the local context. Here is a typical context rule:

replace(pos 'VB' 'NN') # [canHave#'NN'#[0] pos#'DT'#[~1]]  

The rule means "replace tag VB with NN if the word in question can have tag NN (according to the lexicon) and if the previous word is tagged DT".

The present system uses around 50 guessing rules and nearly 300 context rules. Both kinds of rules have been induced from tagged corpora by means of Transformation-Based Learning (TBL).
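The context rule shown above can be sketched like this (Python; the helper lexicon and function are illustrative, and for simplicity the sketch reads the previous tag from the input sequence rather than propagating changes within the pass):

```python
# Toy lexicon of possible tags per word, consulted by the canHave condition.
CAN_HAVE = {"walk": {"VB", "NN"}, "the": {"DT"}}

def apply_context_rule(words, tags):
    """replace(pos 'VB' 'NN') # [canHave#'NN'#[0] pos#'DT'#[~1]]:
    retag VB as NN when the word can be an NN and follows a DT."""
    out = list(tags)
    for i, (w, t) in enumerate(zip(words, tags)):
        if (t == "VB" and "NN" in CAN_HAVE.get(w, set())
                and i > 0 and tags[i - 1] == "DT"):
            out[i] = "NN"
    return out

print(apply_context_rule(["the", "walk"], ["DT", "VB"]))  # ['DT', 'NN']
```

In the real tagger, the full sequence of context rules is applied in a fixed order, each rule reading the output of the previous one.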

How the Noun-Phrase Chunker Works

A noun-phrase chunker tries to mark up all the (basic) noun phrases in a text. The idea behind this particular chunker - due to Ramshaw and Marcus (1995) - is to view chunking as a tagging problem, and to encode the chunk structure as tags attached to each word. Three tags - I, O and B - are used to indicate if a word occurrence is inside an NP, outside an NP, or on the border between two NPs, respectively.

Since the rules for noun-phrase chunking are meant to apply to part-of-speech tagged text, the chunker also includes a Brill part-of-speech tagger. Thus, the noun-phrase chunker contains all the rules that the tagger contains plus a number of rules for chunking.

The chunking steps do not introduce any new kinds of rules. The first step creates a new field for each word and initializes it with a default NP tag (i.e. I, O, or B) based on the part-of-speech that it has (according to the part-of-speech tagging phase). Thus:

init(np pos @pos2np_mapping.lookup)  

A sequence of chunking rules is then applied, replacing NP tags with other NP tags based on what the local context looks like. Here is a typical chunking rule:

replace(np 'I' 'O' ) # [np#'O'#[1] pos#'JJ'#[0]]  

The rule means "replace tag I with tag O if the next word is tagged O and the part of speech of the word in question is JJ". In the present system there are 100 rules of this kind.
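Both chunking steps, the initialization from part-of-speech tags and the rule shown above, can be sketched as follows (Python; the POS-to-NP mapping is toy data and the names are mine, not the package's):

```python
# Toy mapping from part-of-speech tag to default NP (IOB) tag.
POS2NP = {"DT": "I", "NN": "I", "JJ": "I", "VBZ": "O"}

def chunk(words, pos_tags):
    """Initialize NP tags from POS tags, then apply one chunking rule:
    replace(np 'I' 'O') # [np#'O'#[1] pos#'JJ'#[0]]"""
    np = [POS2NP.get(t, "O") for t in pos_tags]
    out = list(np)
    for i, t in enumerate(pos_tags):
        # An adjective directly before an O word is unlikely to open an NP.
        if (out[i] == "I" and t == "JJ"
                and i + 1 < len(np) and np[i + 1] == "O"):
            out[i] = "O"
    return out

print(chunk(["hopeless", "barks"], ["JJ", "VBZ"]))  # ['O', 'O']
print(chunk(["the", "dog"], ["DT", "NN"]))          # ['I', 'I']
```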


Installation

Download the package, and invoke ozmake in a shell as follows:

ozmake --install --package=lager-brill-tagger.pkg

By default, all files of the package are installed in the user's ~/.oz directory tree. In particular, all modules are installed in the user's private cache.



Interface

Module Tagger.ozf exports, on feature class, a class definition for a natural-language tagger.


init(Trace <= false)
An initialization method.
tag(Words Features ?Tag)
Tag gets bound to a record describing the attributes of Words.


The package also contains a part-of-speech tagger for English that is implemented by subclassing Tagger.


init(Trace <= false)
An initialization method.

+ methods inherited from Tagger.


The package also contains a combined part-of-speech tagger and noun-phrase chunker for English that is implemented by subclassing Tagger.


init(Trace <= false)
An initialization method.

+ methods inherited from Tagger.

Example Applications

The distribution includes two example applications: a part-of-speech tagger and a noun-phrase chunker. These applications use the EnglishTagger module and EnglishChunker module, respectively. They also use the modules SentenceSplitter and EnglishTokenizer, which will have to be installed in the user's cache.

Given the file test.txt the invocation

tag --in=test.txt --out=test.tagged.txt --statistics

will produce the file test.tagged.txt. It will also print the following statistics:

2321 words (90 sentences) tagged in 3.02 seconds (769 W/s).
Reading: 0.0 s.
Splitting: 0.06 s.
Tokenizing: 0.16 s.
Tagging: 2.8 s.

(This example, by the way, was run on my old portable 166 MHz PC. On a more modern computer it will be a lot faster.)

The invocation

chunk --in=test.txt --out=test.chunked.txt

will produce the file test.chunked.txt.


References

Brill, Eric (1995) Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging. Computational Linguistics 21(4), December 1995.

Ramshaw, L. A. and Marcus, M. P. (1995) Text Chunking Using Transformation-Based Learning. In Proceedings of the ACL Third Workshop on Very Large Corpora, June 1995, pp. 82-94.
