UGTag - a morphological tagger for the Ukrainian language

Description

Morphological Dictionaries

Getting the Program

Description

UGTag is a program for annotating texts with morphological and syntactic information. It was written within the Polish-Ukrainian Parallel Corpus project, coordinated by Dr. Natalia Kotsyba. Main features of the program include:

Figure 1. UGTag GUI client with a small text file opened. Sentence marks and syntax highlighting are switched off.

Figure 2. Dialog for editing the visual appearance of tagged text. Words with zero (i.e. unrecognized), one, or more interpretations are distinguished, as are punctuation marks and numbers. The user can tune the font face, size and style (bold, italic or underline), as well as the background and text colors.

Morphological Dictionaries

UGTag has a dictionary-based tagger. Dictionaries are stored in a binary format (for faster loading), but they can also be imported from an XML format. UGTag's default dictionary is based on the Ukrainian Grammatical Dictionary, which was created under the guidance of Dr. Igor Shevchenko from ULIF NANU. It was, however, significantly (and I mean really significantly) reorganized to be compatible with the MULTEXT-East specifications. Work on the Ukrainian part of the specifications took a considerable amount of time.
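As a rough illustration of the XML-import and binary-storage idea, here is a minimal Python sketch. It is not UGTag's code. The XML layout (`dictionary` and `entry` elements with `form`, `lemma` and `tag` attributes), the tag strings, and the use of `pickle` as the binary format are all assumptions made for this example; UGTag's actual formats are not described on this page.

```python
import pickle
import xml.etree.ElementTree as ET

# Hypothetical XML layout invented for this sketch; the tag strings are
# illustrative placeholders, not taken from UGTag's dictionaries.
SAMPLE_XML = """
<dictionary name="demo">
  <entry form="книга" lemma="книга" tag="Ncfsnn"/>
  <entry form="книги" lemma="книга" tag="Ncfsgn"/>
  <entry form="книги" lemma="книга" tag="Ncfpnn"/>
</dictionary>
"""


def import_xml(xml_text):
    """Build a lookup table: word form -> list of (lemma, tag) pairs."""
    table = {}
    root = ET.fromstring(xml_text.strip())
    for entry in root.iter("entry"):
        interpretation = (entry.get("lemma"), entry.get("tag"))
        table.setdefault(entry.get("form"), []).append(interpretation)
    return table


def to_binary(table):
    """Serialize the table to bytes (pickle stands in for a binary format)."""
    return pickle.dumps(table, protocol=pickle.HIGHEST_PROTOCOL)


def from_binary(blob):
    """Restore a table previously produced by to_binary()."""
    return pickle.loads(blob)


# Import once from XML, then reload from the binary form.
table = from_binary(to_binary(import_xml(SAMPLE_XML)))
```

In this sketch an ambiguous form such as `книги` simply maps to several interpretations, which is what a dictionary-based tagger would report for it.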

Since the UGS data is not publicly available, UGTag cannot be readily released to the wider public. There are plans, however, to make it open source. With the program in hand, users are encouraged to make their own dictionaries.

Figure 3. Dictionary list editor. Users can create new dictionaries, load existing ones from external files, and edit dictionaries' contents.

Figure 4. Dictionary editor. The user can change the name, optional descriptions and the list of authors.

Figure 5. Dictionary content editor. Basically, a dictionary is a list of lemmas and a list of entries. Each entry is in essence a word form with a bound grammatical interpretation and a reference to the word's lemma.
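The dictionary structure described in the caption above (a list of lemmas plus a list of entries, each entry pointing at its lemma) could be modelled roughly as follows. This is a sketch in Python under stated assumptions, not UGTag's implementation: the class names, field names, and tag strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    form: str      # word form as it appears in text
    tag: str       # grammatical interpretation bound to this form
    lemma_id: int  # index into Dictionary.lemmas


@dataclass
class Dictionary:
    name: str
    lemmas: list = field(default_factory=list)
    entries: list = field(default_factory=list)

    def add(self, form, tag, lemma):
        """Add an entry, registering the lemma if it is not yet known."""
        if lemma not in self.lemmas:
            self.lemmas.append(lemma)
        self.entries.append(Entry(form, tag, self.lemmas.index(lemma)))

    def interpretations(self, form):
        """Return every (lemma, tag) pair recorded for a word form."""
        return [(self.lemmas[e.lemma_id], e.tag)
                for e in self.entries if e.form == form]


demo = Dictionary("demo")
demo.add("книга", "Ncfsnn", "книга")  # tag strings are placeholders
demo.add("книги", "Ncfsgn", "книга")
demo.add("книги", "Ncfpnn", "книга")
```

Storing the lemma once and referring to it by index keeps the lemma list free of duplicates even when a lemma has many forms. A real tagger would likely index entries by form instead of scanning the list on each lookup.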

Extending Dictionaries

Sometimes the data present in the dictionaries is not enough. In this case, UGTag makes a list of the words that were not found in the dictionaries and presents them in the following dialog.

Figure 6. Dialog for editing unrecognized word list. Fortunately, all words were found.

Figure 7. Dialog for adding interpretations of unknown words directly to the user dictionary.
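The workflow in this section (collect the words missing from the dictionaries, then add interpretations for them to a user dictionary) can be sketched in Python as below. The function name, the simple letter-based tokenizer, and the sample data are assumptions for illustration only; they do not reflect how UGTag tokenizes text or stores its user dictionary.

```python
import re


def unrecognized_words(text, known_forms):
    """List the distinct words of `text` that are absent from `known_forms`.

    Words are lowercased and returned in order of first appearance.
    Digits and punctuation are ignored by the simple tokenizer used here.
    """
    seen = set()
    missing = []
    for word in re.findall(r"[^\W\d_]+", text):
        key = word.lower()
        if key not in known_forms and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


known = {"я", "читаю", "книгу", "а", "ти"}
text = "Я читаю книгу, а ти читаєш 2 книги."
missing = unrecognized_words(text, known)

# The user supplies an interpretation for one unknown word
# (lemma and tag string are placeholders).
user_dictionary = {"читаєш": [("читати", "V")]}
still_missing = unrecognized_words(text, known | set(user_dictionary))
```

After the user dictionary is extended, rerunning the check reports only the words that still lack an interpretation.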

Getting the Program

The project is hosted on SourceForge.net, https://sourceforge.net/projects/ugtag

 

Last updated 17 October 2009 by Andriy Mykulyak. See also PolUkr and the project timeline.