Assignments‎ > ‎

HW3: Finite state transducers

Due: October 12, 2011 by 11am (submission on Blackboard)

There are several stub files in the homework bundle, such as simple_english.xfst, numbers_to_numerals.xfst and finnish.xfst. Please modify these scripts when solving the problems; do not use different names for these scripts.

For submission, ensure that the final versions of your FST scripts are in a directory named hw3_<lastname>_<firstname> and create a zip (or tgz) file with the name hw3_<lastname>_<firstname>.zip that contains that directory and its contents.

Note that there are XFST scripts included in this homework that are licensed under the GNU General Public License. You must comply with this licensing, which essentially means that you won’t modify the script and then use it in a proprietary manner without making your revisions available freely. (If all you do is complete the homework and hand it in, you have nothing to worry about.)

A brief reminder of XFST options that will be handy for you to use:

  • -f loads a file, executes it and drops you back to the command line prompt

  • -l loads a file, executes it, and leaves you at the XFST prompt.

The latter is useful for interacting with your transducer, and displaying it (use the write dot command in XFST).

For help with using XFST, see the resources given on the course page, including the Karttunen reading, the tutorial, and the slides.

If any of instructions or problem descriptions do not make sense to you, please get in touch with the instructor right away.

1. Implementing simple English (15%)

For this problem, you will use XFST to implement the finite-state automaton you designed for homework 2, problem 4. Look at the file simple_english.xfst. You will implement the FSA by declaring appropriate regular expressions in this file. Notice that there are some comments giving you some direction as to how to proceed (lines beginning with a # symbol are comment lines that XFST ignores).

You will actually implement the FSA with XFST by declaring regular expressions. As an additional aid, I have provided a file called hint.xfst that contains example declarations that you should find useful in solving this problem. Once you get your script working, you should try out various strings to make sure they are correctly accepted or rejected. Work incrementally: first handle sentences like the man sleeps and then extend the complexity bit by bit, making sure that you don’t break things as you cover more sentences.

2. Odd b’s. (10%)

Create a transducer, by defining a regular expression in the file odd_b.xfst, for accepting

the following language over the alphabet {a, b}:

{ w | every odd position of w is a b }

Your transducer should accept the empty string.

This is the same language as homework 2, problem 3b.

3. a’s to b’s, part 1. (10%)

Define a transducer in the file a_to_b_part1.fst that maps a’s on its upper side to b’s on its lower side whenever they precede a b on the upper side, for strings in the alphabet {a, b}. For example, the string ababaaabab on the upper side should map to bbbbaabbbb on the lower.

Write your regular expression using only the basic operations for regular expressions – union (|), Kleene star (*) and plus (+) – and cross product (:). (You will use the replacement operator in the next problem.)

4. a’s to b’s, part 2. (10%)

Define a transducer in the file a_to_b_part2.fst that does the same mapping as for the previous problem, but this time use the replacement operator -> in your regular expression. (Hint: you will have to restrict the alphabet so that strings like acdab are not accepted. This will actually require composing two transducers using the .o. operator for composition.)

5. Number/numeral transducer. (20%)

Look at the XFST script numbers_to_numerals.xfst, which defines a transducer for relating English numbers from 1 to 99 to their corresponding numerals (one to ninety-nine). Build on this script to the point where you leave on the stack a network, defined as OneToOneThousand that relates the numbers and numerals up to 999/nine-hundred ninety-nine.

Note: this problem is adapted from Lauri Karttunen’s LSA course.

6. Finnish Noun Inflection. (35%)

This exercise is a simplified version of the Finnish Noun Inflection problem on the Beesley and Karttunen book web site. The goal is to build a lexicon that contains Finnish nouns inflected for number and case. To keep the problem reasonably small, we consider only three cases: singular nominative, singular partitive and plural partitive. For the same reason, we only consider two types of nouns: monosyllabic and some bisyllabic stems. Even with these limitations, the problem is challenging because the shape of the endings depends on the shape of the stem and the stems are subject to several regular alternations.

Note: you might be overwhelmed just looking at this problem, but there are actually lots of details that are here to help you, and the finnish.xfst stub file has clues to help you structure your solution. Don't panic!

The Facts

A Finnish noun begins with a stem. In all of the cases below, assume that the stem is identical with the nominative singular. A plural marker, if any, immediately follows the stem. After the stem and the possible plural marker comes one of several possible case endings. We consider only two cases: nominative and partitive. Singular nominative has no case marker. For the purpose of this excercise we assume that the plural and the partitive endings are:

  • I - Plural marker for cases other than Nominative, realized as i or j

  • Ta - Partitive Marker marker, realized as ta or a in words containing only back vowels.

The following list illustrates some of the possible combinations with the noun valo ‘light’:

  • valo - Nominative Singular. No case ending

  • valoa - Partitive Singular. The partitive marker Ta is realized here as a

  • valoja - Partitive Plural. The plural marker I is realized as j here. The partitive marker Ta is realized here as a.

The realization of the T in the partitive marker depends on the syllable structure of the word. After a monosyllabic stem such as maa ‘earth’, the T is realized as t as in maata. After a bisyllabic stem such as valo ‘light’, the T disappears as in valoa.

The plural marker I is realized as j between two vowels, otherwise it is realized as i.

With this information you should be able to write the rules that correctly realize the plural marker and the partitive suffix in all environments.

The remaining problem is that the stem of the noun also undergoes alternations. We will use only back-vowel stems in order to avoid having to deal with vowel harmony. The other stem alternations, Vowel Rounding, Vowel Lowering, Vowel Dropping, and Vowel Shortening are illustrated in the table below.

Vowel Rounding: Short a is rounded to an o in front of the plural marker I. Examples: tikkaa Partitive Singular, tikkoja Partitive Plural, kauppaa Partitive Singular, kauppoja Partitive Plural. This does not happen if the vowel nucleus of the preceding syllable starts with a rounded vowel (o or u). See the rule for Vowel Dropping.

Vowel Lowering: Short i is lowered to e in front of the plural marker I. Examples: vahtia Partitive Singular, vahteja Partitive Plural, pappia Partitive Singular, pappeja Partive Plural. See the rule for Vowel Dropping.

Vowel Dropping: A short a is deleted in front of the plural marker I if the nucleus of the preceding syllable consists of, or begins with, a rounded vowel (u or o). Note the different behavior of kuoppa where the a is dropped and kauppa where the a is rounded to o in the plural. Examples: sotaa Partitive Singular, sotia Partitive Plural, kuoppaa Partitive Singular, kuoppia Partitive Plural.

Vowel Shortening: In front of the plural marker I, the long vowels aa, ee, ii, oo, uu, are shortened to a, e, i, o, u, respectively. The diphthongs uo and ie shorten to o and e, respectively. Examples: puuta Partitive Singular, puita Partitive Plural, suota Partitive Singular, soita Partitive Plural.

The Task

Your task is to write an xfst script, finnish.fst that takes the stems mentioned above, assembles them into morphotactically correct underlying Finnish forms, and then handles the realization of the surface forms according to the rules for the plural marker, the partitive endings, and stem-changing sketched above. You must combine the suffix realization rules with the stem alternation rules. Think about how to order the rules. It matters.

The noun stems you must handle are listed above, and they are also given in the file finnish_stems.txt. You need to start by creating an FST for these noun stems concatenated with the Number and Case morphemes. (Hint: you can use the file finnish_stems.txt to create an FST for just the noun stems using the read text command.  Alternatively, you can just list them out explicitly in a regular expression in your script.) Use the tags +Sg, +Pl, +Nom, +Part for marking number and case on the upper side. You should then end up with pairs like the following:







Tip: don’t forget about the distinction between the concatenation of several single character symbols and a multi-character symbol.

Once you have that working appropriately, create a series of replacement rules composed together that lead to the correct surface forms, and leave the resulting FST on the xfst stack for testing. The final result should contain pairs such as:







Verify that the lower side of the FST contains the properly inflected surface forms by terminating the script with the command:

print lower-words

Make sure you get all and only the forms given in the table above.

Jason Baldridge,
Sep 21, 2011, 1:57 PM