Assignments‎ > ‎

HW2: FSAs and Scala Programming

DUE: September 21, 2011 by 11am (submission on Blackboard) and 12pm for written answers

Your starting point is a set of stub programs that you can download from the course website in the file These are incomplete implementations that handle a few things for you (such as code to test your code), and have comments embedded in them to help you solve the problems.

Your submission will be a zipped up file with your implementations for each problem in it. Submit it on Blackboard as <lastname>_<firstname> (tgz is fine too). There will also be some answers to hand in on paper, which you should submit at the beginning of class on the due date.

If any of the instructions don’t make sense to you, please get in touch with the instructor right away.

1. Map data structures (a.k.a. associative arrays)

For this problem, you will write a program that takes a list of names and room numbers on the
command line, adds them to entries from a Map relating people to their rooms, and then creates a reverse Map that relates rooms to the people who are in them.

Look at the stub program rooms.scala. A Map named defaultRoomNumbers has already been defined with a few entries. As in previous problems, there are some print statements like print ’’Part (a)’’ – make sure not to delete those.

Note: you must use immutable Maps for this problem.

(a) The arguments on the command line are to be expected as a list of (person number) pairs, like Bill 315 Angela 412 Susan 325. Get this information from the command line (making sure to check that an even number of arguments has been provided), and use it to add create a new Map that contains both these person to room pairings and those in the defaultRoomNumbers Map. Then print out who is in which room, alphabetically by name.

Hint: use the mod function % to find out whether there are an even number of arguments. (Try 5 % 2 and 4 % 2 in the Scala REPL and you’ll see what it does.)

(b) Some people might share the same room. Create a new map that maps rooms to the people in them, so it will have room numbers as keys, and lists of people as values.

Here’s an example of how your program should work on some example input:

$ scala rooms.scala Bill 315 Angela 412 Susan 325

Part (a)
Angela: Room 412
Bill: Room 315
Jane: Room 312
Sam: Room 312
Susan: Room 325
Ted: Room 325

Part (b)
Room 312: Sam,Jane
Room 315: Bill
Room 325: Susan,Ted
Room 412: Angela

2. Word-by-word translation
For this problem you’ll create a very simple machine translator that looks at each word in a source language sentence and maps it to a unique word in a target language. This will involve working with the file system and with Maps. Use the stub file translate.scala for this.

You are given a (very) small translation dictionary in the file por2eng.txt. You must read this in from the command line (as the first argument) to create a Map that maps Portuguese words to the English words, as specified in por2eng.txt. Then, using that Map, you must translate any remaining arguments on the command line (which you can expect to be Portuguese sentences) and translate them one by one to English.

As with the previous problem, you should use only immutable data structures.

Here is an example call to translate.scala and the output it should produce for the given input.

$ scala translate.scala por2eng.txt "a menina ouviu o cachorro ontem" "o cachorro viu o gato hoje"
Portuguese: a menina ouviu o cachorro ontem
English:    the girl heard the dog yesterday

Portuguese: o cachorro viu o gato hoje
English:    the dog saw the cat today

3. Finite state automata
Draw (on paper) deterministic finite state automata that recognize the following languages, where the alphabet is {a, b}:

(a) { w | w contains at least three b’s }

For example, abbab and ababaababbb should be accepted, but aaabaaa and ab should not.

(b) { w | every odd position of w is a b }

For example, bab and bbbab should be accepted, but ab and bba should not. Your machine can accept the empty string.

(c) { w | w does not contain the substring bba }

For example, aaab, ababa, and abb should be accepted, but bbabb and abba should not.
4. Simple English automaton
Draw (on paper) a finite state automaton that recognizes strings of the following sort:
  • the man sleeps
  • the men eat apples
  • the old men give John apples
  • the very very old man eats apples
  • the very old old man gives John apples
  • the man sleeps and the man eats apples
Basically, your machine should be able to accept either the man or the men as subject (and possibly modified by (very) old), then sleep, eat, or give as a verb, and then any further arguments of the given verb, as appropriate. Examples of things your machine should not accept include:
  • the man sleep
  • the men eats apples
  • the very men sleep
  • the old very man sleeps
  • the man sleeps apples
  • the man eats
  • the man gives John
  • the man gives apples
  • the man sleeps and the man
Your machine can be non-deterministic. Try to use as few states as possible.

Write out a brief answer to the following question: how easy or difficult would it be to add
the indefinite article a to the machine in order to allow sentences like a man sleeps without
accepting a man sleep?

5. Implement a recognizer for deterministic finite state automata
For this problem, you will write code that reads a file containing the information needed to construct a deterministic finite state automaton, construct the FSA, and use it to judge whether a series of inputs are recognized or rejected by it. This problem is likely to be considerably more challenging that the previous ones. It will require understanding a non-trivial algorithm and using Scala data structures to implement it. There are many different ways to implement the data structures and the algorithm.

Work with the stub file runFsa.scala.
The input format is simple. The following is the example given in fsa2.txt.

start 0
final 3
0 to 1 on a
0 to 2 on b
1 to 0 on a
1 to 3 on b
2 to 3 on a
2 to 0 on b
3 to 2 on a
3 to 1 on b

The first line gives the start state, the second line gives the list of final states, and the remaining lines define the arcs (e.g. "3 to 1 on b" means that there is a transition from state 3 to state 1 for input symbol b).

(a) Draw this FSA on paper and give three strings it accepts and three it rejects (using the alphabet a,b). Provide an English description of the language defined by the FSA (like in problem 3).

(b) Write code to read in any FSA in this format into appropriate data structures, using fsa1.txt and fsa2.txt as test cases. The most important data structure is the transition matrix. My suggestion is to do this as a Map[Int, Map[String, Int]], such that each key in the transition matrix is a source state, and when you access that key in the transition matrix, it gives you back a Map[String, Int], the keys of which are symbols of the alphabet with their corresponding values being the state that it takes you to. So, if you have called this transitionMatrix, you should be able to do transitionMatrix(0)("b") for fsa2.txt and get the value 2 (corresponding to the "0 to 2 on b" arc defined in fsa2.txt). However, you are welcome to do this in a different way if you prefer.

Based on the above processing, output the alphabet (sigma) and the list of states.  (See below for format.)

(c ) Implement the D-RECOGNIZE algorithm in Jurafsky and Martin, p 29, Figure 2.12, by writing a function dRecognize that takes an input string and returns a Boolean that indicates whether the FSA accepts or rejects the input. Apply this function to all command line arguments after the argument giving the input FSA (e.g. args(1) and beyond). Your output should look like the following, using fsa1.txt as an example.

$ scala runFsa.scala fsa1.txt aabaaa b aaa abab
Sigma: a b
States: 0 1

Testing input strings.
Accepted: aabaaa
Accepted: b
Accepted: aaa
Rejected: abab

(d) Create input files for the FSAs you drew for problem 3, saving them as fsa3a.txt, fsa3b.txt, and fsa3c.txt. Test each with three strings that you know to be in the language and three that you know to not be in the language. Did you discover any problems this way that caused you to revise the FSAs? (Or that caused you to fix a bug in your runFsa.scala code?)

Make sure to include the fsa3 files in your submission.

(e) How long did this homework take you to complete?

Note: If you want to geek out, you can do more things, such as adding checks to make sure the input FSA is deterministic, writing a function that determinizes a non-determistic FSA, enabling epsilon transitions, or recognizing a non-deterministic FSA using backtracking or dynamic programming. You should do this in a separate implementation from runFsa.scala, e.g. extraFsa.scala. Feel free to consult with me if you are interested in any of these!
Jason Baldridge,
Sep 7, 2011, 7:38 PM