LIN386M Home‎ > ‎

XFST tutorial

This page is adapted from Lauri Karttunnen, Tamás Gaál, and André Kempe's XFST tutorial and Mark Gawron's XFST examples.

Xerox finite-state tool is a general-purpose utility for computing with finite-state networks. It enables the user to create simple automata and transducers from text and binary files, regular expressions and other networks by a variety of operations. The user can display, examine and modify the structure and the content of the networks. The result can be saved as text or binary files.

Introduction

XFST is a general-purpose Unix application for computing with finite-state networks, based on long-term Xerox research initiated by Ronald M. Kaplan and Martin Kay at the Xerox Palo Alto Research Center in the early 1980s. It is a successor to two earlier interfaces of the same type: IFSM and FSC. The IFSM interface was created at PARC by Lauri Karttunen and Todd Yampol around 1990-92. The FSC tool was developed at XRCE in 1994-95 by Pasi Tapanainen. XFST combines the best features of its two predecessors. It includes many extensions to the regular expresson calculus introduced at XRCE.

XFST can read finite-state networks from binary files and compile them from regular expressions and text files. The networks can be simple networks or finite-state transducers. They can be combined by means of a variety of operations, such as union and composition. The resulting networks can be saved into a binary or a text file. The user may apply a network to strings to determine whether the string is accepted by the network or to transform it to another string if the network is a transducer. XFST provides many ways to get information about a network and to inspect and modify its structure.

We introduce XFST with a brief tour that shows how the application is launched in Unix and some of the things that it can do. In the later sections we discuss the command set and other related issues in more depth. For help with regular expressions and other issues concerning the structure and application of finite-state networks, please consult the webpage for the XFST book by Kenneth Beesley and Lauri Karttunen.

Tutorial

Running XFST

XFST is launched with the command 'xfst'. The launch command can be modified by a number of optional flags. For example, 'xfst -f scriptfile' executes the commands in scriptfile and exits upon completion. If no additional modifiers are given, the 'xfst' command starts the application in an interactive loop waiting for commands from the user:

> xfst
Copyright (c) Palo Alto Research Center 2001-2007
Xerox Finite-State Tool, version 2.4.9

Type "help" to list all commands available or "help help" for further help.

Alternate way of running xfst with command history: You can invoke xfst as 'rlwrap xfst' and that will retain the command history in the program through up and down arrow keys (similar to the bash shell). You can also add an alias in your .bashrc file like this:

alias xfst='rlwrap xfst'

The xfst[0]: prompt indicates that the xfst application is waiting for a command. The number 0 indicates that the network stack is empty.

The great majority of XFST commands are about adding networks to the stack, replacing some or all of the stack by the result of some operation, and saving the stack into a file. Another set of commands is concerned with the network currently on the top of the stack, that is, the network that was most recently added to the stack. We give a few simple examples of both types of commands.

We present two examples. In the first example, the user creates a network with the help of XFST and saves the result into a file. In the second example, we retrieve the network from the file to use in another operation. We then show how a sequence of commands can be run by a script and aliased to a single command.

Making and saving a network

A network can be added to the stack by loading a previously compiled network from a binary file or by compiling a new one from some text source. In either case, the network becomes the topmost one on the stack. In this example, we compile a network from a regular expression using the command 'read regex'. We type

xfst[0]: read regex [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;

This expression denotes the language that consists of the ten decimal digits. Because 0 is a special symbol (epsilon) in a regular expression, it is necessary to prefix it here with %, the escape character, to have it interpreted as a digit. The semicolon at the end of the line closes the regular expression. When the command is terminated with a carriage return, XFST responds:

344 bytes. 2 states, 10 arcs, 10 paths. Label Map: Default.
xfst[1]:

showing that the network representing this ten-word language consists of 2 states and 10 arcs. The new prompt, xfst[1]: shows that we now have one network on the stack. The command print net displays the structure of the network on the screen:

xfst[1]: print net
Sigma: %0 1 2 3 4 5 6 7 8 9
Size: 10, Label Map: Default
Net:
Flags: deterministic, pruned, minimized, epsilon_free, loop_free
Arity: 1
s0:   %0 -> fs1, 1 -> fs1, 2 -> fs1, 3 -> fs1, 4 -> fs1, 5 -> fs1, 6 -> fs1, 7 -> fs1, 8 -> fs1, 9 -> fs1.
fs1:  (no arcs)

Here the print net command displays the states of the network: s0 (a non-final state), fs1 (a final state); and the labeled arcs leading from s0 to fs1. In addition, we see the symbol alphabet of the network (Sigma), the regular expression it was compiled from, and some characteristics of the network (Flags, Arity).

It is often convenient to give a network a name that can be used in a regular expression to refer to it. The command for that assignment is 'define':

xfst[1]: define Digit
defined Digit: 344 bytes. 2 states, 10 arcs, 10 paths. Label Map: Default.
xfst[0]:

The define command requires at least one argument: the symbol that is being defined, here Digit. If no further specification is given, the network on the top of the stack becomes the value of the defined symbol and is removed from the stack.

Another way to achieve exactly the same result is to include in the define command a regular expression that denotes the desired language or relation. In this case, the single command

xfst[0]: define Digit [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
defined Digit: 344 bytes. 2 states, 10 arcs, 10 paths. Label Map: Default.
xfst[0]:

would have exactly the same effect as the sequence of commands shown above. Here the stack remains empty. Note the closing semicolon that marks the end of the regular expression.

Once defined, the name Digit can be used in regular expressions to represent the language in question. To give a simple example, let us construct a transducer that converts US numerals to the European format. In US numerals the comma is used as a separator, the period marks the beginning of the decimal part. In Europe the convention is the opposite. Thus 1,000.00 in the US corresponds to 1.000,00 in Europe. A transducer that does this conversion can be defined as follows, using the defined Digit symbol:

xfst[0]: read regex  %. -> %, , %, -> %. || Digit _ Digit ;
1.6 Kb. 4 states, 41 arcs, Circular. Label Map: Default.
xfst[1]:

This transducer represents the parallel replacement of ”.” by ”,” and ”,” by ”.” between two digits.

To verify that the transducer does what it is supposed to do we can use the apply command. Because transducers are bidirectional, we must specify the direction of application. In this case, it is 'down'; that is, the US representation is on the “upper” side of the transducer:

xfst[1]: apply down 1,234.99
1.234,99
xfst[1]:

The apply command may also be used to take the input strings from a file instead of typing them directly. The file is processed line by line. Create a file in your directory called “US-num.txt” containing the following three numbers:

1,234.99
0.5
100,000,000

Assuming this file is available, you can run the following command:

xfst[1]: apply down < US-num.txt
Opening file US-num.txt...

1,234.99
1.234,99

0.5
0,5

100,000,000
100.000.000
Closing file US-num.txt...

In order to have the transducer available in the future, we can save it to a file. The command save writes all the networks currently on the stack into a single file. In this case, the file will contain just one network:

xfst[1]: save stack US-to-EU-num.fst
Opening 'US-to-EU-num.fst'
Closing 'US-to-EU-num.fst'
xfst[1]:

The command quit exits from XFST and returns back to Unix:

xfst[1]: quit
bye.

However, instead of quitting, we move on to the next topic.

Loading and using a stored network

In this example, we load the network we just created from a file to the stack, add another network on the top of the first one, and then perform an operation to replace both of them with the result of that operation.

Let us first empty the stack with 'clear stack' and then load the network back from the file.

xfst[1]: clear stack
xfst[0]: load US-to-EU-num.fst
Opening 'US-to-EU-num.fst'
Closing 'US-to-EU-num.fst'

We will now create another network by compiling a simple network from the same little text file we already used above

xfst[1]: read text < US-num.txt
Reading iso-8859-1 text from 'US-num.txt'

764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default.
Closing 'US-num.txt'
xfst[2]:

The read text command expects as its argument a name of a file containing a list of words, one entry per line. It compiles the word list into a network. The command 'print words' displays the content of the compiled word list.

xfst[2]: print words
1,234.99
100,000,000
0.5
xfst[2]:

As the prompt xfst[2]: indicates, we now have two networks on the stack. We can get more information about the stack with the 'print stack' command:

xfst[2]: print stack
0: 764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default.
1: 748 bytes. 4 states, 41 arcs, Circular. Label Map: Default.
xfst[2]:

The positions in the stack are numbered from 0 upwards. The net in position 0, the three-word network, is on top of the stack. Unary commands such as print net and print words apply to the top network on the stack, if not followed by some defined name. The net below it, in position 1, is the transducer converting US-style numbers to European-style numbers.

Instead of applying the transducer to strings one at the time, as we did before with the apply down command, we can now transduce the entire list of words at once with the compose net command:

xfst[2]: compose net
764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default.
xfst[1]:

The compose operation replaces the two networks on the stack by the result of the operation. Thus we now have just one network left. It is instructive to view its contents using the same print words command as before.

xfst[1]: print words
1<,:.>234<.:,>99
100<,:.>000<,:.>000
0<.:,>5
xfst[1]:

The result of the composition is a transducer. It denotes a relation, a mapping from one regular language into another one. On its “upper side”, the transducer has the three original US-style numbers, each mapped to a corresponding European-style on the “lower side” of the transducer. For the most part, the mapping is an identity relation because each digit is mapped to itself. The only difference is that periods are mapped to commas, and vice versa. The angle brackets indicate pairs of non-identical symbols: <.:,> indicates the correspondence of an upper-side period with a lower-side comma; similarly for <,:.>.

We can view the upper and lower languages of the relation independently. print upper-words displays the three US numbers; print lower-words shows what they have been transduced into:

xfst[1]: print lower-words
1.234,99
100.000.000
0,5
xfst[1]:

As before, we can use the apply commands to map strings on one side of the transducer to the corresponding strings on the other side. For example, apply up maps any one of the three lower-side European numbers to the corresponding upper-side string.

xfst[1]: apply up 0,5
0.5

We can also extract one of the other language from the relation. The command 'lower-side net' extracts from the transducer a simple automaton that contains just the three European numbers.

xfst[1]: lower-side net
764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default.
xfst[1]: print words
1.234,99
100.000.000
0,5

At this point, go ahead and quit out of xfst using the quit command:

xfst[1]: quit
bye.

Running XFST with a script

The examples we have given so far are about using XFST in an interactive mode, with the user typing commands on the prompt line. It is more convenient, for many purposes, to write a list of commands to be run in batch mode without any user interaction. To illustrate this possibility, we write a script that compiles the US-to-European transducer. and uses it to produce a file of European-style numbers from a file of US-style numbers.

A script is an ordinary text file that can be prepared with any text editor, such as Emacs. Here we create the script with the help of the simple Unix cat command. The command cat > digit.xfst writes the rest of the text up to the closing ^D (control-D) into the file digit.xfst.

# A script for producing a file of European numbers
# from a file of US numbers.

echo >> Defining Digit
define Digit [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;

echo >> Defining number converter
read regex  %. -> %, , %, -> %. || Digit _ Digit ;

echo >> reading US numbers
read text < US-num.txt

echo >> Composing and extracting EU numbers
compose net
lower-side net

echo >> Creating EU-num.txt
write text > EU-num.txt
system ls -l EU-num.txt

echo >> Done.
quit
^D

The lines beginning with # are comments that are ignored by xfst. The echo command prints the rest of the line to the standard output when the script is executed. In complex scripts it is often useful to print progress reports during the execution.

Because the purpose of this script is to produce an output file, we verify at the end that the file has indeed been created. The system command sends the rest of the line to the Unix operating system. If the file is in the directory, the script was run successfully

To execute the script, we invoke xfst with the -f flag:

> xfst -f digit.xfst
>> Defining Digit
defined Digit: 344 bytes. 2 states, 10 arcs, 10 paths. Label Map: Default.
>> Defining number converter
1.6 Kb. 4 states, 41 arcs, Circular. Label Map: Default.
>> reading US numbers
Reading iso-8859-1 text from 'US-num.txt'

764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default.
Closing 'US-num.txt'
>> Composing and extracting EU numbers
764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default.
764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default.
>> Creating EU-num.txt
Opening 'EU-num.txt'
Closing 'EU-num.txt'
-rw-r--r-- 1 jbaldrid users 54 2007-02-05 23:52 EU-num.txt
>> Done.
bye.

The -f flag causes xfst to exit after the script has been run. If the user wishes to take over at that point and continue running xfst in an interactive mode, he can call the script with -l flag instead of -f.

Another non-interactive way of running xfst is to include a sequence of xfst commands directly on the Unix command line. Each xfst command must be enclosed in double quotes and preceded by an -e flag. If the final -stop flag is omitted, xfst executes the commands and continues running in an interactive mode.

> xfst -e "define Digit [%0|1|2|3|4|5|6|7|8|9] ;" \
-e "read regex  %. -> %, , %, -> %. || Digit _ Digit;" \
-e "read text < US-num.txt" \
-e "compose net" \
-e "lower-side net" \
-e "write text > EU-num.txt" -stop

Note the backslash at the end of each line. It signals to to the Unix operating system that the entire sequence of commands counts as a single command line.

Further examples from Mark Gawron

These examples highlight the differences in rule ordering and feeding and bleeding relationships between rules as well as parallel application of rules.

Rule ordering

The following command creates a transducer for the kaNpankamman example:

xfst[0]: read regex [ N -> m || _ p ] .o. [ p -> m || m _ ];
1.3 Kb. 4 states, 15 arcs, Circular. Label Map: Default.
xfst[1]:

Now use the regex:

xfst[1]: down N
N
xfst[1]: down Np
mm
xfst[1]: down kaNpan
kamman
xfst[1]: up kamman
kaNpan
kampan
kamman
xfst[1]: down kampan
kamman
xfst[1]: down kamman
kamman

The expression given above created the FST in one go. An alternative, which will be handy below, is to define two FSTs, one for each rule, and then refer to them rather than the expressions directly.

xfst[0]: define Ntom N -> m || _ p;
defined Ntom: 1.2 Kb. 3 states, 10 arcs, Circular. Label Map: Default.
xfst[0]: define ptom p -> m || m _ ;
defined ptom: 1.2 Kb. 2 states, 6 arcs, Circular. Label Map: Default.
xfst[0]: regex Ntom .o. ptom ;
1.3 Kb. 4 states, 15 arcs, Circular. Label Map: Default.
xfst[1]:

We now have the same FST on the stack that we had earlier. As we look at alternative ways of combining the rules, the definitions of Ntom and ptom will be handy.

Reordering rules

Changing the order in which the rules are applied stops the Ntom rule from feeding the ptom rule:

xfst[1]: regex ptom .o. Ntom ;
1.3 Kb. 4 states, 15 arcs, Circular. Label Map: Default.
xfst[2]:

What happens now?

 1. down kaNpan
 2. down kampan
 3. up kamman
 4. up kampan
 5. Why isn't //kampan// part of the answer to the previous question? 

As a way of exploring the last question, you can try composing the rule transducers with a transducer for a particular word. For example:

xfst[2]: regex ptom .o. Ntom .o. {kampan};
488 bytes. 7 states, 6 arcs, 1 path. Label Map: Default.
xfst[3]: print words
ka<N:m>pan
xfst[3]: regex Ntom .o. ptom .o. {kamman} ;
540 bytes. 8 states, 9 arcs, 3 paths. Label Map: Default.
xfst[4]: print words
kam<p:m>an
kamman
ka<N:m><p:m>an
xfst[4]:

Parallel Rules

Consider the following combination.

xfst[4]: regex N -> m || _ p ,, p -> m || m _ ;
1.3 Kb. 4 states, 15 arcs, Circular. Label Map: Default.
xfst[5]:

This applies the rules in parallel. Note the double comma (”,,”), required only for parallel rules with contexts. (Note that we cannot use the previous definitions of Ntom and ptom.)

What will happen now?

xfst[5]: down N
N
xfst[5]: down Np
mp
xfst[5]: down kaNpam
kampam
xfst[5]: up kamman
kampan
kamman
xfst[5]: up kampan
kaNpan
xfst[5]: down kampan
kamman
xfst[5]: down kamman
kamman

Ponder these. Note that the results do differ from both of the two previous versions.

Loops and parallelism

Consider the following language: a → b , b → a. This turns a's into b's and b's into a's:

xfst[5]: regex a -> b, b -> a ;
348 bytes. 1 state, 3 arcs, Circular. Label Map: Default.
xfst[6]: up abba
baab
xfst[6]: up bbbbba
aaaaab
xfst[6]: up bababa
ababab
xfst[6]:

Note that neither way of ordering the rules gets this result:

xfst[6]: read regex a -> b .o. b -> a ;
348 bytes. 1 state, 3 arcs, Circular. Label Map: Default.
xfst[7]: up a
a
b
xfst[7]: up b
xfst[7]: down abba
aaaa
xfst[7]: down baab
aaaa
xfst[7]: up abba
xfst[7]: up bb
xfst[7]:

Try the other way around: b → a .o. a → b.

So there are things you can't do with rule ordering

Now think about this. Is this right semantics for these rules? Should they feed themselves instead? Get an infinite loop?

What happens in this case?

xfst[7]: read regex a -> b || x _ ,, a -> c || _ y ;
1.4 Kb. 4 states, 21 arcs, Circular. Label Map: Default.
xfst[8]:

In particular, consider what happens for the input xay. What should happen? Should the world explode? In a rule ordering system it's clear what would happen. The rule that came first would win, destroying the environment for the other rule (bleeding). What should happen here?

Here's what happens:

xfst[8]: down xa
xb
xfst[8]: down ay
cy
xfst[8]: down xay
xby
xcy

Moral: We are defining relations.

Lenient composition

Here's a script which implements the syllabification transducer described in Lauri Karttunen's paper the Proper Treatment of Optimality in Computational Phonology.

##############################################################################
# Script based on Karttunen (1998), The Proper Treatment of Optimality
# in Computational Phonology:
#
# http://www2.parc.com/istl/members/karttune/publications/pto/bilkent.pdf
#
# Prepared and adapted by Jason Baldridge. Updated 2011-9-28.
##############################################################################

##############################################################################
# Define the GEN function.

define C [b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z] ;
define V [a|e|i|o|u] ;

define Input [C|V]* ;
define Parse C -> ["O[" | "D[" | "X["] ... "]" .o. V -> ["N[" | "X["] ... "]" ;

define OverParse [. .] (->) ["O[" | "N[" | "D["] "]" ;

define Onset "O[" (C) "]" ;
define Nucleus "N[" (V) "]" ;
define Coda "D[" (C) "]" ;
define Unparsed "X[" [C|V] "]" ;

define SyllableStructure [[ (Onset) Nucleus (Coda)]/Unparsed]* ;

def GEN Input .o. OverParse .o. Parse .o. SyllableStructure;

# Uncomment these to see what happens with GEN before the constraints
# are applied.

#push GEN
#print random-lower

##############################################################################
# Define the constraints

define HaveOns "N[" => "O[" (C) "]" _ ;
define NoCoda ~$["D["] ;
define FillNuc ~$[ "N[" "]" ] ;
define FillOns ~$[ "O[" "]" ] ;
define FillCoda ~$[ "D[" "]" ] ;

# This is called "Parse" in the paper, but this conflicts with the
# expression for "Parse" given earlier, so it is called "MustParse" in
# this script.
define MustParse ~$"X[" ;

# For a finite number of multiple violations.
define MustParse0 ~$"X[" ;
define MustParse1 ~[[$"X["]^>1] ;
define MustParse2 ~[[$"X["]^>2] ;
define MustParse3 ~[[$"X["]^>3] ;
define MustParse4 ~[[$"X["]^>4] ;


##############################################################################
# Put GEN and the constraints together, in various ways.

# Use this for a transducer which only pays attention to one violation
# of "MustParse".
define OneViolation GEN .O. HaveOns .O. NoCoda .O. FillNuc .O. MustParse .O. FillOns ;

# Correct order, with multiple violations.
define MultiViolation GEN .O. HaveOns .O. NoCoda .O. FillNuc .O. MustParse0 .O. MustParse1 .O. MustParse2 .O. MustParse3 .O. MustParse4 .O. FillOns ;

# Alternative ranking that prefers codas to unparsed elements.
define PreferCoda GEN .O. HaveOns .O. FillNuc .O. MustParse0 .O. MustParse1 .O. MustParse2 .O. MustParse3 .O. MustParse4 .O. FillOns .O. NoCoda ;

# As with PreferCoda, but also requiring filled codas
define FilledCoda GEN .O. HaveOns .O. FillNuc .O. MustParse0 .O. MustParse1 .O. MustParse2 .O. MustParse3 .O. MustParse4 .O. FillOns .O. NoCoda .O. FillCoda;

##############################################################################
# Try it out.
echo >>
echo >> With only single violations of MustParse
push OneViolation
down bebop

echo >>
echo >> With multiple violations of MustParse
push MultiViolation
down bebop

echo >>
echo >> With preference for parsed codas
push PreferCoda
down bebop

echo >>
echo >> As above, but now also preferring filled codas
push FilledCoda
down bebop
 

Exercises

Many exercises can be found on the homepage for Lauri Karttunen's CIS639 course on XFST.

Pig Latin translator

Using the xfst tool, create a transducer for translating from English (or any other language) into Pig Latin.

There may be different varieties of Pig Latin. Let's pick the simplest one, defined by the rule

If the word begins with a consonant, the corresponding Pig Latin word is the same except that the initial consonant has been moved to the end of the word and suffixed with “ay”.

For example, the sentence pig latin is fun corresponds to igpay atinlay is unfay in Pig Latin.

Your task is to write an xfst script, piglatin.xfst to be run with the command,

xfst -l piglatin.xfst

that leaves on the stack a transducer with the following properties:

  1. The upper side language of the transducer is the universal language. That is, it contains all strings of any length.
  2. The transducer is unambiguous in the top-down direction. That is, each string in the upper language corresponds to one and only one string in the lower language.
  3. When you apply your transducer “downward” to an input line, xfst displays the corresponding Pig Latin translation for all words that begin with a consonant. Otherwise the output is the same as the input.
  4. Words consist of consonants and vowels.
  5. The consonants are b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z.
  6. The vowels are a, e, i, o, u.
  7. White space characters, space, and tab, are word separators. Thus the string “pig latin” contains two words.

Example

Here is an example of how things should work when you are done.

xfst -l piglatin.xfst 	//Execute piglatin.xfst and wait for commands.//
xfst[1]: apply down 	//Start applying the transducer downwards.//
apply down> pig
igpay
apply down> pig latin
igpay atinlay
apply down> the little brown fox jumped over the lazy dog
hetay ittlelay rownbay oxfay umpedjay over hetay azylay ogday

Hints

You will find it convenient to start your script with some definitions, such as:

define Cons [ b ] ;
define Vowel [a | e | i | o | u ] ;
define Ltr [ Cons | Vowel ]+ ;
define Limit [" " | "\t" | .#. ] ;

Because the trick in the Pig Latin transformation is the treatment of word-initial consonants, it is better to start experimenting with just one consonant. Once you succeed in constructing a transducer that correctly maps boa into oabay, the rest is easy.

Because you cannot literally “move” the initial consonant from the beginning to the end of the word, you need to think of movement in terms of two more primitive operations:

 1. Copy the initial consonant to the end of the word adding "ay".
 2. Delete the initial consonant. 

You need the definition of Limit to refer to the beginning and end of the word.

One more important hint:

[ ] -> a || b _ c ; 	Maps the string "bc" to the infinite language [ b a* c ].
[. .] -> a || b _ c ; 	Maps the string "bc" to the string "bac".

To insert just one instance of something, use [. .] in your replace expression.

Work with simple replacement and composition. Don't try using parallel replacement.

With just one consonant, your Pig Latin translator should have around 8 states and 44 arcs. With the full set of 21 consonants, the size of the transducer is about 48 states, 1220 arcs.

Solutions

The most straightforward solution for the Pig Latin problem is a brute force one which simply copies each letter explicitly. Another option is to use compile-replace to provide a solution based on reduplication. This is more elegant, but also comes with some computational cost unless an explicit dictionary is used (which is itself a fairly reasonable thing to do).

Credits

Lauri Karttunen created the original exercise, and credits Mark Steedman and Bonnie Webber for the idea.

Overview of Commands

This section contains a general overview of xfst commands. A complete alphabetical list, with explanations, is found in the next section.

Command Syntax

XFST commands are in general of the form '<command> <type or object>', where <command> specifies the operation to be performed and the second term, if any, gives some additional specification about the type of the operation or the object it applies to. For example, there are several variants of the 'print' command: 'print net', 'print sigma', 'print words', etc.

All display commands and all unary operations, such as 'lower-side net', apply to the network on the top of the stack. For example, 'print words' enumerates the paths of the topmost network only, although there may be others on the stack. Some commands, such as 'print net' and 'print words', can be followed by a name of network which has been given a name with the 'define' command, as shown below:

  xfst[0]: read text
  text> two
  text> words
  text> ^D
   
  8 states, 8 arcs, 2 words.
  fst[1]: define W2  
  fst[0]: print words W2
  two
  words
  xfst[0]:

Virtually all XFST commands can be abbreviated to a single word command. For example, the 'print' part of all print commands can be dropped. Thus 'sigma' as a command has the same effect as 'print sigma'. Similarly, 'regex' and 'read regex' are equivalent.

Short command names are convenient when one is working in an interactive mode. On the other hand, for scripts we recommend using the long commands because it tends to make the scripts more perspicuous.

Command Classes

The FST commands can be grouped into five classes. We describe first in general terms the five classes and list the commands in each class. A more detailed description of each command is given in an alphabetical listing in the next section.

Input/Output and Stack Commands

Move networks onto and off the stack, save the stack or the top network into a file in various formats. We use the commands 'load' and 'save' for binary files, 'read' and 'write' for text files. 'read regex' and 'define' compile a network from a regular expression; 'read regex' pushes the result on the stack, 'define' assigns it to the chosen name.

The 'read' commands take the input from the console (stdin) unless the user specifies another source by means of < symbol and a file name. For example, 'read text < /usr/dict/words' compiles the list in /usr/dict/words.

Similarly, the 'write' commands send the output to the console (stdout) unless the command is followed by > and a name of a target file. It is also possible to send the output directly to a printer. For example, 'write text | lp' uses the | convention in Unix to pass the output to the 'lp' print command.

Binary input/output

load stack, save stack, load defined, save defined

Text input/output

read prolog, read spaced-text, read text, write prolog, write spaced-text, write text.

Regular expressions

define, read regex, undefine

Stack commands

clear stack, pop stack, push defined, turn stack, rotate stack

Display commands

Give information about the content and properties of the network on the top of the stack, the language or relation it represents, the state of the system, and the available commands.

Like the 'write' commands in the previous section, the 'print' commands display the output on the console, unless the user directs it to a file or a printer. The 'print' commands do not have a corresponding input command.

Stack and network information

print eqv-labels, print flags, print label-tally, print net, print name, print size, print sigma, print labels, print stack, inspect net

System information

print aliases, print file-info, print defined, print directory, print storage

Network content

print random-lower, print random-upper, print words, print lower-words, print upper-words, print nth-upper, print nth-lower, print num-upper, print num-lower

Apply network to a string

apply up, apply down

Help

apropos, help

Other

echo

Tests of network properties

Unary tests check whether the network on the stack has certain properties. Binary tests compare the languages of the two top network on the stack.

Unary

test lower-bounded, test upper-bounded, test lower-universal, test upper-universal

Binary

test equivalent, test overlap, test sublanguage

Operations on networks

Produce a network that represents a new language or relation by modifying the top network, two top networks, or all the networks on the stack. The affected networks are removed from the stack and replaced by the result of the operation.

Property list commands modify or display the attribute-value pairs on the top network's property list. Maintenance commands perform various housekeeping operations of the top level network structure without changing the language or the relation it represents.

Unary operations

invert net, label net, lower-side net, negate net, one-plus net, reverse net, sigma net, substitute defined, substitute label, substitute symbol, substring net, upper-side net, zero-plus net

Binary operations

crossproduct net, minus net

N-ary operations

compose net, concatenate net, intersect net, shuffle net, union net

Property list

add properties, edit properties, name net, read properties, write properties

Maintenance

complete net, determinize net, eliminate flag, epsilon-remove net, minimize net, prune net, sort net, compact sigma, optimize net, unoptimize net

System commands

Change the internal configuration of XFST or interact with the Unix operating system.

Variables show, set
Aliases alias
System call system
Execute script source
Other quit

Alphabetical Index

This index consists of three parts. The first index lists all the currently implemented XFST commands. The second list describes the variables that the user can manipulate with the XFST 'set' command. The final part explains the correspondence between certain XFST commands and certain operators in the regular expression calculus. The XFST commands involved are the 'load' and 'read' commands and all commands that construct a new network by means of an operation on one or more networks on the stack. List of Commands This list gives a brief description of all the currently implemented XFST commands. This information is also available on-line in XFST with the 'help' command. For example, 'help alias' prints out the same text as shown below.

add properties

add properties < source-file

  reads a set of attribute-value pairs and adds them to the property list of the top network on the stack. Each line in the source file must contain one attribute-value pair. For example,
  COPYRIGHT: "Xerox Corporation"
  DATE: "June 10, 1997"
  VERSION: 9

The attributes, each terminated by a colon, must be strings (upper case recommended). The values may be strings or integers. String values must be enclosed in double quotes: “9” and 9 are considered as different values. If an attribute to be added already exists on the property list of the network, 'add properties' will reset it to a new value. See also: 'read properties', 'write-properties'. alias

alias name alias name command-sequence

  'alias' defines as a shorthand for the given sequence of XFST commands. For instance, the new command 'ls' can be created in the following way:
  alias ls print directory

Alias may contain several commands. A new command 'lexr', for reading a lexicon, can be created in the following way:

  fst[0]: alias lexr
  alias> load lexicon.fst
  alias> print properties
  alias> END;

A multi-command alias may also be defined on one line as follows:

  alias lexr load lexicon.fst; print properties

See also: 'print aliases'. apply up

apply up line apply up < source-file

  Simulates the composition of the input string with the lower side of the top network on the stack and extracts the result from the upper side. If the command 'apply up' is followed by a carriage return,the program prompts the user with apply>, prints the result and gives a prompt for a new input. Control-D exits from this mode.

apply down

apply down line apply down < source-file

  Simulates the composition of the input string with the upper side of the top network on the stack and extracts the result from the lower side. If the command 'apply down' is followed by a carriage return, the program prompts the user with apply>, prints the result and gives a prompt for a new input. Control-D exits from this mode.

apropos word

  prints brief information about the commands relating to the given word. If you need further information about a command, use the command 'help command-name'.

clear stack

  Removes all networks on the stack.

compact sigma

  all symbols that have the same distribution as the unknown symbol are eliminated from the alphabet and the corresponding redundant arcs are removed from the network.

complete net

  extends the top network until it has a transition for every symbol in sigma in every state. All new transitions lead to a failure state, thus the language of the network is preserved.Complete is defined only for simple networks (arity = 1), not for transducers.

compose net

  replaces the stack with the composition of all networks currently on the stack. If the stack consists of two networks, defined as A and B, with A on top of the stack, the result is equivalent to compiling the regular expression [A .o. B].

concatenate net

  replaces the stack with the concatenation of all networks currently on the stack. If the consists of two networks, defined as A and B, with A on top of the stack, the result is equivalent to compiling the regular expression [A B]. In regular expressions, concatenation is indicated simply by whitespace.

crossproduct net

  returns the crossproduct of the first two networks on the stack. They must not be transducers. If the stack consists of two networks, defined as A and B, with A on top of the stack, the result is equivalent to compiling the regular expression [A .x. B].

define symbol

define symbol regular-expression

  binds the given symbol to a network compiled from the regular expression. Every subsequent occurrence of the symbol in a regular expression is interpreted as the name of the corresponding language or relation. To unbind the symbol use 'undefine'. If the regular expression is omitted, i.e. if the command is 'define', the symbol is bound to the top net on the stack. The net is removed from the stack. See also: 'load defined', 'save defined', 'print defined', 'read regex', 'undefine'.

determinize net

  replaces the top network with an equivalent deterministic network. A deterministic net has at most one transition per label from each state.

echo text

  Prints the text, followed by a newline.

edit properties

  'edit properties' allows the user to edit the property list of the top network on the stack. Properties may be added, deleted, and modified. Values must be integers or strings.

eliminate flag

  replaces the top network on the stack with an equivalent network that encodes directly the constraints concerning the given flag attribute. A flag symbol contains two or three fields separated by a comma. Flag symbols with three fields are of the form @Action.Attribute.Value@ where Action is U (unify), R (require), P (set to positive value), N (set to the complement value). The other fields, Attribute and Value, can be chosen freely. Flag symbols with two fields are of the form @Action.Attribute@, where Action is either C (clear), or R (require), or D (disallow). For example, the command 'eliminate Harmony' removes all flags symbols with Harmony in the the attribute field and all paths that fail to meet the Harmony constraints.

epsilon-remove net

  replaces the top network with an equivalent one that has no epsilon transitions.

help

help command-name

  'help' lists the commands and the command classes. To get more information about a command, use the command 'help command'. It gives a longer explanations about the command and its usage. All possible arguments are listed and explained. For instance, the command 'help' (see above) may be used without arguments or it may have one argument that is a name of a command.

The command name may designate a class of commands, for instance, 'help stack' prints all stack commands. If the given command name is not recognised, command help works like command 'apropos'. inspect net

  invokes a mode for interactively inspecting the top network on the stack.

intersect net

  replaces the stack with the intersection of all networks currently on the stack. If the stack consists of two networks, defined as A and B, regardless of the order, the result is equivalent to compiling the regular expression [A & B].

invert net

  exchanges the two sides of the top network on the stack. If the network is defined as A, the result is equivalent to compiling the regular expression [A.i]. 'invert net' has no effect on simple networks.

label net

  replaces the top network on the stack with a network that represents the language or relation that includes all the strings and string pairs in the label alphabet of the original network. See also 'print labels', 'sigma net'.

load defined source-file

  restores the definitions from a binary source file. The networks are assigned to the chosen names directly, without pushing them onto the stack. The file must be in the binary format, as produced by the 'save defined' command.

load stack source-file

  reads the networks in the given file, and pushes them onto the the stack. The file must be in the binary format, as produced by the 'save stack' command. If the file only contains a single network, it can also be loaded with the command 'read regex [@bin source-file];'.

lower-side net

  extracts the lower language of the top network on the stack. If the network is defined as A, the result is equivalent to compiling the regular expression [A.l]. 'lower-side net' has no effect on simple networks.

minimize net

  replaces the top network on the stack with an equivalent network that has a minimal number of states.

minus net

  replaces the top two networks on the stack with a network containing all strings in the first network that are not in the second. If the two networks are defined as A and B, with A on top of the stack, the result is equivalent to compiling the regular expression [A - B].

name net symbol

  assigns the given name to the top network on the stack.

negate net

  replaces the top network on the stack with a network that accepts all and only those strings rejected by the original. If the network is defined as A, the result is equivalent to compiling the regular expression [~A].

one-plus net

  concatenates the top network any number of times with itself. If the network is defined as A, the result is equivalent to compiling the regular expression [A+].

optimize net

  runs a heuristic algorithm that tries to reduce the number of arcs, possibly at the cost of introducing more states. 'unoptimize net' restores the original network.

pop stack

  Removes the top network on the stack.

print aliases

print aliases > target-file

  prints each alias and the command sequence it encodes.

print defined

print defined > target-file

  prints each defined symbol and the size of the network it stands for.

print directory

print directory pattern print directory pattern > target-file

  prints the content of the current directory in its entirety or the files that match the given pattern, such as *.txt.

print eqv-labels

print eqv-labels > target-file

  prints the label equivalence classes of the networks on the stack. Labels that belong to the same equivalence class always appear together on arcs that lead to the same destination state.

print file-info

print file-info > target-file

  displays info about the last opened binary file.

print flags

print flags > target-file

  'print flags' prints information about the top network on the stack.

print labels

print labels symbol print labels > target-file

  prints the label alphabet of the top network on the stack (by default) or of the given defined network. The label alphabet includes all the symbols and symbol pairs that occur as arc labels in the network. See also 'print sigma', 'label net'.

print label-tally

print label-tally > target-file

  tallies label frequencies in the top network on the stack.

print lower-words

print lower-words name print lower-words number print lower-words name number print lower-words > target-file

  displays the words in the lower side language of the top network or in the network assigned to the defined symbol. If a number is given, only that many words are printed, otherwise every word is printed but not looping on any path more than once. The output can be directed to a file by adding a greater-than symbol, >, and a name of a target file to the end of the command. For example, 'print lower-words > lower-words.txt'. If the network is a simple network rather than a transducer, 'print lower-words', 'print words', and 'print upper-words' give the same output. See also the variable 'print-space'.

print name

  prints the name of the top network on the stack. If the net is unnamed, the hex address of the net is printed instead. Use the command 'name net' to give a name to the top network.

print net

print net symbol print net > target-file

  prints a text representation of the top network on the stack (by default) or the given defined network. The output is directed to a file by adding a greater-than symbol, >, and a name of a target file to the end of the command.

print nth-lower

print nth-lower name print nth-lower number print nth-lower name number print nth-lower > target-file

  displays the word on the lower side of the nth path in the top level network or in the network assigned to the defined symbol. This command only works on acyclic networks. The inverse command is 'print num-lower'.

print nth-upper

print nth-upper name print nth-upper number print nth-upper name number print nth-upper > target-file

  displays the word on the upper side of the nth path in the top level network or in the network assigned to the defined symbol. This command only works on acyclic networks. The inverse command is 'print num-upper'.

print num-lower word

print num-lower name word print num-lower > target-file

  displays the numbers of all paths in the top level network or int the named network that have word on the lower side. This command only works on acyclic networks. The inverse command is 'print nth-lower'.

print num-upper word

print num-upper name word print num-upper > target-file

  displays the numbers of all paths in the top level network that have word on the upper side. This command only works on acyclic networks. The inverse command is 'print nth-upper'.

print random-lower

print random-lower number print random-lower > target-file

  generates 15 random words (by default) from the top network on the stack or any number specified in the command. The words are from the lower side language of the net. If the network is a simple automaton, rather than a transducer, 'print random-lower' gives the same output as 'print random' and 'print random-upper'.

print random-upper

print random-upper number print random-upper > target-file

  generates 15 random words (by default) from the top network on the stack, or any number specified in the command. The words are from the upper side language of the net. If the network is a simple automaton, rather than a transducer, 'print random-upper' gives the same output as 'print random' and 'print random-lower'.

print random-words

print random-words number print random-words > target-file

  generates 15 random words (by default) from the top network on the stack or any number specified in the command. If the network is a transducer, the output may contain symbol pairs. If the network is a simple automaton, rather than a transducer, 'print random' gives the same output as 'print random-lower' and 'print random-upper'.

print sigma

print sigma name print sigma > target-file

  prints the sigmaalphabet of the top network on the stack (by default) or the given defined network. The sigma alphabet includes all the symbols that appear in the network, either as such or as a constituent of a symbol pair, and any symbols that have been excluded from the network by subtraction or complementation. See also 'print labels', 'sigma net'.

print size

print size name print size > target-file

  prints the size of the top network on the stack (by default) or the given defined network.

print stack

print stack > target-file

  displays the content of the stack.

print storage

print storage > target-file

  displays memory usage

print upper-words

print upper-words name print upper-words number print upper-words name number print upper-words > target-file

  displays the words in the upper side language of the top network or in the network assigned to the defined symbol. If a number is given, only that many words are printed, otherwise every word is printed but not looping on any path more than once. The output can be directed to a file by adding a greater-than symbol, >, and a name of a target file to the end of the command. For example, 'print upper-words > upper-words.txt'. If the network is a simple network rather than a transducer, 'print lower-words', 'print words', and 'print upper-words' give the same output. See also the variable 'print-space'.

print words

print words name print words number print words name number print words > target-file

  displays the paths in the top network or in the network assigned to the defined symbol. If a number is given, only that many paths are printed, otherwise every word is printed but not looping on any path more than once. The output can be directed to a file by adding a greater-than symbol, >, and a name of a target file to the end of the command. For example, 'print words > words.txt' If the network is a transducer, the output may contain symbol pairs. Otherwise, 'print lower-words', 'print words', and 'print upper-words' give the same output. See also the variable 'print-space'.

prune net

  removes all paths leading to non-final states in the top network on the stack.

push defined symbol

  makes a copy of the network named by a previous 'define symbol' command and pushes it onto the stack. 'push defined symbol' has the same effect as 'read regex symbol ;'.

quit

  Exits the program.

read prolog

read prolog < source-file

  reads the network(s) contained in the file and pushes them onto the stack. The file must be in the format produced by the 'write prolog' command. The the same effect can be produced with the command 'read regex [@pl source-file];'

read properties

read properties < source-file

  reads a set of attribute-value pairs resets the property list of the top network on the stack. See also: 'add properties', 'write properties'.

read regex

read regex < source-file

  creates a network from a regular expression and and pushes it onto the stack. The file should contain one regular expression. If no filename is given the input is read from the console. The input expression must be terminated by a semicolon.

read spaced-text

read spaced-text < source-file

  reads a transducer or a net with multicharacter symbols from a file. See also function 'print words' and variable 'print-space'.

read text

read text < source-file

  reads the net as text from a file. See also function 'print words' and variable 'print-space'.

reverse net

  replaces the top net on the stack with one that accepts the reverse language.

rotate stack

  Pushes the top element of the stack to the bottom.

save defined target-file

  stores the networks for all defined symbols into a binary file. Use 'load defined' to restore the definitions.

save stack target-file

  stores all the networks on the stack into the given file in a binary format. Use 'load stack' to load the file.

set variable value

  sets a new value to the variable. The value may be an integer, ON or OFF, or a string. See variable for details. See also: 'show'.

show variable

  shows the value of the variable. See also 'set'.

shuffle net

  replaces the stack with a single network that accepts every string formed by shuffling together (interdigitanting) one string from each of the input languages. For instance, if Net1 accepts ab and Net2 accepts xy, then the shuffle of these nets will accept the following six strings: abxy axby axyb xaby xayb xyab.

sigma net

  replaces the top network on the stack with a network that encodes the "sigma language" of the original network, that is, the union of all symbols in the sigma alphabet. See also 'print sigma', 'label net'. 

sort net

  Reorders the arcs of the top network in a canonical order.

source command-file

  executes a file of XFST commands. For example, 'source xfst.script' executes the commands in xfst.script.

substitute defined

  'substitute defined NAME for SYMBOL' replaces the top network on the stack with a network derived by splicing in the network associated with the given name for each arc that contains the symbol. If the network is a transducer, SYMBOL must not occur in a symbol pair. 

substitute label

  'substitute label LIST_OF_LABELS for LABEL' replaces the top network on the stack with a network derived by replacing every arc with the given label by an arc or arcs, each labeled with one of the substitute labels. If the list consists of the keyword NOTHING, all paths containing the given label are removed. The label and the substitutes may be single symbols or a symbol pairs. See also 'substitute symbol'. 

substitute symbol

  'substitute symbol LIST-OF-SYMBOLS for SYMBOL' replaces the top network on the stack with a network derived by replacing every arc whose label contains the given symbol by an arc or arcs whose label instead contains one of the substitute symbols. If the list consists of the keyword NOTHING, all paths containing the given symbol are removed. The symbol and the substitutes must be single symbols, not symbol pairs. See also 'substitute label'. 

substring net

  replaces the top network on the stack with a network that accepts every string that is a substring of some string in the language of the original network.

system system-command

  'system command' executes the given operating system command(s), for instance, 'system ls'.

test equivalent

  returns 1 (TRUE) if the two topmost networks contain the same language

test lower-bounded

  returns 1 (TRUE) if the lower side of the top level network has no epsilon cycles.

test lower-universal

  returns 1 (TRUE) if the lower side of the top level network represents the universal language.

test overlap

  returns 1 (TRUE) if the languages of the two topmost networks have a non-empty intersection.

test sublanguage

  returns 1 (TRUE) if the language of the topmost network is a sublanguage of the second network on the stack.

test upper-bounded

  returns 1 (TRUE) if the upper side of the top level network has no epsilon cycles.

test upper-universal

  returns 1 (TRUE) if the upper side of the top level network contains the universal language.

turn stack

  reverses the order of the networks on the the stack.

NOTE: Use the command 'reverse net' to create the reverse language of the top net. undefine list-of-symbols

  unbinds the symbol that was bound by the 'define' command. 'undefine symbol1 symbol2 symbol3 ...' etc. undefines all the listed symbols. 'undefine ALL' removes all the definitions.

union net

  replaces the stack with the union of all networks currently on the stack.

unoptimize net

  reverses the effect of 'optimize net' restoring the original network.

upper-side net

  extracts the upper language from the network on the top of the stack. If the network is defined as A, the result is equivalent to compiling the regular expression [A.u]. 'upper-side net' has no effect on simple networks.

write prolog

write prolog > target-file

  writes the network as Prolog-clauses. See also: 'read prolog'.

write properties

write properties name write properties > target-file

  prints the attribute-value pairs on the property list of the network. See also 'add properties', 'read properties'.

write spaced-text

write spaced-text > target-filetarget-file

  Stores the top network in a text format. This format is intended for transducers and networks with multi-character labels. Use 'read spaced-text' to read files in this format. Use 'write text' for simple networks with single-character labels. Use 'save stack' to store in a compact binary format. If no target file is specified, output will be sent to standard output (usually the terminal).

write text

write text > target-file

  stores the words in the top network, one per line. This format cannot be used for transducers or networks with multi-character labels. For such networks, use 'save-as-spaced-text' instead. Use 'save stack' to store in a compact binary format. If no target file is specified, output will be sent to standard output (usually the terminal).

zero-plus net

Variables This is the list of variables that the user can set with the command 'set variable value' with the list of available values. The command 'show variable' displays the current value of the variable.

directory filename

  the working directory.

mark-version (ON/OFF)

  when ON, the name and  version number of the application is recorded on the property list of each saved network under the property TOOLVERSIONS. For example, TOOLVERSIONS: (("XFST" "5.6.7")).

max-regex-errors (integer)

  the maximum number of errors for regular expressions. When it is exceeded the rest of the input file is discarded. Default is 10.

minimal (ON|OFF)

  when ON, network operations minimize the result. Minimization includes epsilon-removal and determinization. Default is ON.

name-nets (ON|OFF)

  when ON, regex names networks. Default is OFF.

print-pairs (ON|OFF)

  when ON,'apply up' and 'apply down' show both (UPPER and LOWER) sides of labels. Default is OFF.

print-sigma (ON|OFF)

  when ON, the sigma is shown when a network in printed. Default is ON.

print-space (ON|OFF)

  when ON, 'print word' inserts a space between symbols. Default is OFF.

process-in-order (ON|OFF)

  when ON, intersection and composition operations process networks in the order they appear in the stack. Default is OFF.

quit-on-fail (ON|OFF)

  when ON, the application exits with an error if a command is incorrectly spelled or cannot be executed because of a missing file or some other error. This is useful when the application is run by a shell script. Default is OFF.

quote-special (ON|OFF)

  When ON literal instances of special characters are enclosed in double quotes: zero and question mark are printed as "0" and "?" to distinguish them from epsilon and the unknown symbol. Space, tab, and newline are printed as " ", "\t", and "\n", respectively, to distinguish them from each other. The affected commands are 'print words', 'print lower-words', and 'print upper-words'. Default is OFF.

random-seed (integer)

  seed of the random number generator. It cannot be inspected.

sort-arcs (ON|OFF)

  when ON, the arcs of a network are sorted before the network is pushed onto the stack. Default is ON.

unicode (ON|OFF)

  if the value is ON, two-byte characters are printed as four hex integers preceded.by \u. If the value is OFF, we try to display 16-bit characters directly. Default is OFF.

verbose (ON|OFF)

  when ON, more information is printed. Default is ON.

Commands ↔ RE operators All the XFST commands that produce a new network from one or more expressions on the stack and replace them with the result of the operation correspond to an operation in the regular expression calculus (see Syntax and Semantics of Regular Expressions). Some of the load and read commands also have a counterpart in the regular expression language.

The correspondence is summarized below along with a characterization of the language or relation that the result implements, described in terms of the corresponding regular expression. We also indicate for each command whether it applies to the top network (unary), the two top networks (binary), or to all networks on the stack (n-ary).

concatenate net (n-ary)

  [A B] denotes the concatenation of all strings or string pairs in A with all strings or string pairs in B.

compose net (n-ary)

  [A .o. B] denotes the composition of the relations A and B.

crossproduct net (binary)

  [A .x. B] denotes the relation that maps every string of A to every string of B.

intersect net (n-ary)

  [A & B] denotes the set of strings that belong to both A and B.

invert net (unary)

  [A.i] denotes the relation that is the inverse of A; that is the upper and lower members of each string pair in the relation are switched around but the relation remains otherwise the same.

load stack

  [@bin source-file] denotes the topmost network in the binary source-file. Note that @bin only retrieves one network, whereas 'load stack' pushes onto the stack all the networks in the file. If the name of the source file contains a period, a slash or some other special character, the name should be enclosed in double quotes.

lower-side net (unary)

  [A.l] denotes the lower language of the relation A.

minus net (binary)

  [A - B] denotes the set of strings that are in A but not in B.

negate net (unary)

  [~A] denotes the set of all strings that are not in A.

one-plus net (unary)

  [A+] denotes all the strings that are composed by concatenating strings of A with each other any number of times.

read prolog

read regex read spaced-text read text

  have corresponding regular expression operators: @pl, @re, @stxt, and @txt, respectively, These unary operators are prefixed to the name of the source file. If the name contains a period, a slash or some other special character, the name must be enclosed in double quotes. For example,
  read regex [@re regexfile] ;

compiles the regular expression in regexfile;

  read regex [@txt "words.txt"] ;

has the same effect as

  read text < words.txt

reverse net (unary)

  [A.r] denotes the language or relation that is the reverse of A.

substitute symbol SYMBOL_LIST for SYMBOL (unary)

  `[A, SYMBOL, SYMBOL_LIST] denotes the language or relation derived from A by replacing every instance of the given symbol by the union of the symbols on the list. 

union net (n-ary)

  [A | B] denotes the union of expressions A and B. All the strings and string pairs that belong to either A or B (or both) also belong to the union.

upper-side net (unary)

  [A.u] denotes the upper language of the relation A.

zero-plus net (unary)

  [A*] denotes the set of strings that consists the empty string and any number of concatenations of A with itself. [A*] is like [A+] except that it always includes the empty string.

N.B. There are many regular expression operators without an equivalent command in XFST. For example, the replace (→) and restriction (⇒) operators do not correspond to any XFST command.


This content is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported.
Comments