This page is adapted from Lauri Karttunnen, Tamás Gaál, and André Kempe's XFST tutorial and Mark Gawron's XFST examples. Xerox finite-state tool is a general-purpose utility for computing with finite-state networks. It enables the user to create simple automata and transducers from text and binary files, regular expressions and other networks by a variety of operations. The user can display, examine and modify the structure and the content of the networks. The result can be saved as text or binary files. IntroductionXFST is a general-purpose Unix application for computing with finite-state networks, based on long-term Xerox research initiated by Ronald M. Kaplan and Martin Kay at the Xerox Palo Alto Research Center in the early 1980s. It is a successor to two earlier interfaces of the same type: IFSM and FSC. The IFSM interface was created at PARC by Lauri Karttunen and Todd Yampol around 1990-92. The FSC tool was developed at XRCE in 1994-95 by Pasi Tapanainen. XFST combines the best features of its two predecessors. It includes many extensions to the regular expresson calculus introduced at XRCE. XFST can read finite-state networks from binary files and compile them from regular expressions and text files. The networks can be simple networks or finite-state transducers. They can be combined by means of a variety of operations, such as union and composition. The resulting networks can be saved into a binary or a text file. The user may apply a network to strings to determine whether the string is accepted by the network or to transform it to another string if the network is a transducer. XFST provides many ways to get information about a network and to inspect and modify its structure. We introduce XFST with a brief tour that shows how the application is launched in Unix and some of the things that it can do. In the later sections we discuss the command set and other related issues in more depth. For help with regular expressions and other issues concerning the structure and application of finite-state networks, please consult the webpage for the XFST book by Kenneth Beesley and Lauri Karttunen. TutorialRunning XFSTXFST is launched with the command 'xfst'. The launch command can be modified by a number of optional flags. For example, 'xfst -f scriptfile' executes the commands in scriptfile and exits upon completion. If no additional modifiers are given, the 'xfst' command starts the application in an interactive loop waiting for commands from the user: > xfst
Copyright (c) Palo Alto Research Center 2001-2007
Xerox Finite-State Tool, version 2.4.9
Type "help" to list all commands available or "help help" for further help.
Alternate way of running xfst with command history: You can invoke xfst as 'rlwrap xfst' and that will retain the command history in the program through up and down arrow keys (similar to the bash shell). You can also add an alias in your .bashrc file like this: alias xfst='rlwrap xfst'
The The great majority of XFST commands are about adding networks to the stack, replacing some or all of the stack by the result of some operation, and saving the stack into a file. Another set of commands is concerned with the network currently on the top of the stack, that is, the network that was most recently added to the stack. We give a few simple examples of both types of commands. We present two examples. In the first example, the user creates a network with the help of XFST and saves the result into a file. In the second example, we retrieve the network from the file to use in another operation. We then show how a sequence of commands can be run by a script and aliased to a single command. Making and saving a networkA network can be added to the stack by loading a previously compiled network from a binary file or by compiling a new one from some text source. In either case, the network becomes the topmost one on the stack. In this example, we compile a network from a regular expression using the command 'read regex'. We type xfst[0]: read regex [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ;
This expression denotes the language that consists of the ten decimal digits. Because 344 bytes. 2 states, 10 arcs, 10 paths. Label Map: Default.
xfst[1]:
showing that the network representing this ten-word language consists of 2 states and 10 arcs. The new prompt, xfst[1]: print net
Sigma: %0 1 2 3 4 5 6 7 8 9
Size: 10, Label Map: Default
Net:
Flags: deterministic, pruned, minimized, epsilon_free, loop_free
Arity: 1
s0: %0 -> fs1, 1 -> fs1, 2 -> fs1, 3 -> fs1, 4 -> fs1, 5 -> fs1, 6 -> fs1, 7 -> fs1, 8 -> fs1, 9 -> fs1.
fs1: (no arcs)
Here the It is often convenient to give a network a name that can be used in a regular expression to refer to it. The command for that assignment is 'define': xfst[1]: define Digit defined Digit: 344 bytes. 2 states, 10 arcs, 10 paths. Label Map: Default. xfst[0]:
The
Another way to achieve exactly the same result is to include in the xfst[0]: define Digit [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ; defined Digit: 344 bytes. 2 states, 10 arcs, 10 paths. Label Map: Default. xfst[0]: would have exactly the same effect as the sequence of commands shown above. Here the stack remains empty. Note the closing semicolon that marks the end of the regular expression.
Once defined, the name xfst[0]: read regex %. -> %, , %, -> %. || Digit _ Digit ; 1.6 Kb. 4 states, 41 arcs, Circular. Label Map: Default. xfst[1]: This transducer represents the parallel replacement of ”.” by ”,” and ”,” by ”.” between two digits.
To verify that the transducer does what it is supposed to do we can use the xfst[1]: apply down 1,234.99 1.234,99 xfst[1]:
The 1,234.99 0.5 100,000,000 Assuming this file is available, you can run the following command: xfst[1]: apply down < US-num.txt
Opening file US-num.txt...
1,234.99
1.234,99
0.5
0,5
100,000,000
100.000.000
Closing file US-num.txt...
In order to have the transducer available in the future, we can save it to a file. The command xfst[1]: save stack US-to-EU-num.fst Opening 'US-to-EU-num.fst' Closing 'US-to-EU-num.fst' xfst[1]:
The command xfst[1]: quit
bye.
However, instead of quitting, we move on to the next topic. Loading and using a stored networkIn this example, we load the network we just created from a file to the stack, add another network on the top of the first one, and then perform an operation to replace both of them with the result of that operation. Let us first empty the stack with 'clear stack' and then load the network back from the file. xfst[1]: clear stack xfst[0]: load US-to-EU-num.fst Opening 'US-to-EU-num.fst' Closing 'US-to-EU-num.fst' We will now create another network by compiling a simple network from the same little text file we already used above xfst[1]: read text < US-num.txt Reading iso-8859-1 text from 'US-num.txt' 764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default. Closing 'US-num.txt' xfst[2]:
The xfst[2]: print words 1,234.99 100,000,000 0.5 xfst[2]:
As the prompt xfst[2]: print stack 0: 764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default. 1: 748 bytes. 4 states, 41 arcs, Circular. Label Map: Default. xfst[2]:
The positions in the stack are numbered from 0 upwards. The net in
position 0, the three-word network, is on top of the stack. Unary
commands such as
Instead of applying the transducer to strings one at the time, as we did before with the xfst[2]: compose net 764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default. xfst[1]:
The compose operation replaces the two networks on the stack by the
result of the operation. Thus we now have just one network left. It is
instructive to view its contents using the same xfst[1]: print words 1<,:.>234<.:,>99 100<,:.>000<,:.>000 0<.:,>5 xfst[1]: The result of the composition is a transducer. It denotes a relation, a mapping from one regular language into another one. On its “upper side”, the transducer has the three original US-style numbers, each mapped to a corresponding European-style on the “lower side” of the transducer. For the most part, the mapping is an identity relation because each digit is mapped to itself. The only difference is that periods are mapped to commas, and vice versa. The angle brackets indicate pairs of non-identical symbols: <.:,> indicates the correspondence of an upper-side period with a lower-side comma; similarly for <,:.>. We can view the upper and lower languages of the relation independently. print upper-words displays the three US numbers; print lower-words shows what they have been transduced into: xfst[1]: print lower-words 1.234,99 100.000.000 0,5 xfst[1]:
As before, we can use the xfst[1]: apply up 0,5
0.5
We can also extract one of the other language from the relation. The command 'lower-side net' extracts from the transducer a simple automaton that contains just the three European numbers. xfst[1]: lower-side net 764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default. xfst[1]: print words 1.234,99 100.000.000 0,5
At this point, go ahead and quit out of xfst using the xfst[1]: quit
bye.
Running XFST with a scriptThe examples we have given so far are about using XFST in an interactive mode, with the user typing commands on the prompt line. It is more convenient, for many purposes, to write a list of commands to be run in batch mode without any user interaction. To illustrate this possibility, we write a script that compiles the US-to-European transducer. and uses it to produce a file of European-style numbers from a file of US-style numbers.
A script is an ordinary text file that can be prepared with any text
editor, such as Emacs. Here we create the script with the help of the
simple Unix # A script for producing a file of European numbers # from a file of US numbers. echo >> Defining Digit define Digit [%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] ; echo >> Defining number converter read regex %. -> %, , %, -> %. || Digit _ Digit ; echo >> reading US numbers read text < US-num.txt echo >> Composing and extracting EU numbers compose net lower-side net echo >> Creating EU-num.txt write text > EU-num.txt system ls -l EU-num.txt echo >> Done. quit ^D
The lines beginning with
Because the purpose of this script is to produce an output file, we
verify at the end that the file has indeed been created. The To execute the script, we invoke xfst with the -f flag: > xfst -f digit.xfst >> Defining Digit defined Digit: 344 bytes. 2 states, 10 arcs, 10 paths. Label Map: Default. >> Defining number converter 1.6 Kb. 4 states, 41 arcs, Circular. Label Map: Default. >> reading US numbers Reading iso-8859-1 text from 'US-num.txt' 764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default. Closing 'US-num.txt' >> Composing and extracting EU numbers 764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default. 764 bytes. 20 states, 21 arcs, 3 paths. Label Map: Default. >> Creating EU-num.txt Opening 'EU-num.txt' Closing 'EU-num.txt' -rw-r--r-- 1 jbaldrid users 54 2007-02-05 23:52 EU-num.txt >> Done. bye. The -f flag causes xfst to exit after the script has been run. If the user wishes to take over at that point and continue running xfst in an interactive mode, he can call the script with -l flag instead of -f. Another non-interactive way of running xfst is to include a sequence of xfst commands directly on the Unix command line. Each xfst command must be enclosed in double quotes and preceded by an -e flag. If the final -stop flag is omitted, xfst executes the commands and continues running in an interactive mode. > xfst -e "define Digit [%0|1|2|3|4|5|6|7|8|9] ;" \
-e "read regex %. -> %, , %, -> %. || Digit _ Digit;" \
-e "read text < US-num.txt" \
-e "compose net" \
-e "lower-side net" \
-e "write text > EU-num.txt" -stop
Note the backslash at the end of each line. It signals to to the Unix operating system that the entire sequence of commands counts as a single command line. Further examples from Mark GawronThese examples highlight the differences in rule ordering and feeding and bleeding relationships between rules as well as parallel application of rules. Rule orderingThe following command creates a transducer for the kaNpan → kamman example: xfst[0]: read regex [ N -> m || _ p ] .o. [ p -> m || m _ ]; 1.3 Kb. 4 states, 15 arcs, Circular. Label Map: Default. xfst[1]: Now use the regex: xfst[1]: down N N xfst[1]: down Np mm xfst[1]: down kaNpan kamman xfst[1]: up kamman kaNpan kampan kamman xfst[1]: down kampan kamman xfst[1]: down kamman kamman The expression given above created the FST in one go. An alternative, which will be handy below, is to define two FSTs, one for each rule, and then refer to them rather than the expressions directly. xfst[0]: define Ntom N -> m || _ p; defined Ntom: 1.2 Kb. 3 states, 10 arcs, Circular. Label Map: Default. xfst[0]: define ptom p -> m || m _ ; defined ptom: 1.2 Kb. 2 states, 6 arcs, Circular. Label Map: Default. xfst[0]: regex Ntom .o. ptom ; 1.3 Kb. 4 states, 15 arcs, Circular. Label Map: Default. xfst[1]:
We now have the same FST on the stack that we had earlier. As we look at
alternative ways of combining the rules, the definitions of Reordering rules
Changing the order in which the rules are applied stops the xfst[1]: regex ptom .o. Ntom ; 1.3 Kb. 4 states, 15 arcs, Circular. Label Map: Default. xfst[2]: What happens now? 1. down kaNpan 2. down kampan 3. up kamman 4. up kampan 5. Why isn't //kampan// part of the answer to the previous question? As a way of exploring the last question, you can try composing the rule transducers with a transducer for a particular word. For example: xfst[2]: regex ptom .o. Ntom .o. {kampan}; 488 bytes. 7 states, 6 arcs, 1 path. Label Map: Default. xfst[3]: print words ka<N:m>pan xfst[3]: regex Ntom .o. ptom .o. {kamman} ; 540 bytes. 8 states, 9 arcs, 3 paths. Label Map: Default. xfst[4]: print words kam<p:m>an kamman ka<N:m><p:m>an xfst[4]: Parallel RulesConsider the following combination. xfst[4]: regex N -> m || _ p ,, p -> m || m _ ; 1.3 Kb. 4 states, 15 arcs, Circular. Label Map: Default. xfst[5]:
This applies the rules in parallel. Note the double comma (”,,”),
required only for parallel rules with contexts. (Note that we cannot use
the previous definitions of What will happen now? xfst[5]: down N N xfst[5]: down Np mp xfst[5]: down kaNpam kampam xfst[5]: up kamman kampan kamman xfst[5]: up kampan kaNpan xfst[5]: down kampan kamman xfst[5]: down kamman kamman Ponder these. Note that the results do differ from both of the two previous versions. Loops and parallelism
Consider the following language: xfst[5]: regex a -> b, b -> a ; 348 bytes. 1 state, 3 arcs, Circular. Label Map: Default. xfst[6]: up abba baab xfst[6]: up bbbbba aaaaab xfst[6]: up bababa ababab xfst[6]: Note that neither way of ordering the rules gets this result: xfst[6]: read regex a -> b .o. b -> a ; 348 bytes. 1 state, 3 arcs, Circular. Label Map: Default. xfst[7]: up a a b xfst[7]: up b xfst[7]: down abba aaaa xfst[7]: down baab aaaa xfst[7]: up abba xfst[7]: up bb xfst[7]:
Try the other way around: So there are things you can't do with rule ordering Now think about this. Is this right semantics for these rules? Should they feed themselves instead? Get an infinite loop? What happens in this case? xfst[7]: read regex a -> b || x _ ,, a -> c || _ y ; 1.4 Kb. 4 states, 21 arcs, Circular. Label Map: Default. xfst[8]:
In particular, consider what happens for the input Here's what happens: xfst[8]: down xa xb xfst[8]: down ay cy xfst[8]: down xay xby xcy Moral: We are defining relations. Lenient compositionHere's a script which implements the syllabification transducer described in Lauri Karttunen's paper the Proper Treatment of Optimality in Computational Phonology. ############################################################################## # Script based on Karttunen (1998), The Proper Treatment of Optimality # in Computational Phonology: # # http://www2.parc.com/istl/members/karttune/publications/pto/bilkent.pdf # # Prepared and adapted by Jason Baldridge. Updated 2011-9-28. ############################################################################## ExercisesMany exercises can be found on the homepage for Lauri Karttunen's CIS639 course on XFST. Pig Latin translatorUsing the xfst tool, create a transducer for translating from English (or any other language) into Pig Latin. There may be different varieties of Pig Latin. Let's pick the simplest one, defined by the rule If the word begins with a consonant, the corresponding Pig Latin word is the same except that the initial consonant has been moved to the end of the word and suffixed with “ay”. For example, the sentence pig latin is fun corresponds to igpay atinlay is unfay in Pig Latin.
Your task is to write an xfst script,
that leaves on the stack a transducer with the following properties:
ExampleHere is an example of how things should work when you are done. xfst -l piglatin.xfst //Execute piglatin.xfst and wait for commands.// xfst[1]: apply down //Start applying the transducer downwards.// apply down> pig igpay apply down> pig latin igpay atinlay apply down> the little brown fox jumped over the lazy dog hetay ittlelay rownbay oxfay umpedjay over hetay azylay ogday HintsYou will find it convenient to start your script with some definitions, such as: define Cons [ b ] ; define Vowel [a | e | i | o | u ] ; define Ltr [ Cons | Vowel ]+ ; define Limit [" " | "\t" | .#. ] ; Because the trick in the Pig Latin transformation is the treatment of word-initial consonants, it is better to start experimenting with just one consonant. Once you succeed in constructing a transducer that correctly maps boa into oabay, the rest is easy. Because you cannot literally “move” the initial consonant from the beginning to the end of the word, you need to think of movement in terms of two more primitive operations: 1. Copy the initial consonant to the end of the word adding "ay". 2. Delete the initial consonant. You need the definition of Limit to refer to the beginning and end of the word. One more important hint: [ ] -> a || b _ c ; Maps the string "bc" to the infinite language [ b a* c ]. [. .] -> a || b _ c ; Maps the string "bc" to the string "bac".
To insert just one instance of something, use Work with simple replacement and composition. Don't try using parallel replacement. With just one consonant, your Pig Latin translator should have around 8 states and 44 arcs. With the full set of 21 consonants, the size of the transducer is about 48 states, 1220 arcs. SolutionsThe most straightforward solution for the Pig Latin problem is a brute force one which simply copies each letter explicitly. Another option is to use compile-replace to provide a solution based on reduplication. This is more elegant, but also comes with some computational cost unless an explicit dictionary is used (which is itself a fairly reasonable thing to do). CreditsLauri Karttunen created the original exercise, and credits Mark Steedman and Bonnie Webber for the idea. Overview of CommandsThis section contains a general overview of xfst commands. A complete alphabetical list, with explanations, is found in the next section. Command SyntaxXFST commands are in general of the form '<command> <type or object>', where <command> specifies the operation to be performed and the second term, if any, gives some additional specification about the type of the operation or the object it applies to. For example, there are several variants of the 'print' command: 'print net', 'print sigma', 'print words', etc. All display commands and all unary operations, such as 'lower-side net', apply to the network on the top of the stack. For example, 'print words' enumerates the paths of the topmost network only, although there may be others on the stack. Some commands, such as 'print net' and 'print words', can be followed by a name of network which has been given a name with the 'define' command, as shown below: xfst[0]: read text text> two text> words text> ^D 8 states, 8 arcs, 2 words. fst[1]: define W2 fst[0]: print words W2 two words xfst[0]: Virtually all XFST commands can be abbreviated to a single word command. For example, the 'print' part of all print commands can be dropped. Thus 'sigma' as a command has the same effect as 'print sigma'. Similarly, 'regex' and 'read regex' are equivalent. Short command names are convenient when one is working in an interactive mode. On the other hand, for scripts we recommend using the long commands because it tends to make the scripts more perspicuous. Command ClassesThe FST commands can be grouped into five classes. We describe first in general terms the five classes and list the commands in each class. A more detailed description of each command is given in an alphabetical listing in the next section. Input/Output and Stack CommandsMove networks onto and off the stack, save the stack or the top network into a file in various formats. We use the commands 'load' and 'save' for binary files, 'read' and 'write' for text files. 'read regex' and 'define' compile a network from a regular expression; 'read regex' pushes the result on the stack, 'define' assigns it to the chosen name. The 'read' commands take the input from the console (stdin) unless the user specifies another source by means of < symbol and a file name. For example, 'read text < /usr/dict/words' compiles the list in /usr/dict/words. Similarly, the 'write' commands send the output to the console (stdout) unless the command is followed by > and a name of a target file. It is also possible to send the output directly to a printer. For example, 'write text | lp' uses the | convention in Unix to pass the output to the 'lp' print command. Binary input/outputload stack, save stack, load defined, save defined Text input/outputread prolog, read spaced-text, read text, write prolog, write spaced-text, write text. Regular expressionsdefine, read regex, undefine Stack commandsclear stack, pop stack, push defined, turn stack, rotate stack Display commandsGive information about the content and properties of the network on the top of the stack, the language or relation it represents, the state of the system, and the available commands. Like the 'write' commands in the previous section, the 'print' commands display the output on the console, unless the user directs it to a file or a printer. The 'print' commands do not have a corresponding input command. Stack and network informationprint eqv-labels, print flags, print label-tally, print net, print name, print size, print sigma, print labels, print stack, inspect net System informationprint aliases, print file-info, print defined, print directory, print storage Network contentprint random-lower, print random-upper, print words, print lower-words, print upper-words, print nth-upper, print nth-lower, print num-upper, print num-lower Apply network to a stringapply up, apply down Helpapropos, help Otherecho Tests of network propertiesUnary tests check whether the network on the stack has certain properties. Binary tests compare the languages of the two top network on the stack. Unarytest lower-bounded, test upper-bounded, test lower-universal, test upper-universal Binarytest equivalent, test overlap, test sublanguage Operations on networks Produce a network that represents a new language or relation by modifying the top network, two top networks, or all the networks on the stack. The affected networks are removed from the stack and replaced by the result of the operation. Property list commands modify or display the attribute-value pairs on the top network's property list. Maintenance commands perform various housekeeping operations of the top level network structure without changing the language or the relation it represents. Unary operationsinvert net, label net, lower-side net, negate net, one-plus net, reverse net, sigma net, substitute defined, substitute label, substitute symbol, substring net, upper-side net, zero-plus net Binary operationscrossproduct net, minus net N-ary operationscompose net, concatenate net, intersect net, shuffle net, union net Property listadd properties, edit properties, name net, read properties, write properties Maintenancecomplete net, determinize net, eliminate flag, epsilon-remove net, minimize net, prune net, sort net, compact sigma, optimize net, unoptimize net System commandsChange the internal configuration of XFST or interact with the Unix operating system.
Alphabetical IndexThis index consists of three parts. The first index lists all the currently implemented XFST commands. The second list describes the variables that the user can manipulate with the XFST 'set' command. The final part explains the correspondence between certain XFST commands and certain operators in the regular expression calculus. The XFST commands involved are the 'load' and 'read' commands and all commands that construct a new network by means of an operation on one or more networks on the stack. List of Commands This list gives a brief description of all the currently implemented XFST commands. This information is also available on-line in XFST with the 'help' command. For example, 'help alias' prints out the same text as shown below. add properties add properties < source-file reads a set of attribute-value pairs and adds them to the property list of the top network on the stack. Each line in the source file must contain one attribute-value pair. For example, COPYRIGHT: "Xerox Corporation" DATE: "June 10, 1997" VERSION: 9 The attributes, each terminated by a colon, must be strings (upper case recommended). The values may be strings or integers. String values must be enclosed in double quotes: “9” and 9 are considered as different values. If an attribute to be added already exists on the property list of the network, 'add properties' will reset it to a new value. See also: 'read properties', 'write-properties'. alias alias name alias name command-sequence 'alias' defines as a shorthand for the given sequence of XFST commands. For instance, the new command 'ls' can be created in the following way:
alias ls print directory Alias may contain several commands. A new command 'lexr', for reading a lexicon, can be created in the following way: fst[0]: alias lexr alias> load lexicon.fst alias> print properties alias> END; A multi-command alias may also be defined on one line as follows: alias lexr load lexicon.fst; print properties See also: 'print aliases'. apply up apply up line apply up < source-file Simulates the composition of the input string with the lower side of the top network on the stack and extracts the result from the upper side. If the command 'apply up' is followed by a carriage return,the program prompts the user with apply>, prints the result and gives a prompt for a new input. Control-D exits from this mode. apply down apply down line apply down < source-file Simulates the composition of the input string with the upper side of the top network on the stack and extracts the result from the lower side. If the command 'apply down' is followed by a carriage return, the program prompts the user with apply>, prints the result and gives a prompt for a new input. Control-D exits from this mode. apropos word prints brief information about the commands relating to the given word. If you need further information about a command, use the command 'help command-name'. clear stack Removes all networks on the stack. compact sigma all symbols that have the same distribution as the unknown symbol are eliminated from the alphabet and the corresponding redundant arcs are removed from the network. complete net extends the top network until it has a transition for every symbol in sigma in every state. All new transitions lead to a failure state, thus the language of the network is preserved.Complete is defined only for simple networks (arity = 1), not for transducers. compose net replaces the stack with the composition of all networks currently on the stack. If the stack consists of two networks, defined as A and B, with A on top of the stack, the result is equivalent to compiling the regular expression [A .o. B]. concatenate net replaces the stack with the concatenation of all networks currently on the stack. If the consists of two networks, defined as A and B, with A on top of the stack, the result is equivalent to compiling the regular expression [A B]. In regular expressions, concatenation is indicated simply by whitespace. crossproduct net returns the crossproduct of the first two networks on the stack. They must not be transducers. If the stack consists of two networks, defined as A and B, with A on top of the stack, the result is equivalent to compiling the regular expression [A .x. B]. define symbol define symbol regular-expression binds the given symbol to a network compiled from the regular expression. Every subsequent occurrence of the symbol in a regular expression is interpreted as the name of the corresponding language or relation. To unbind the symbol use 'undefine'. If the regular expression is omitted, i.e. if the command is 'define', the symbol is bound to the top net on the stack. The net is removed from the stack. See also: 'load defined', 'save defined', 'print defined', 'read regex', 'undefine'. determinize net replaces the top network with an equivalent deterministic network. A deterministic net has at most one transition per label from each state. echo text Prints the text, followed by a newline. edit properties 'edit properties' allows the user to edit the property list of the top network on the stack. Properties may be added, deleted, and modified. Values must be integers or strings. eliminate flag replaces the top network on the stack with an equivalent network that encodes directly the constraints concerning the given flag attribute. A flag symbol contains two or three fields separated by a comma. Flag symbols with three fields are of the form @Action.Attribute.Value@ where Action is U (unify), R (require), P (set to positive value), N (set to the complement value). The other fields, Attribute and Value, can be chosen freely. Flag symbols with two fields are of the form @Action.Attribute@, where Action is either C (clear), or R (require), or D (disallow). For example, the command 'eliminate Harmony' removes all flags symbols with Harmony in the the attribute field and all paths that fail to meet the Harmony constraints. epsilon-remove net replaces the top network with an equivalent one that has no epsilon transitions. help help command-name 'help' lists the commands and the command classes. To get more information about a command, use the command 'help command'. It gives a longer explanations about the command and its usage. All possible arguments are listed and explained. For instance, the command 'help' (see above) may be used without arguments or it may have one argument that is a name of a command. The command name may designate a class of commands, for instance, 'help stack' prints all stack commands. If the given command name is not recognised, command help works like command 'apropos'. inspect net invokes a mode for interactively inspecting the top network on the stack. intersect net replaces the stack with the intersection of all networks currently on the stack. If the stack consists of two networks, defined as A and B, regardless of the order, the result is equivalent to compiling the regular expression [A & B]. invert net exchanges the two sides of the top network on the stack. If the network is defined as A, the result is equivalent to compiling the regular expression [A.i]. 'invert net' has no effect on simple networks. label net replaces the top network on the stack with a network that represents the language or relation that includes all the strings and string pairs in the label alphabet of the original network. See also 'print labels', 'sigma net'. load defined source-file restores the definitions from a binary source file. The networks are assigned to the chosen names directly, without pushing them onto the stack. The file must be in the binary format, as produced by the 'save defined' command. load stack source-file reads the networks in the given file, and pushes them onto the the stack. The file must be in the binary format, as produced by the 'save stack' command. If the file only contains a single network, it can also be loaded with the command 'read regex [@bin source-file];'. lower-side net extracts the lower language of the top network on the stack. If the network is defined as A, the result is equivalent to compiling the regular expression [A.l]. 'lower-side net' has no effect on simple networks. minimize net replaces the top network on the stack with an equivalent network that has a minimal number of states. minus net replaces the top two networks on the stack with a network containing all strings in the first network that are not in the second. If the two networks are defined as A and B, with A on top of the stack, the result is equivalent to compiling the regular expression [A - B]. name net symbol assigns the given name to the top network on the stack. negate net replaces the top network on the stack with a network that accepts all and only those strings rejected by the original. If the network is defined as A, the result is equivalent to compiling the regular expression [~A]. one-plus net concatenates the top network any number of times with itself. If the network is defined as A, the result is equivalent to compiling the regular expression [A+]. optimize net runs a heuristic algorithm that tries to reduce the number of arcs, possibly at the cost of introducing more states. 'unoptimize net' restores the original network. pop stack Removes the top network on the stack. print aliases print aliases > target-file prints each alias and the command sequence it encodes. print defined print defined > target-file prints each defined symbol and the size of the network it stands for. print directory print directory pattern print directory pattern > target-file prints the content of the current directory in its entirety or the files that match the given pattern, such as *.txt. print eqv-labels print eqv-labels > target-file prints the label equivalence classes of the networks on the stack. Labels that belong to the same equivalence class always appear together on arcs that lead to the same destination state. print file-info print file-info > target-file displays info about the last opened binary file. print flags print flags > target-file 'print flags' prints information about the top network on the stack. print labels print labels symbol print labels > target-file prints the label alphabet of the top network on the stack (by default) or of the given defined network. The label alphabet includes all the symbols and symbol pairs that occur as arc labels in the network. See also 'print sigma', 'label net'. print label-tally print label-tally > target-file tallies label frequencies in the top network on the stack. print lower-words print lower-words name print lower-words number print lower-words name number print lower-words > target-file displays the words in the lower side language of the top network or in the network assigned to the defined symbol. If a number is given, only that many words are printed, otherwise every word is printed but not looping on any path more than once. The output can be directed to a file by adding a greater-than symbol, >, and a name of a target file to the end of the command. For example, 'print lower-words > lower-words.txt'. If the network is a simple network rather than a transducer, 'print lower-words', 'print words', and 'print upper-words' give the same output. See also the variable 'print-space'. print name prints the name of the top network on the stack. If the net is unnamed, the hex address of the net is printed instead. Use the command 'name net' to give a name to the top network. print net print net symbol print net > target-file prints a text representation of the top network on the stack (by default) or the given defined network. The output is directed to a file by adding a greater-than symbol, >, and a name of a target file to the end of the command. print nth-lower print nth-lower name print nth-lower number print nth-lower name number print nth-lower > target-file displays the word on the lower side of the nth path in the top level network or in the network assigned to the defined symbol. This command only works on acyclic networks. The inverse command is 'print num-lower'. print nth-upper print nth-upper name print nth-upper number print nth-upper name number print nth-upper > target-file displays the word on the upper side of the nth path in the top level network or in the network assigned to the defined symbol. This command only works on acyclic networks. The inverse command is 'print num-upper'. print num-lower word print num-lower name word print num-lower > target-file displays the numbers of all paths in the top level network or int the named network that have word on the lower side. This command only works on acyclic networks. The inverse command is 'print nth-lower'. print num-upper word print num-upper name word print num-upper > target-file displays the numbers of all paths in the top level network that have word on the upper side. This command only works on acyclic networks. The inverse command is 'print nth-upper'. print random-lower print random-lower number print random-lower > target-file generates 15 random words (by default) from the top network on the stack or any number specified in the command. The words are from the lower side language of the net. If the network is a simple automaton, rather than a transducer, 'print random-lower' gives the same output as 'print random' and 'print random-upper'. print random-upper print random-upper number print random-upper > target-file generates 15 random words (by default) from the top network on the stack, or any number specified in the command. The words are from the upper side language of the net. If the network is a simple automaton, rather than a transducer, 'print random-upper' gives the same output as 'print random' and 'print random-lower'. print random-words print random-words number print random-words > target-file generates 15 random words (by default) from the top network on the stack or any number specified in the command. If the network is a transducer, the output may contain symbol pairs. If the network is a simple automaton, rather than a transducer, 'print random' gives the same output as 'print random-lower' and 'print random-upper'. print sigma print sigma name print sigma > target-file prints the sigmaalphabet of the top network on the stack (by default) or the given defined network. The sigma alphabet includes all the symbols that appear in the network, either as such or as a constituent of a symbol pair, and any symbols that have been excluded from the network by subtraction or complementation. See also 'print labels', 'sigma net'. print size print size name print size > target-file prints the size of the top network on the stack (by default) or the given defined network. print stack print stack > target-file displays the content of the stack. print storage print storage > target-file displays memory usage print upper-words print upper-words name print upper-words number print upper-words name number print upper-words > target-file displays the words in the upper side language of the top network or in the network assigned to the defined symbol. If a number is given, only that many words are printed, otherwise every word is printed but not looping on any path more than once. The output can be directed to a file by adding a greater-than symbol, >, and a name of a target file to the end of the command. For example, 'print upper-words > upper-words.txt'. If the network is a simple network rather than a transducer, 'print lower-words', 'print words', and 'print upper-words' give the same output. See also the variable 'print-space'. print words print words name print words number print words name number print words > target-file displays the paths in the top network or in the network assigned to the defined symbol. If a number is given, only that many paths are printed, otherwise every word is printed but not looping on any path more than once. The output can be directed to a file by adding a greater-than symbol, >, and a name of a target file to the end of the command. For example, 'print words > words.txt' If the network is a transducer, the output may contain symbol pairs. Otherwise, 'print lower-words', 'print words', and 'print upper-words' give the same output. See also the variable 'print-space'. prune net removes all paths leading to non-final states in the top network on the stack. push defined symbol makes a copy of the network named by a previous 'define symbol' command and pushes it onto the stack. 'push defined symbol' has the same effect as 'read regex symbol ;'. quit Exits the program. read prolog read prolog < source-file reads the network(s) contained in the file and pushes them onto the stack. The file must be in the format produced by the 'write prolog' command. The the same effect can be produced with the command 'read regex [@pl source-file];' read properties read properties < source-file reads a set of attribute-value pairs resets the property list of the top network on the stack. See also: 'add properties', 'write properties'. read regex read regex < source-file creates a network from a regular expression and and pushes it onto the stack. The file should contain one regular expression. If no filename is given the input is read from the console. The input expression must be terminated by a semicolon. read spaced-text read spaced-text < source-file reads a transducer or a net with multicharacter symbols from a file. See also function 'print words' and variable 'print-space'. read text read text < source-file reads the net as text from a file. See also function 'print words' and variable 'print-space'. reverse net replaces the top net on the stack with one that accepts the reverse language. rotate stack Pushes the top element of the stack to the bottom. save defined target-file stores the networks for all defined symbols into a binary file. Use 'load defined' to restore the definitions. save stack target-file stores all the networks on the stack into the given file in a binary format. Use 'load stack' to load the file. set variable value sets a new value to the variable. The value may be an integer, ON or OFF, or a string. See variable for details. See also: 'show'. show variable shows the value of the variable. See also 'set'. shuffle net replaces the stack with a single network that accepts every string formed by shuffling together (interdigitanting) one string from each of the input languages. For instance, if Net1 accepts ab and Net2 accepts xy, then the shuffle of these nets will accept the following six strings: abxy axby axyb xaby xayb xyab. sigma net replaces the top network on the stack with a network that encodes the "sigma language" of the original network, that is, the union of all symbols in the sigma alphabet. See also 'print sigma', 'label net'. sort net Reorders the arcs of the top network in a canonical order. source command-file executes a file of XFST commands. For example, 'source xfst.script' executes the commands in xfst.script. substitute defined 'substitute defined NAME for SYMBOL' replaces the top network on the stack with a network derived by splicing in the network associated with the given name for each arc that contains the symbol. If the network is a transducer, SYMBOL must not occur in a symbol pair. substitute label 'substitute label LIST_OF_LABELS for LABEL' replaces the top network on the stack with a network derived by replacing every arc with the given label by an arc or arcs, each labeled with one of the substitute labels. If the list consists of the keyword NOTHING, all paths containing the given label are removed. The label and the substitutes may be single symbols or a symbol pairs. See also 'substitute symbol'. substitute symbol 'substitute symbol LIST-OF-SYMBOLS for SYMBOL' replaces the top network on the stack with a network derived by replacing every arc whose label contains the given symbol by an arc or arcs whose label instead contains one of the substitute symbols. If the list consists of the keyword NOTHING, all paths containing the given symbol are removed. The symbol and the substitutes must be single symbols, not symbol pairs. See also 'substitute label'. substring net replaces the top network on the stack with a network that accepts every string that is a substring of some string in the language of the original network. system system-command 'system command' executes the given operating system command(s), for instance, 'system ls'. test equivalent returns 1 (TRUE) if the two topmost networks contain the same language test lower-bounded returns 1 (TRUE) if the lower side of the top level network has no epsilon cycles. test lower-universal returns 1 (TRUE) if the lower side of the top level network represents the universal language. test overlap returns 1 (TRUE) if the languages of the two topmost networks have a non-empty intersection. test sublanguage returns 1 (TRUE) if the language of the topmost network is a sublanguage of the second network on the stack. test upper-bounded returns 1 (TRUE) if the upper side of the top level network has no epsilon cycles. test upper-universal returns 1 (TRUE) if the upper side of the top level network contains the universal language. turn stack reverses the order of the networks on the the stack. NOTE: Use the command 'reverse net' to create the reverse language of the top net. undefine list-of-symbols unbinds the symbol that was bound by the 'define' command. 'undefine symbol1 symbol2 symbol3 ...' etc. undefines all the listed symbols. 'undefine ALL' removes all the definitions. union net replaces the stack with the union of all networks currently on the stack. unoptimize net reverses the effect of 'optimize net' restoring the original network. upper-side net extracts the upper language from the network on the top of the stack. If the network is defined as A, the result is equivalent to compiling the regular expression [A.u]. 'upper-side net' has no effect on simple networks. write prolog write prolog > target-file writes the network as Prolog-clauses. See also: 'read prolog'. write properties write properties name write properties > target-file prints the attribute-value pairs on the property list of the network. See also 'add properties', 'read properties'. write spaced-text write spaced-text > target-filetarget-file Stores the top network in a text format. This format is intended for transducers and networks with multi-character labels. Use 'read spaced-text' to read files in this format. Use 'write text' for simple networks with single-character labels. Use 'save stack' to store in a compact binary format. If no target file is specified, output will be sent to standard output (usually the terminal). write text write text > target-file stores the words in the top network, one per line. This format cannot be used for transducers or networks with multi-character labels. For such networks, use 'save-as-spaced-text' instead. Use 'save stack' to store in a compact binary format. If no target file is specified, output will be sent to standard output (usually the terminal). zero-plus net Variables This is the list of variables that the user can set with the command 'set variable value' with the list of available values. The command 'show variable' displays the current value of the variable. directory filename the working directory. mark-version (ON/OFF) when ON, the name and version number of the application is recorded on the property list of each saved network under the property TOOLVERSIONS. For example, TOOLVERSIONS: (("XFST" "5.6.7")).
max-regex-errors (integer) the maximum number of errors for regular expressions. When it is exceeded the rest of the input file is discarded. Default is 10. minimal (ON|OFF) when ON, network operations minimize the result. Minimization includes epsilon-removal and determinization. Default is ON. name-nets (ON|OFF) when ON, regex names networks. Default is OFF. print-pairs (ON|OFF) when ON,'apply up' and 'apply down' show both (UPPER and LOWER) sides of labels. Default is OFF. print-sigma (ON|OFF) when ON, the sigma is shown when a network in printed. Default is ON. print-space (ON|OFF) when ON, 'print word' inserts a space between symbols. Default is OFF. process-in-order (ON|OFF) when ON, intersection and composition operations process networks in the order they appear in the stack. Default is OFF. quit-on-fail (ON|OFF) when ON, the application exits with an error if a command is incorrectly spelled or cannot be executed because of a missing file or some other error. This is useful when the application is run by a shell script. Default is OFF. quote-special (ON|OFF) When ON literal instances of special characters are enclosed in double quotes: zero and question mark are printed as "0" and "?" to distinguish them from epsilon and the unknown symbol. Space, tab, and newline are printed as " ", "\t", and "\n", respectively, to distinguish them from each other. The affected commands are 'print words', 'print lower-words', and 'print upper-words'. Default is OFF. random-seed (integer) seed of the random number generator. It cannot be inspected. sort-arcs (ON|OFF) when ON, the arcs of a network are sorted before the network is pushed onto the stack. Default is ON. unicode (ON|OFF) if the value is ON, two-byte characters are printed as four hex integers preceded.by \u. If the value is OFF, we try to display 16-bit characters directly. Default is OFF. verbose (ON|OFF) when ON, more information is printed. Default is ON. Commands ↔ RE operators All the XFST commands that produce a new network from one or more expressions on the stack and replace them with the result of the operation correspond to an operation in the regular expression calculus (see Syntax and Semantics of Regular Expressions). Some of the load and read commands also have a counterpart in the regular expression language. The correspondence is summarized below along with a characterization of the language or relation that the result implements, described in terms of the corresponding regular expression. We also indicate for each command whether it applies to the top network (unary), the two top networks (binary), or to all networks on the stack (n-ary). concatenate net (n-ary) [A B] denotes the concatenation of all strings or string pairs in A with all strings or string pairs in B. compose net (n-ary) [A .o. B] denotes the composition of the relations A and B. crossproduct net (binary) [A .x. B] denotes the relation that maps every string of A to every string of B. intersect net (n-ary) [A & B] denotes the set of strings that belong to both A and B. invert net (unary) [A.i] denotes the relation that is the inverse of A; that is the upper and lower members of each string pair in the relation are switched around but the relation remains otherwise the same. load stack [@bin source-file] denotes the topmost network in the binary source-file. Note that @bin only retrieves one network, whereas 'load stack' pushes onto the stack all the networks in the file. If the name of the source file contains a period, a slash or some other special character, the name should be enclosed in double quotes. lower-side net (unary) [A.l] denotes the lower language of the relation A. minus net (binary) [A - B] denotes the set of strings that are in A but not in B. negate net (unary) [~A] denotes the set of all strings that are not in A. one-plus net (unary) [A+] denotes all the strings that are composed by concatenating strings of A with each other any number of times. read prolog read regex read spaced-text read text have corresponding regular expression operators: @pl, @re, @stxt, and @txt, respectively, These unary operators are prefixed to the name of the source file. If the name contains a period, a slash or some other special character, the name must be enclosed in double quotes. For example, read regex [@re regexfile] ; compiles the regular expression in regexfile; read regex [@txt "words.txt"] ; has the same effect as read text < words.txt reverse net (unary) [A.r] denotes the language or relation that is the reverse of A. substitute symbol SYMBOL_LIST for SYMBOL (unary) `[A, SYMBOL, SYMBOL_LIST] denotes the language or relation derived from A by replacing every instance of the given symbol by the union of the symbols on the list. union net (n-ary) [A | B] denotes the union of expressions A and B. All the strings and string pairs that belong to either A or B (or both) also belong to the union. upper-side net (unary) [A.u] denotes the upper language of the relation A. zero-plus net (unary) [A*] denotes the set of strings that consists the empty string and any number of concatenations of A with itself. [A*] is like [A+] except that it always includes the empty string. N.B. There are many regular expression operators without an equivalent command in XFST. For example, the replace (→) and restriction (⇒) operators do not correspond to any XFST command. This content is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported. |
LIN386M Home >