OpenFst Library
- OpenFst is a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs,有限状态机)
- Weighted finite-state transducers are automata where each transition has an input label, an output label, and a weight.
- Finite-state acceptors are used to represent sets of strings (specifically, regular or rational sets);
- finite-state transducers are used to represent binary relations between pairs of strings (specifically, rational transductions).
- The weights can be used to represent the cost of taking a particular transition.
- FSTs have key applications in speech recognition and synthesis, machine translation, optical character recognition, pattern matching, string processing, machine learning, information extraction and retrieval among others.
- Often a weighted transducer is used to represent a probabilistic model (e.g., an n-gram model, pronunciation model).
OpenFst Quick Tour
Example FST
The following picture depicts a finite state transducer:
- The initial state is label 0. There can only be one initial state.
- The final state is 2 with final weight of 3.5. Any state with non-infinite final weight is a final state.
- There is an arc (or transition) from state 0 to 1 with input label a, output label x, and weight 0.5.
- This FST transduces, for instance, the string ac to xz with weight 6.5 (the sum of the arc and final weights).
Creating FSTs
- Creating FSTs Using Constructors and Mutators From C++
Creating FSTs Using Text Files from the Shell
create the text FST file
123456789# arc format: src dest ilabel olabel [weight]# final state format: state [weight]# lines may occur in any order except initial state must be first line$ cat >text.fst <<EOF0 1 a x .50 1 b y 1.51 2 c z 2.52 3.5EOFThe internal representation of an arc label in an integer. We must provide the mapping from symbols to integers explicitly with a symbol table file
123456$ cat >isyms.txt <<EOF<eps> 0a 1b 2c 3EOF123456$ cat >osyms.txt <<EOF<eps> 0x 1y 2z 3EOFYou may use any string for a label; you may use any non-negative integer for a label ID.
This text FST must be converted into a binary FST file before it can be used by the OpenFst library.
123# Creates binary Fst from text file.# The symbolic labels will be converted into integers using the symbol table files.$ fstcompile --isymbols=isyms.txt --osymbols=osyms.txt text.fst binary.fst12# As above but the symbol tables are stored with the FST.$ fstcompile --isymbols=isyms.txt --osymbols=osyms.txt --keep_isymbols --keep_osymbols text.fst binary.fstThe internal representation of the FST
Accessing FSTs
- Accessing FSTs from C++
Printing, Drawing and Summarizing FSTs from the Shell
print out an FST in AT&T text format:
1$ fstprint --isymbols=isyms.txt --osymbols=osyms.txt binary.fstdraw an FST using Graphviz dot format
123# Draw FST using symbol table files and Graphviz dot:$ fstdraw --isymbols=isyms.txt --osymbols=osyms.txt binary.fst binary.dot$ dot -Tps binary.dot >binary.ps