Text Algorithms

Semestr: Winter

Range: 2+2s

Completion:

Credits: 4

Programme type: Undefined

Study form: Fulltime

Course language:

Summary:

Text is the simplest and natural representation of information in different areas. Text is a linear sequence of symbols from some alphabet. The manipulation of text is used in many application areas: processing of text in natural and formal languages, study of sequences in molecular biology, music analysis, etc. The basic problems covered by this lecture is string matching and repetitions finding. In both cases, exact and approximate matching is taken into account. These problems are of the main interest in lectures and seminars. The main formal system used for the description of respective algorithms are finite automata. They appears as a very useful tool not only for understanding but for solving of many problems as well.

Keywords:

Text searching, forward searching, backward searching, finite automata, simulation of nondeterministic finite automata, factor automata, repetitions in text.

Course syllabus:

1. Fultext systems, basic notions, classification of searching problems.
2. Forward searching, models of searching algorithms.
3. Nondeterministic searching automata.
4. Simulation of nondeterministic searching automata, bit parallelism, dynamic programming.
5. Prefix and suffix automata.
6. Factor automata, factor oracle, suffix oracle.
7. Borders and periods in text.
8. Repetitions in text, exact and approximate.
9. Simulation of nondeterministic searching automata, fail function, MP and KMP algorithms.
10. Simulation of nondeterministic searching automata, fail function, AC algorithms.
11. Backward matching of one pattern, BM algorithm.
12. Backward match of set of patterns, CW algorithm.
13. Actual results in text searching algorithms (for example searching in two-dimensional text).
14. Actual results in text searching algorithms (for example searching in compressed text).

Seminar syllabus:

1. Finite automata, basic operations with finite automata.
2. Formulation of basic searching problems for exact and approximate matching.
3. Nondeterministic finite automata as models for text searching.
4. Deterministic searching automata and their complexity.
5. Simulation of searching automata, bit parallelism, dynamic programming.
6. Construction of prefix and suffix automata.
7. Construction of factor automata, factor and suffix oracle automata.
8. Looking for borders and periods in text.
9. Looking for exact and approximate repetitions in text.
10. Simulation of searching automata, MP and KMP algorithms.
11. Simulation of searching automata, AC algorithm.
12. Backward pattern matching, BM algorithm.
13. Backward pattern matching, CW algorithm.
14. Assessment.

Literature:

1. Melichar, B., Holub, J., Polcar, T.: Text searching algorithms. Volume I,
II. Tutorial for lectures.
2. Melichar, B. a kol.: Text searching algorithms. Seminars. Collection of examples for seminars.

Examiners:

Lecturers:

Instructors: