A didactic implementation of regex validation using FSM in python with backend parsing in C++
- regexPy Main package
- regex.py Module containing the interface for the final user.
- fsm.py The core module, handling FSMs creation and simulation, actually executing the regex matching.
- utils Package
- stack.py A simple stack data structure implementation used to simplify stack manipulation operations.
- fsmutils.py The module containing usefull functions for manipulating FSMs in order to achieve a Regex.
- tests Package containing unit tests.
- Parser The parser implementation which is used to obtain a polish notation stack based representation of the regex used for FSM generation, this folder only contains files used for execution.
- ParserSource Tools and sources used to easly generate ad hoc parser, can be used for recompile purpose. Special thanks: Professor Mauro Leoncini
- main.py, timeTest.py Testing files.
- tests.py Script for launching all tests.
shell:
python3 ./main.py "abcd" "a*"
False
python3 ./main.py "aaaa" "a*"
True
python command line:
>>> from regexpy.regex import Regex
>>> Regex.match("abcd", "a*")
False
>>> Regex.match("aaaa", "a*")
True
import Regex from regex class and use either the instance method validate() or the static method match() with the string and the pattern as done in main.py file
- Instantiate a FSMNotDeterministic object, passing all the parameters describing the regex, or using FSMUtils class and use it to obtain a FSMDeterministic using the subsetConstruction() method.
- Use checkRegex() method of the obtained object for string validation.
- RECOMPILING: Use ParserSource/Makefile if you need.
NOTE: you could instantiate directly a FSMDeterministic object in simple cases, but it recomended to
regex class now implements:
- matchAny() "One of" matching
- matchAll() "All of" matching Allowing for multiple regex to be evaluated at once.
The program achieved the instantiation of a regex object just parsing a string describing it, without having to project the FSM itself.
This is computable via an algorithm that turns the regex into a not deterministic fsm, then using the subset construction to obtain a deterministic fsm, which is also simulable for string validation.
FSMUtils contains every function necessary for this process.