Regular expressions of SmartEiffel 2.2
Since SmartEiffel 2.2, a cluster is included that allows you to use either POSIX regular expressions or PERL5 regular expressions.
Currently the regular expression cluster makes use of the backtracking cluster.
The main restriction comes from the character set used, which is ASCII. UNICODE characters are not currently supported.
Another restriction is that most of the common character escapes are not interpreted: \n is n, not a new line; \t is t, not a tab; and so on. The reason is that we assumed users can do such preprocessing themselves, and we were torn between the Eiffel notation %N and \n. Your opinion on this point is very welcome.
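The preprocessing mentioned above could look roughly like the following. This is a minimal sketch in Python, used here for illustration only; the helper name expand_escapes and the choice of escapes handled (\n, \t, \r) are assumptions, not part of the SmartEiffel cluster. It replaces those escapes with the real characters and leaves every other backslash sequence untouched, so the result can then be handed to the regular expression builder.

```python
# Hypothetical helper, not part of SmartEiffel: expands a few C-style
# escapes before the pattern reaches a builder that reads \n as "n".
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}


def expand_escapes(pattern: str) -> str:
    """Replace \\n, \\t and \\r by the characters they name."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[nxt])
            else:
                # Any other escape (an escaped backslash, \., \d, ...)
                # is copied through unchanged for the regex builder.
                out.append(ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Note that an escaped backslash followed by n stays as it is, because the two backslashes are consumed together before the n is examined.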
- The construct (a?b?)* causes an infinite loop and fills the stack until the program crashes.
- The construct (a+)+ can use so much CPU time that the matcher appears to be stuck.
- The construct (a(b)?)* applied to aba reports a non-empty group 2, beginning at 2 and ending at 2.
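The cost of (a+)+ can be made concrete by counting how many ways the construct can carve a run of n letters a into groups. The sketch below is in Python for illustration only; the function name count_splits is hypothetical, and it assumes a naive backtracking engine that tries every split before giving up. Whether the backtracking cluster really explores all of them is an assumption.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(n: int) -> int:
    """Number of ways (a+)+ can split n 'a' characters into groups."""
    if n == 0:
        return 1  # nothing left to consume: one complete split
    # The inner a+ takes k >= 1 characters, then the outer + repeats
    # on the remaining n - k characters.
    return sum(count_splits(n - k) for k in range(1, n + 1))


for n in (5, 10, 20):
    print(n, count_splits(n))  # 16, 512, 524288: that is 2 ** (n - 1)
```

The count doubles with each additional character, which is why a failing match on a fairly short input can already look like a hang.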
The POSIX regular expressions
POSIX regular expressions are supported, except for the [=...=] construct (characters in the same equivalence class as ...).
See POSIX 1003.2, section 2.8 (Regular Expression Notation). www.unix.org
On Linux: man 7 regex.
The PERL 5 regular expressions
With the main restriction that UNICODE is not handled, most PERL 5 regular expressions are implemented. See Perl Regular expressions.
- The experimental behaviours are not integrated.
- Some constructs, such as [[:], that Perl interprets correctly produce an error. In these cases, you must write [:.
- Look-behind (negative or positive) is not restricted to a fixed length.
- The POSIX constructs [:<:] and [:>:] are allowed (they stand for the beginning and the end of a word).
- Shell (bash) file matching (help wanted)
- Basic regular expressions (help wanted)
- JAVA regular expressions (help wanted; hint: inherit from the PERL5 builder and implement the hungry repetition)
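For readers unfamiliar with the word-boundary constructs [:<:] and [:>:] mentioned above, their meaning can be illustrated with a rough equivalent. The sketch below uses Python's re module, not the SmartEiffel cluster; the names WORD_START, WORD_END, word_starts and word_ends are hypothetical, and the emulation with \b plus a look-around is an assumption about the intended semantics (a zero-width match at the beginning or end of a word).

```python
import re

# Rough emulation: a word boundary that is followed by a word character
# (start of word) or preceded by one (end of word).
WORD_START = re.compile(r"\b(?=\w)")
WORD_END = re.compile(r"\b(?<=\w)")


def word_starts(text: str) -> list:
    """Offsets (0-based) at which a word begins."""
    return [m.start() for m in WORD_START.finditer(text)]


def word_ends(text: str) -> list:
    """Offsets (0-based) just past the last character of each word."""
    return [m.start() for m in WORD_END.finditer(text)]


print(word_starts("foo bar"))  # [0, 4]
print(word_ends("foo bar"))    # [3, 7]
```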