Difference between revisions of "Lib/regular expression"
Hzwakenberg (talk | contribs) m (→Coming soon?) |
Latest revision as of 07:54, 20 June 2016
Regular expressions in Liberty Eiffel
Liberty Eiffel includes a cluster that lets you use either POSIX regular expressions or PERL5 regular expressions.
Currently the regular expression cluster makes use of the backtracking cluster.
The main restriction comes from the character set used; this is the ASCII character set. UNICODE characters are not supported at the moment.
Another restriction is that most of the common character escapes do not work: \n is a literal n, not a new line; \t is a literal t, not a tab; and so on. The reason is that we thought users could do such preprocessing themselves, and we were torn between the Eiffel notation %N and \n. Your opinion on this point is very welcome.
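The preprocessing mentioned above could look roughly like the following sketch. It is written in Python purely for illustration; the helper name expand_escapes and the set of handled escapes are assumptions, not part of the Liberty Eiffel cluster. In Eiffel source code you could instead write %N or %T directly in the pattern string.

```python
# Hypothetical helper, not part of the Liberty Eiffel regular expression cluster.
# It replaces a few common backslash escapes with the characters they name,
# so the resulting pattern can be handed to an engine that treats \n as a literal n.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}  # assumed set; extend as needed


def expand_escapes(pattern):
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in ESCAPES:
                # Known escape: emit the real control character.
                out.append(ESCAPES[nxt])
            else:
                # Any other escape (for example \\ or \.) is kept untouched.
                out.append(ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

An escaped backslash followed by n is left alone, so a pattern that really means "backslash, then n" survives the preprocessing.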
Known problems:
- The construct (a?b?)* causes an infinite loop and crashes with a stack overflow.
- The construct (a+)+ can use so much CPU time that it appears to be buggy.
- The construct (a(b)?)* applied to aba reports a non-empty group 2, beginning at 2 and ending at 2.
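To see why a construct such as (a?b?)* can recurse without end in a backtracking matcher, here is a toy matcher in Python. It is only a sketch under stated assumptions: the combinators optional, seq and star are hypothetical and do not reflect how the Liberty Eiffel backtracking cluster is written. The body a?b? can match the empty string, so a naive star would call itself at the same position forever; the progress check in star avoids that.

```python
# Toy backtracking combinators (hypothetical, for illustration only).
# Each matcher takes (text, position) and yields every end position it can
# reach, in greedy order.

def optional(ch):
    """Matcher for `ch?`."""
    def m(s, i):
        if i < len(s) and s[i] == ch:
            yield i + 1  # greedy choice: consume the character
        yield i          # fallback: match the empty string
    return m


def seq(first, second):
    """Matcher for `first` followed by `second`."""
    def m(s, i):
        for j in first(s, i):
            yield from second(s, j)
    return m


def star(body):
    """Matcher for `body*` with a progress check."""
    def m(s, i):
        for j in body(s, i):
            # Only iterate again if the body consumed something.
            # Without this test, an empty match of the body would make
            # m call itself at the same position until the stack overflows.
            if j > i:
                yield from m(s, j)
        yield i  # zero further iterations
    return m


pattern = star(seq(optional("a"), optional("b")))  # (a?b?)*
```

Taking the first result of pattern("abab", 0) gives the greedy match end; removing the `j > i` test reproduces the unbounded recursion.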
As always, see the tutorial to learn how to use regular expressions.
The POSIX regular expressions
The POSIX regular expressions are supported, except for the [=...=] construct (equivalence class: same class as ...).
See POSIX 1003.2, section 2.8 (Regular Expression Notation). www.unix.org
On LINUX: man 7 regex.
The PERL 5 regular expressions
With the main restriction that UNICODE is not handled, most of the PERL 5 regular expression syntax is implemented. See Perl Regular expressions.
Missing things:
- The experimental behaviours are not integrated
- Some constructs, such as [[:], that are interpreted correctly by perl produce an error. In these cases, you must write [:[] instead.
Added things:
- The look-behind (negative or positive) is not restricted to a fixed length.
- The POSIX constructs [:<:] and [:>:] are allowed (they stand for the beginning and end of words).
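For readers unfamiliar with these two additions, the following Python sketch shows what they mean by contrast with another engine. Python's re module is used only as a stand-in: it rejects variable-length look-behind, which is the restriction lifted here, and it has no [:<:] or [:>:], so the word-start and word-end assertions are emulated with \b plus a look-around. The names compiles, WORD_START, WORD_END and positions are hypothetical.

```python
import re


def compiles(pattern):
    """Return True if Python's re accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Emulations of "beginning of word" and "end of word" in Python's re:
# a word boundary that is followed (or preceded) by a word character.
WORD_START = r"\b(?=\w)"
WORD_END = r"\b(?<=\w)"


def positions(pattern, text):
    """Start offsets (0-based) of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]
```

Here compiles(r"(?<=ab)c") succeeds because the look-behind has a fixed width of two characters, while compiles(r"(?<=a+)c") fails in Python because a+ has no fixed width.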
Coming soon?
- The shell (bash) file matching (help wanted)
- The basic regular expression (help wanted)
- The JAVA regular expressions (help wanted; hint: inherit from the PERL5 builder and add the hungry repetition)
If you want to look forward, consider implementing Perl 6 regular expressions (now called Perl 6 rules, because they go so far beyond regexps). Perl 6 rules combine traditional regexps with parsing grammars, eliminating the need for tools such as lex/yacc. Here are some details:
- Larry Wall's explanation of the changes: http://dev.perl.org/perl6/doc/design/apo/A05.html
- Damian Conway's specification of Perl 6 rules: http://dev.perl.org/perl6/doc/design/exe/E05.html
- Patrick Michaud's tutorial (assumes knowledge of Perl 5 regexps): http://dev.perl.org/perl6/doc/design/syn/S05.html
- IBM overview of Perl 6 Rules: http://www-128.ibm.com/developerworks/linux/library/l-cpregex.html?ca=dgr-lnxw01Perl6Gram
Roger Browne 23:12, 4 Dec 2005 (CET)