Abstract
The Regex Coach is a graphical application for Windows which can be used to experiment with (Perl-compatible) regular expressions interactively. It has the following features:
- It shows whether a regular expression matches a particular target string.
- It can also show which parts of the target string correspond to captured register groups or to arbitrary parts of the regular expression.
- It can "walk" through the target string one match at a time.
- It can simulate Perl's split and s/// (substitution) operators.
- It tries to describe the regular expression in plain English.
- It can show a graphical representation of the regular expression's parse tree.
- It can single-step through the matching process as performed by the regex engine.
- Everything happens in "real time", i.e. as soon as you make a change somewhere in the application, all other parts are instantly updated.
The Regex Coach together with this documentation can be downloaded from http://weitz.de/files/regex-coach.exe. The current version is 0.9.2 - see the changelog for what's new. The file (an installer) is about 2MB in size.
You should use Windows 2000 or Windows XP with all updates and service packs installed. The program might work with older or unpatched Windows versions, but don't expect support for these configurations. See also below.
You also must have the Microsoft runtime library msvcr80.dll installed. If you don't have it or if you aren't sure, you can get it from http://www.microsoft.com/downloads/details.aspx?familyid=32BC1BEE-A3F9-4C13-9C99-220B62A191EE&displaylang=en.
If you have a previous version (0.8.5 or earlier) of The Regex Coach installed, uninstall it before you install the new version! If you haven't done this and the new application won't start, remove the file The Regex Coach.exe.manifest from the application directory.
If you have an older version of Windows and the current version of The Regex Coach doesn't work for you, you can try the last release which was built with LispWorks 4.4.6 - it is at http://weitz.de/files/regex-coach-0.8.5.exe. If that works for you - fine. Don't expect support or updates, though.
There is no Mac version and I have no plans to release one. Sending me email and begging for it won't change that. And, no, I don't want to open source the application or send the source code to you privately - no need to ask...
Jeremy Rayner has written a "homage" to The Regex Coach in Java - see here for more details.
The Regex Coach is free for private or non-commercial use. The Regex Coach is also free for commercial use, but you are not allowed to redistribute it and/or charge money for it without written permission from the author - email me at edi@weitz.de for details.
The program is provided 'as is' with no warranty - use at your own risk.
The Regex Coach can be used to experiment with Perl's regular expression operators (m//, s///, and split) interactively and in "real time", i.e. as soon as you make changes somewhere, the results are instantly displayed. You can also query the regex engine about selected parts of your regular expression and watch how it parses your input.
Of course, this application should also be useful to programmers using Perl-compatible regex toolkits like PCRE (which is used by projects like Python, Apache, and PHP) or CL-PPCRE. Also, Java's regular expressions and those of XML Schema are very similar to Perl's.
The following descriptions use the terms introduced by the annotated screenshot of the application.
GNU Emacs. (If you have never used Emacs, you might know a couple of these keybindings from the bash shell.) You can use the TAB key to switch between these editors. This will also cycle through the replacement pane if it's visible.
The upper pane is the regex pane. Here you'll type the regular expression you want to investigate.
The second pane is the target pane. Here you'll type the text (the target string) the regular expression will try to match.
If there's a match, the part of the target string that matched will be emphasized by a yellow background. (If you also check the 'g' modifier checkbox, all matches will be emphasized - the "current" one in yellow, the others in green.)
The target message area will show the extent of the match (or notify you that there isn't a match at all). This is particularly useful if there's a zero-length match, because you won't see any highlighted characters in the target pane in this case. The message "Match from n to m" means that the characters from position n up to, but not including, position m belong to the match. The first character of the string is character 0 (zero) as usual.
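The "Match from n to m" convention (zero-based positions, end position exclusive) is the same one Python's `re` module uses for match spans. A minimal sketch, assuming Python's `re` as a stand-in for the CL-PPCRE engine; `match_message` is a hypothetical helper that only mimics the wording of the target message area:

```python
import re

def match_message(pattern, target):
    """Mimic the target message area: report the extent of the first match."""
    m = re.search(pattern, target)
    if m is None:
        return "No match"
    # start() is the index of the first matched character, end() is one past the last.
    return f"Match from {m.start()} to {m.end()}"

print(match_message(r"b+", "aabbbc"))  # Match from 2 to 5, i.e. "aabbbc"[2:5] == "bbb"
print(match_message(r"x*", "abc"))     # Match from 0 to 0 (zero-length, nothing highlighted)
print(match_message(r"z", "abc"))      # No match
```

The zero-length case is exactly the one where the message is more informative than the highlighting.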
b' in the regular expression was selected, which corresponds to the fourth 'b' in the target string.
If you've made an invalid selection, the selection highlight button is disabled. You'll also see a message about your selection being invalid in the info pane.
If you have no idea what a "valid subexpression" of the regular expression could be, consider the following rule of thumb: Every part of the regular expression which can be wrapped in a non-capturing group - i.e. with (?:...) - without altering the meaning of the expression is valid.
(A more precise description of this would be: Consider the parse tree of the regular expression and assume that every leaf of the tree which is a string is further divided into the single characters which together constitute the string. Now, every contiguous part of the regular expression which can be completely and exactly covered by nodes of the parse tree is a valid subexpression.)
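The rule of thumb can be tried out with any Perl-like engine. A small sketch, assuming Python's `re` module as a stand-in; `wrap` is a hypothetical helper, and comparing results on a single target only illustrates the rule, it doesn't prove that a selection is valid. Capturing the selection in a named group is likewise just one plausible way to find the corresponding part of the target, not necessarily how The Regex Coach does it:

```python
import re

def wrap(regex, start, end, opener="(?:"):
    """Wrap regex[start:end] in a group; the default '(?:' makes it non-capturing."""
    return regex[:start] + opener + regex[start:end] + ")" + regex[end:]

regex = "ab*c"
target = "abbbbc"

# Selecting "b" (characters 1 to 2 of the regex): "a(?:b)*c" means the same thing.
valid = wrap(regex, 1, 2)
# Selecting "ab" (characters 0 to 2): "(?:ab)*c" means something different.
invalid = wrap(regex, 0, 2)

print(bool(re.fullmatch(regex, target)))    # True
print(bool(re.fullmatch(valid, target)))    # True
print(bool(re.fullmatch(invalid, target)))  # False

# Capture the valid selection to see which part of the target it corresponds to.
m = re.fullmatch(wrap(regex, 1, 2, "(?P<sel>"), target)
print(m.span("sel"))  # (4, 5): a quantified group reports its last iteration, the fourth 'b'
```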
Press the "nothing" button to disable highlighting.
g' modifier - or if you apply the split operator.)
The headline above the scan buttons, which usually says "Scan from 0", will change accordingly, showing a message like "Scan #n from m", which means that the regex engine is trying to find the nth match starting at character m of the target string. The target message area will change as well - it'll say "Match #n from k to l" instead of "Match from k to l" (or "No further match" instead of "No match" if you've pressed the scan forward button too often).
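The scan buttons can be imitated by iterating over successive matches. A rough sketch in Python, assuming that each new scan starts where the previous match ended; engines differ in how they continue after a zero-length match, so the example avoids patterns that can match the empty string. `scan_messages` is a hypothetical helper that only reproduces the message format:

```python
import re

def scan_messages(pattern, target):
    """Walk through the target one match at a time, like the scan buttons."""
    messages = []
    start = 0
    # finditer() begins each new search where the previous match ended.
    for n, m in enumerate(re.finditer(pattern, target), start=1):
        messages.append(f"Scan #{n} from {start}: Match #{n} from {m.start()} to {m.end()}")
        start = m.end()
    messages.append(f"Scan #{len(messages) + 1} from {start}: No further match")
    return messages

for line in scan_messages(r"\d+", "a1 22 333"):
    print(line)
# Scan #1 from 0: Match #1 from 1 to 2
# Scan #2 from 2: Match #2 from 3 to 5
# Scan #3 from 5: Match #3 from 6 to 9
# Scan #4 from 9: No further match
```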
s/// (substitution) operator. The second pane will show the result of the substitution. The contents of these panes are meaningless if the regular expression has syntactical errors.
Note that you'll have to use "\&", "\`", "\'" and "\n" instead of Perl's "$&", "$`", "$'" and "$n" - see the CL-PPCRE documentation for the gory details.
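To make the escape mapping concrete, here is a rough emulation in Python, assuming Python's `re` as a stand-in engine. `ppcre_style_sub` is hypothetical: it only interprets the four escapes listed above and is not CL-PPCRE's implementation, so the CL-PPCRE documentation remains the reference. Python's own `re.sub` templates spell the whole match as `\g<0>` and have no escape for the text before or after the match, which is why the sketch uses a replacement function:

```python
import re

# \& = whole match, \` = text before it, \' = text after it, \N = register N
_SPECIAL = re.compile(r"\\(&|`|'|\d+)")

def ppcre_style_sub(pattern, replacement, target, global_=False):
    """Sketch of s/// with CL-PPCRE-style replacement escapes (hypothetical helper)."""
    def expand(m):
        def special(ref):
            key = ref.group(1)
            if key == "&":
                return m.group(0)
            if key == "`":
                return m.string[:m.start()]
            if key == "'":
                return m.string[m.end():]
            return m.group(int(key)) or ""  # a register that didn't participate becomes ""
        return _SPECIAL.sub(special, replacement)
    # The 'g' modifier decides whether all matches or only the first one are replaced.
    return re.sub(pattern, expand, target, count=0 if global_ else 1)

print(ppcre_style_sub(r"b", r"[\`|\&|\']", "abc"))                 # a[a|b|c]c
print(ppcre_style_sub(r"(\w)(\d)", r"\2\1", "a1b2"))                # 1ab2
print(ppcre_style_sub(r"(\w)(\d)", r"\2\1", "a1b2", global_=True))  # 1a2b
```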
split operator to the target string. As this result is usually an array of strings, the elements of this array are visually divided by vertical lines the size of a space character. (This implies that two vertical lines in a row denote a zero-length string between them, and that the array has only one element if there's no vertical line at all.)
You can use the radio buttons below the pane to select another divider if the vertical line happens to be a part of your target string. But note that choosing the "block" option might significantly slow down the program if your target strings are long.
You can type a non-negative integer into the "Limit" field. This corresponds to the optional third argument to Perl's split operator.
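The limit semantics and the vertical-line display can be approximated in Python. A minimal sketch, assuming Python's `re.split` as a stand-in; `perl_like_split` and `show` are hypothetical helpers, and the sketch ignores some of Perl's special cases (for example, how zero-length matches at the start of the string are treated):

```python
import re

def perl_like_split(pattern, target, limit=0):
    """Approximate Perl's split(/pattern/, target, limit) for limit >= 0."""
    if limit == 1:
        return [target]  # a single field; re.split would read maxsplit=0 as "no limit"
    # limit fields need at most limit - 1 splits; maxsplit=0 means unlimited.
    fields = re.split(pattern, target, maxsplit=max(limit - 1, 0))
    if limit == 0:
        # Without a limit, Perl drops trailing empty fields.
        while fields and fields[-1] == "":
            fields.pop()
    return fields

def show(fields, divider="|"):
    """Render the result like the split pane: fields separated by a divider."""
    return divider.join(fields)

print(show(perl_like_split(",", "a,b,,c,,")))            # a|b||c
print(show(perl_like_split(",", "a,b,,c,,", limit=2)))   # a|b,,c,,
print(show(perl_like_split(",", "a,b,,c,,", limit=10)))  # a|b||c||
```

Note how "||" marks a zero-length field, and how a positive limit keeps the trailing empty fields that the unlimited version drops.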
Note that many of the optimizations done by the CL-PPCRE engine are turned off here for pedagogical reasons. (For example, when trying to match the regex a*abc against the target string aaaabd, the "real" engine wouldn't even start because it would first use a Boyer-Moore-Horspool search to check whether the constant string abc is somewhere in the target.) Some of them remain, however: The engine will only try to match from position 0 if the regex starts with .* and is in single-line mode. Also, as you'll see, the stepper tries to match constant strings as a whole (instead of single characters, which would be quite boring).
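The optimization mentioned above can be sketched as a prefilter: before running the backtracking matcher, look for a constant string that every match must contain. A minimal Python sketch, assuming the required string (abc for a*abc) is supplied by hand, whereas a real engine would derive it from the parse tree; `bmh_find` and `match_with_prefilter` are hypothetical names, not CL-PPCRE functions:

```python
import re

def bmh_find(needle, haystack):
    """Boyer-Moore-Horspool search: index of needle in haystack, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # Bad-character table: shift distance when the window's last character is c.
    # Later occurrences overwrite earlier ones, so each character gets its smallest shift.
    shift = {c: m - 1 - i for i, c in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        pos += shift.get(haystack[pos + m - 1], m)
    return -1

def match_with_prefilter(regex, required, target):
    """Skip the regex engine entirely if the required constant string is absent."""
    if bmh_find(required, target) == -1:
        return None
    return re.search(regex, target)

print(bmh_find("abc", "aaaabd"))                               # -1
print(match_with_prefilter("a*abc", "abc", "aaaabd"))          # None, matcher never runs
print(match_with_prefilter("a*abc", "abc", "xaaabc").span())   # (1, 6)
```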
i" checkbox toggles between case-sensitive and case-insensitive matching. Note that the "g" ('global') modifier only affects the replacement operation - it has no effect on the match itself. If it's enabled, the other matches the engine would find are highlighted in green in the target pane, though.
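In Python's `re` module (used here as a stand-in, with `re.IGNORECASE` playing the role of 'i'), the two modifiers behave roughly as described above. A short sketch; the `[1:]` slice that separates the "current" match from the green ones is only an illustration of the highlighting, not how The Regex Coach computes it:

```python
import re

target = "Abc abc ABC"

# Without 'i' the match is case-sensitive; with 'i' it isn't.
plain = re.search("abc", target)
folded = re.search("abc", target, re.IGNORECASE)
print(plain.span(), folded.span())  # (4, 7) (0, 3)

# 'g' doesn't change the first match; the further matches are the green ones.
others = [m.span() for m in re.finditer("abc", target, re.IGNORECASE)][1:]
print(others)  # [(4, 7), (8, 11)]

# In the replacement, 'g' decides between replacing once and replacing everywhere.
print(re.sub("abc", "x", target, count=1, flags=re.IGNORECASE))  # x abc ABC
print(re.sub("abc", "x", target, flags=re.IGNORECASE))           # x x x
```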
Ctrl-s (or Ctrl-x Ctrl-s on Linux). The contents of these two panes also persist between invocations of The Regex Coach.
Note: Due to the way Motif works, the file menu can't be used like this on Linux. Instead you can use the Emacs key sequences Ctrl-x Ctrl-w and Ctrl-x i.
No automatic scrolling occurs while the target pane has the input focus.
aa...abb...b" (with enough characters in between) might fail, while "(?:aa...a)(?:bb...b)" doesn't.
Also, there seem to be problems with Eastern European versions of Windows, specifically with "character set 1250" or similar. Sorry, I currently don't have the time and resources to investigate this any further.
If you encounter any other bugs or problems please send them to the mailing list.
It might be worthwhile to note that, due to the dynamic nature of Lisp, The Regex Coach could be written without changing a single line of code in the CL-PPCRE engine itself, although the application has to track information and query the engine while the regular expression is parsed and the scanners are built. All this could be done 'after the fact' by using facilities like defadvice and :around methods. Imagine writing this application in Perl without touching Perl's regex engine... :)
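For readers who don't know Lisp, the "after the fact" idea can be approximated in Python by replacing a method with a wrapper that calls the original. This is only an analogy: `add_around` and `ToyEngine` are hypothetical, and Python has no built-in equivalent of defadvice or CLOS method combination:

```python
import functools

def add_around(owner, name, around):
    """Wrap owner.name after the fact, similar in spirit to a CLOS :around method.

    around(call_next, *args, **kwargs) decides when (and whether) the original runs.
    """
    original = getattr(owner, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        return around(original, *args, **kwargs)

    setattr(owner, name, wrapper)

class ToyEngine:
    """Hypothetical stand-in for an engine whose source we don't modify."""
    def parse(self, regex):
        return ("seq", list(regex))

trace = []

def record_parse(call_next, self, regex):
    result = call_next(self, regex)  # run the unmodified original method
    trace.append((regex, result))    # track information on the side
    return result

add_around(ToyEngine, "parse", record_parse)
print(ToyEngine().parse("ab"))  # ('seq', ['a', 'b'])
print(trace)                    # [('ab', ('seq', ['a', 'b']))]
```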
Also, thanks to LispWorks' cross-platform CAPI toolkit, the code for the Windows and Linux versions is nearly identical, without any platform-specific parts (except for some lines regarding different fonts and keybindings).
Brigitte Bovy from LispWorks support ("Xanalys" at that time) helped with the tricky interaction between the editor panes. I also got a couple of helpful tips from the LispWorks mailing list, specifically from Jeff Caldwell, John DeSoi, David Fox, and Nick Levine.
Thanks to the guys at "Café Olé" in Hamburg where I wrote most of the code.
Development of The Regex Coach has been supported by Euphemismen.de.