Abstract
The Regex Coach is a graphical application for Windows which can be used to experiment with (Perl-compatible) regular expressions interactively. It has the following features:
- It shows whether a regular expression matches a particular target string.
- It can also show which parts of the target string correspond to captured register groups or to arbitrary parts of the regular expression.
- It can "walk" through the target string one match at a time.
- It can simulate Perl's split and s/// (substitution) operators.
- It tries to describe the regular expression in plain English.
- It can show a graphical representation of the regular expression's parse tree.
- It can single-step through the matching process as performed by the regex engine.
- Everything happens in "real time", i.e. as soon as you make a change somewhere in the application all other parts are instantly updated.
The Regex Coach together with this documentation can be downloaded from http://weitz.de/files/regex-coach.exe. The current version is 0.9.2 - see the changelog for what's new. The file (an installer) is about 2MB in size.
You should use Windows 2000 or Windows XP with all updates and service packs installed. The program might work with older or unpatched Windows versions, but don't expect support for these configurations. See also below.
You also must have the Microsoft runtime msvcr80.dll installed. If you don't have it or if you aren't sure, you can get it from http://www.microsoft.com/downloads/details.aspx?familyid=32BC1BEE-A3F9-4C13-9C99-220B62A191EE&displaylang=en.
If you have a previous version (0.8.5 or earlier) of The Regex Coach installed, uninstall it before you install the new version! If you haven't done this, and the new application won't start, remove the file The Regex Coach.exe.manifest from the
If you have an older version of Windows and the current version of The Regex Coach doesn't work for you, you can try the last release which was built with LispWorks 4.4.6 - it is at http://weitz.de/files/regex-coach-0.8.5.exe. If that works for you - fine. Don't expect support or updates, though.
There is no Mac version and I have no plans to release one. Sending me email and begging for it won't change that. And, no, I don't want to open source the application or send the source code to you privately - no need to ask...
Jeremy Rayner has written a "homage" to The Regex Coach in Java - see here for more details.
The Regex Coach is free for private or non-commercial use. The Regex Coach is also free for commercial use, but you are not allowed to re-distribute it and/or charge money for it without written permission from the author - email me at firstname.lastname@example.org for details.
The program is provided 'as is' with no warranty - use at your own risk.
split) interactively and in "real time", i.e. as soon as you make changes somewhere the results are instantly displayed. You can also query the regex engine about selected parts of your regular expression and watch how it parses your input.
Of course, this application should also be useful to programmers using Perl-compatible regex toolkits like PCRE (which is used by projects like Python, Apache, and PHP) or CL-PPCRE. Also, Java's regular expressions and those of XML Schema are very similar to Perl's.
The following descriptions will use the notions introduced by this annotated screenshot.
GNU Emacs. (If you have never used Emacs you might know a couple of these keybindings from the bash shell.) You can use the TAB key to switch between these editors. This will also cycle through the replacement pane if it's visible.
The upper pane is the regex pane. Here you'll type the regular expression you want to investigate.
The second pane is the target pane. Here you'll type the text (the target string) the regular expression will try to match.
If there's a match, the part of the target string that matched will be emphasized by a yellow background. (If you also check the 'g' modifier checkbox, all matches will be emphasized - the "current" one in yellow, the others in green.)
The target message area will show the extent of the match (or notify you that there isn't a match at all). This is particularly useful if there's a zero-length match because you won't see any highlighted characters in the target pane in this case. The message "Match from n to m" means that the characters starting from position n up to m (exclusively) belong to the match. The first character of the string is character 0 (zero) as usual.
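The position convention can be tried outside the application as well. What follows is only a minimal sketch in Python, assuming its standard re module as a stand-in for the CL-PPCRE engine; the helper name describe_match is made up for this illustration. Like The Regex Coach, it counts from zero and treats the end position as exclusive.

```python
import re

def describe_match(regex, target):
    """Return a message in the style of the target message area."""
    m = re.search(regex, target)
    if m is None:
        return "No match"
    start, end = m.span()  # zero-based start, exclusive end
    return f"Match from {start} to {end}"

print(describe_match("b+", "aabbbc"))  # Match from 2 to 5
print(describe_match("x*", "aabbbc"))  # Match from 0 to 0 (zero-length)
print(describe_match("z", "aabbbc"))   # No match
```

The second call shows why the message matters for zero-length matches: there is a match, but no character would be highlighted.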
'b' in the regular expression was selected, which corresponds to the fourth 'b' in the target string.
If you've made an invalid selection the selection highlight button is disabled. You'll also see a message about your selection being invalid in the info pane.
If you have no idea what a "valid subexpression" of the regular expression could be, consider the following rule of thumb: Every part of the regular expression which can be wrapped in a non-capturing group - i.e. with (?:...) - without altering the meaning of the expression is valid.
(A more precise description of this would be: Consider the parse tree of the regular expression and assume that every leaf of the tree which is a string is further divided into the single characters which together constitute the string. Now, every contiguous part of the regular expression which can be completely and exactly covered by nodes of the parse tree is a valid subexpression.)
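The rule of thumb can be tried out mechanically. The sketch below is written in Python and uses its re module rather than CL-PPCRE, so it only approximates what The Regex Coach does internally; the helper names wrap and looks_valid and the sample strings are invented for this illustration. It wraps a selected part of the regex in (?:...) and checks whether the match results on a few sample strings stay the same. Agreement on a handful of samples is a hint, not a proof.

```python
import re

def wrap(regex, start, end):
    """Wrap regex[start:end] in a non-capturing group."""
    return regex[:start] + "(?:" + regex[start:end] + ")" + regex[end:]

def looks_valid(regex, start, end, samples):
    """Heuristic: the selection looks valid if wrapping it compiles
    and gives the same match span on every sample string."""
    try:
        wrapped = re.compile(wrap(regex, start, end))
    except re.error:
        return False  # e.g. a lone quantifier was selected
    original = re.compile(regex)

    def span(pattern, s):
        m = pattern.search(s)
        return m.span() if m else None

    return all(span(original, s) == span(wrapped, s) for s in samples)

samples = ["abc", "abbbc", "ababc", "ac", "xabbc"]
print(looks_valid("ab+c", 1, 3, samples))  # selects "b+" -> True
print(looks_valid("ab+c", 0, 2, samples))  # selects "ab" -> False
print(looks_valid("ab+c", 2, 3, samples))  # selects "+"  -> False
```

Selecting "ab" fails because (?:ab)+c repeats the whole group instead of just the 'b', which changes the meaning of the expression.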
Press the "nothing" button to disable highlighting.
g' modifier - or if you apply the
The headline above the scan buttons which usually says "Scan from 0" will change accordingly showing a message like "Scan #n from m" which means that the regex engine is trying to find the nth match starting at character m of the target string. The target message area will be changed as well - it'll say "Match #n from k to l" instead of "Match from k to l" (or it'll say "No further match" instead of "No match" if you've pressed the scan forward button too often).
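Walking through the matches one at a time can be imitated with a simple loop. This is a sketch in Python's re module, not the application's code. It assumes that each new scan starts where the previous match ended, moving one character further after a zero-length match; The Regex Coach may choose its start positions differently. The function name walk_matches is hypothetical.

```python
import re

def walk_matches(regex, target):
    """Return (n, scan_start, match_start, match_end) for each match."""
    pattern = re.compile(regex)
    results = []
    scan_start, n = 0, 1
    while scan_start <= len(target):
        m = pattern.search(target, scan_start)
        if m is None:
            break  # corresponds to "No further match"
        results.append((n, scan_start, m.start(), m.end()))
        # Assumption: continue after the match; step over zero-length matches.
        scan_start = m.end() + 1 if m.start() == m.end() else m.end()
        n += 1
    return results

for n, m_from, k, l in walk_matches("a+", "aa b aaa"):
    print(f"Scan #{n} from {m_from}: Match #{n} from {k} to {l}")
```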
s/// (substitution) operator. The second pane will show the result of the substitution. The contents of these panes are meaningless if the regular expression has syntactical errors.
Note that you'll have to use "
\n" instead of Perl's
$'" and "
see the CL-PPCRE
documentation for the gory details.
split operator to the target string. As this result is usually an array of strings, the elements of this array are visually divided by vertical lines the size of a space character. (This implies that two vertical lines in a row denote that there's a zero-length string between them. And it also follows that the array has only one element if there's no vertical line at all.)
You can use the radio buttons below the pane to select another divider if the vertical line happens to be a part of your target string. But note that choosing the "block" option might significantly slow down the program if your target strings are long.
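The way the split result is displayed can be mimicked in a few lines. The following sketch uses Python's re.split, which is not identical to Perl's split (for instance, Perl drops trailing empty fields by default while Python keeps them), so take it as an illustration of the display idea only. The function name show_split and the default divider character are assumptions of this example.

```python
import re

def show_split(regex, target, divider="|"):
    """Split target and join the pieces with a visible divider."""
    parts = re.split(regex, target)
    return divider.join(parts), parts

print(show_split(",", "a,,b")[0])       # a||b  (empty string between dividers)
print(show_split(",", "abc")[0])        # abc   (one element, no divider)
print(show_split(",", "a|b,c", "#")[0]) # a|b#c (other divider if '|' is in the target)
```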
You can type a non-negative integer into the "Limit" field. This corresponds to the optional third argument to Perl's split operator.
Note that many of the optimizations done by the CL-PPCRE engine are turned off here for pedagogical reasons. (For example, when trying to match a*abc against a target string that doesn't contain abc, the "real" engine wouldn't even start because it'll first use a Boyer-Moore-Horspool search to check if the constant string abc is somewhere in the target.) Some of them remain, however: The engine will only try to match from position 0 if the regex starts with .* and is in single-line mode. Also, as you'll see, the stepper tries to match constant strings as a whole (instead of single characters, which would be quite boring).
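To give an idea of the kind of optimization that is switched off in the stepper, here is a small sketch of a Boyer-Moore-Horspool search used as a pre-check before the regex is tried at all. It is written in Python and is not taken from CL-PPCRE; the function names are invented, and the required constant string is passed in by hand, whereas a real engine would derive it from the regex.

```python
import re

def horspool_find(needle, haystack):
    """Boyer-Moore-Horspool search: index of needle in haystack, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # For every character except the last one of the needle: distance
    # from its last occurrence to the end of the needle.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        # Shift according to the character under the needle's last position.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1

def search_with_precheck(regex, required, target):
    """Skip the regex engine if the required constant string is absent."""
    if horspool_find(required, target) < 0:
        return None
    return re.search(regex, target)

print(search_with_precheck("a*abc", "abc", "aaaaaaab"))       # None
print(search_with_precheck("a*abc", "abc", "zaaabc").span())  # (1, 6)
```

In the first call the regex is never run: the pre-check alone shows that abc is not in the target.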
"i" checkbox toggles between case-sensitive and case-insensitive matching. Note that the "g" ('global') modifier only affects the replacement operation - it has no effect on the match itself. If it's enabled, other matches the engine would find are highlighted in green in the target pane, though.
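The effect of these two modifiers on a substitution can be reproduced roughly as follows. This is a sketch using Python's re.sub, where the count argument plays the role of the 'g' modifier and re.IGNORECASE that of 'i'; the wrapper function substitute and its parameter names are made up for this example and are not part of The Regex Coach or CL-PPCRE.

```python
import re

def substitute(regex, replacement, target, global_=False, ignore_case=False):
    """Replace the first match only, or all matches if global_ is set."""
    flags = re.IGNORECASE if ignore_case else 0
    count = 0 if global_ else 1  # in re.sub, count=0 means "replace all"
    return re.sub(regex, replacement, target, count=count, flags=flags)

print(substitute("a", "X", "Aaa"))                                  # AXa
print(substitute("a", "X", "Aaa", global_=True))                    # AXX
print(substitute("a", "X", "Aaa", ignore_case=True))                # Xaa
print(substitute("a", "X", "Aaa", global_=True, ignore_case=True))  # XXX
```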
Ctrl-x Ctrl-s on Linux). The contents of these two panes will also persist between invocations of The Regex Coach.
Note: Due to the way Motif works, the file menu can't be used like this on Linux. Instead you can use the Emacs key
No automatic scrolling occurs while the target pane has the input
aa...abb...b" (with enough characters in between) might fail while "
Also, there seem to be problems with Eastern European versions of Windows, specifically with "character set 1250" or similar. Sorry, I currently don't have the time and resources to investigate this any further.
If you encounter any other bugs or problems please send them to the mailing list.
It might be worthwhile to note that, due to the dynamic nature of Lisp, The Regex Coach could be written without changing a single line of code in the CL-PPCRE engine itself, although the application has to track information and query the engine while the regular expression is parsed and the scanners are built. All this could be done 'after the fact' by using facilities like :around methods. Imagine writing this application in Perl without touching Perl's regex
Also, thanks to LispWorks' cross-platform CAPI toolkit, the code for the Windows and Linux versions is nearly identical without any platform-specific parts (except for some lines regarding different fonts and keybindings).
Brigitte Bovy from LispWorks ("Xanalys" at that time) support helped with the tricky interaction between the editor panes. I also got a couple of helpful tips from the LispWorks mailing list, specifically from Jeff Caldwell, John DeSoi, David Fox, and Nick Levine.
Thanks to the guys at "Café Olé" in Hamburg where I wrote most of the code.
Development of The Regex Coach has been supported by Euphemismen.de.