The Regex Coach - interactive regular expressions


 

Abstract

The Regex Coach is a graphical application for Windows which can be used to experiment with (Perl-compatible) regular expressions interactively. It has the following features:

 


 

Download and installation

The program hasn't changed since 2008 and this page is also essentially still the same. But I can confirm that in June 2023 the program still works fine for me on Windows 10.

The Regex Coach together with this documentation can be downloaded from http://weitz.de/files/regex-coach.exe. The current version is 0.9.2 - see the changelog for what's new. The file (an installer) is about 2MB in size.

You should use Windows 2000 or Windows XP with all updates and service packs installed. The program might work with older or unpatched Windows versions, but don't expect support for these configurations. See also below.

You also must have the Microsoft runtime library msvcr80.dll installed. If you don't have it or if you aren't sure, you can get it from http://www.microsoft.com/downloads/details.aspx?familyid=32BC1BEE-A3F9-4C13-9C99-220B62A191EE&displaylang=en.

If you have a previous version (0.8.5 or earlier) of The Regex Coach installed, uninstall it before you install the new version! If you didn't do this and the new application won't start, remove the file The Regex Coach.exe.manifest from the application directory.

Older versions, Linux, FreeBSD, Mac

Beginning with version 0.9.0, there will no longer be a Linux version of The Regex Coach - too few people were using it, and it's simply too much work for me to maintain both versions. You can still download the last (now unsupported) Linux release from http://weitz.de/files/regex-coach-0.8.5.tgz - it will also run on FreeBSD, see documentation.

If you have an older version of Windows and the current version of The Regex Coach doesn't work for you, you can try the last release which was built with LispWorks 4.4.6 - it is at http://weitz.de/files/regex-coach-0.8.5.exe. If that works for you - fine. Don't expect support or updates, though.

There is no Mac version and I have no plans to release one. Sending me email and begging for it won't change that. And, no, I don't want to open source the application or send the source code to you privately - no need to ask...

Jeremy Rayner has written a "homage" to The Regex Coach in Java - see here for more details.
 

License

The Regex Coach is Copyright © 2003-2008 Dr. Edmund Weitz - All Rights Reserved.

The Regex Coach is free for private or non-commercial use. The Regex Coach is also free for commercial use but you are not allowed to re-distribute it and/or charge money for it without written permission by the author - email me at edi@weitz.de for details.

The program is provided 'as is' with no warranty - use at your own risk.
 

How to use The Regex Coach

The Regex Coach enables you to try out the behaviour of Perl's regular expression operators (namely m//, s///, and split) interactively and in "real time", i.e. as soon as you make changes somewhere the results are instantly displayed. You can also query the regex engine about selected parts of your regular expression and watch how it parses your input.

Of course, this application should also be useful to programmers using Perl-compatible regex toolkits like PCRE (which is used by projects like Python, Apache, and PHP) or CL-PPCRE. Also, Java's regular expressions and those of XML Schema are very similar to Perl's.

The following descriptions use the terms introduced by the annotated screenshot of the application's main window.
 
[Screenshot: annotated main window of The Regex Coach, labelling the main panes, message areas, highlight buttons and highlight messages, scan and border buttons, modifier checkboxes, resize dividers, and the Info, Tree, Replace, Split, and Step tabs]

The main panes

The main area of the application is occupied by two panes which are always visible. Both behave like simple editors, i.e. you can type text into them and modify it. You can also copy and paste text between these panes and other applications. On Windows, the keybindings resemble those of typical Windows editors; on Linux, they are those of GNU Emacs. (If you have never used Emacs, you might know a couple of these keybindings from the bash shell.) You can use the TAB key to switch between these editors. This will also cycle through the replacement pane if it's visible.

The upper pane is the regex pane. Here you'll type the regular expression you want to investigate.

The second pane is the target pane. Here you'll type the text (the target string) the regular expression will try to match.

If there's a match, the part of the target string that matched will be emphasized by a yellow background. (If you also check the 'g' modifier checkbox all matches will be emphasized - the "current" one in yellow, the others in green.)

The message areas

Both of the aforementioned panes have message areas directly below them. The regex message area is usually empty, but it will show an error message in red letters if the regular expression isn't syntactically correct. It'll also show a warning in grey letters if the content of the regex pane ends with whitespace, because this might not be what you want. You can of course ignore this warning if you typed the whitespace characters on purpose.
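The two kinds of message can be imitated in a few lines. This is only a sketch using Python's re module rather than CL-PPCRE, so the set of accepted regexes and the error texts will differ from what the application shows; the function name check_regex and the message strings are made up for illustration.

```python
import re

def check_regex(source):
    """Return (level, message) where level is 'error', 'warning', or 'ok'.

    Hypothetical helper: mimics the regex message area, not its actual code.
    """
    try:
        re.compile(source)
    except re.error as exc:
        # Syntactically invalid regex: this would be the red error message.
        return ("error", str(exc))
    if source != source.rstrip():
        # Valid, but ends with whitespace: this would be the grey warning.
        return ("warning", "regex ends with whitespace")
    return ("ok", "")
```

For instance, check_regex("(a") reports an error, while check_regex("ab ") only yields the whitespace warning.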

The target message area will show the extent of the match (or notify you that there isn't a match at all). This is particularly useful if there's a zero-length match because you won't see any highlighted characters in the target pane in this case. The message "Match from n to m" means that the characters from position n up to, but not including, position m belong to the match. The first character of the string is character 0 (zero) as usual.
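The same half-open, zero-based convention is used by the match objects of many regex libraries. As a small illustration (Python's re module here, not CL-PPCRE; the function name describe_match is hypothetical):

```python
import re

def describe_match(regex, target):
    """Build a message in the style of the target message area."""
    m = re.search(regex, target)
    if m is None:
        return "No match"
    # start() is inclusive, end() is exclusive, both counted from 0.
    return f"Match from {m.start()} to {m.end()}"

print(describe_match(r"b+", "aabbbcc"))  # Match from 2 to 5
print(describe_match(r"x*", "aabbbcc"))  # Match from 0 to 0 (zero-length)
print(describe_match(r"z", "aabbbcc"))   # No match
```

The second call shows why the message is helpful: the match exists, but it covers no characters, so there would be nothing to highlight.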

Highlighting selected parts of the match

If there's a match you can highlight selected parts of the match which are shown in orange. The default setting is to reflect the selection you've made in the regex pane. It works like this: If you've selected a valid subexpression of the regular expression in the regex pane the corresponding part of the target string is shown in orange. You see an example in the screen shot above where the 'b' in the regular expression was selected which corresponds to the fourth 'b' in the target string.

If you've made an invalid selection the selection highlight button is disabled. You'll also see a message about your selection being invalid in the info pane.

If you have no idea what a "valid subexpression" of the regular expression could be, consider the following rule of thumb: Every part of the regular expression which can be wrapped in a non-capturing group - i.e. with (?:...) - without altering the meaning of the expression is valid.

(A more precise description of this would be: Consider the parse tree of the regular expression and assume that every leaf of the tree which is a string is further divided into the single characters which together constitute the string. Now, every contiguous part of the regular expression which can be completely and exactly covered by nodes of the parse tree is a valid subexpression.)
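The rule of thumb can be tried out mechanically. The sketch below is a heuristic, not how The Regex Coach decides validity (that is based on the parse tree): it wraps the selection in (?:...) and checks that the wrapped regex still compiles and finds the same matches on a few sample strings. It uses Python's re module, and the names wrap and looks_valid as well as the sample strings are assumptions made for the example.

```python
import re

def wrap(regex, start, end):
    """Wrap regex[start:end] in a non-capturing group."""
    return regex[:start] + "(?:" + regex[start:end] + ")" + regex[end:]

def looks_valid(regex, start, end, samples):
    """Heuristic: the selection is plausible if wrapping it changes nothing."""
    try:
        original = re.compile(regex)
        wrapped = re.compile(wrap(regex, start, end))
    except re.error:
        return False  # wrapping produced a syntax error
    for sample in samples:
        before = [m.span() for m in original.finditer(sample)]
        after = [m.span() for m in wrapped.finditer(sample)]
        if before != after:
            return False  # wrapping altered the meaning
    return True

samples = ["abbc", "ac", "ababc"]
print(looks_valid("ab*c", 1, 2, samples))  # True:  a(?:b)*c
print(looks_valid("ab*c", 0, 2, samples))  # False: (?:ab)*c
print(looks_valid("ab*c", 2, 3, samples))  # False: ab(?:*)c is a syntax error
```

Because only a handful of samples are compared, a True result is merely evidence; a False result, on the other hand, shows that the selection cannot be a valid subexpression in the sense of the rule.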

The highlight buttons

Apart from highlighting the part of the target string which corresponds to the selected area in the regex pane you can also highlight the parts which correspond to captured register groups (enclosed by parentheses) in the regular expression. This is done by selecting one of the highlight buttons. These are only enabled if there are any captured registers.

Press the "nothing" button to disable highlighting.

The highlight messages

Each of the highlight buttons has a small highlight message associated with it (similar to the message area of the target pane) which shows which part would be highlighted if the corresponding button were selected. Again, this is particularly useful in the case of zero-length (sub-)matches.
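What the highlight messages report corresponds to the extent of each register group. A minimal sketch with Python's re module (not CL-PPCRE); the function name highlight_messages and the wording of the messages are invented for the example:

```python
import re

def highlight_messages(regex, target):
    """List the extent of every capturing group of the first match."""
    m = re.search(regex, target)
    if m is None:
        return []
    messages = []
    for n in range(1, m.re.groups + 1):
        if m.group(n) is None:
            # The group exists in the regex but took no part in the match.
            messages.append(f"group {n}: not matched")
        else:
            messages.append(f"group {n}: {m.start(n)} to {m.end(n)}")
    return messages

for line in highlight_messages(r"(a+)(x*)(c)?", "baad"):
    print(line)
# group 1: 1 to 3
# group 2: 3 to 3
# group 3: not matched
```

The second group is a zero-length sub-match: it succeeded, but there is nothing to colour, which is exactly the case where a textual message helps.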

Walking through the target string

Usually, the application will try to find the first match beginning from position 0 of the target string. You can use the scan buttons to move forward (or backward) one match at a time if there's more than one match. (This is how the Perl regex engine would behave in case of 'global' matches - i.e. those with a 'g' modifier - or if you apply the split operator.)

The headline above the scan buttons which usually says "Scan from 0" will change accordingly showing a message like "Scan #n from m" which means that the regex engine is trying to find the nth match starting at character m of the target string. The target message area will be changed as well - it'll say "Match #n from k to l" instead of "Match from k to l" (or it'll say "No further match" instead of "No match" if you've pressed the scan forward button too often).
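The sequence of scans can be approximated as follows. This sketch uses Python's re.finditer instead of CL-PPCRE and assumes that each scan starts where the previous match ended; what the application reports after a zero-length match is not described here, so the example sticks to non-empty matches. The function name scans is hypothetical.

```python
import re

def scans(regex, target):
    """Return (scan number, scan start, match start, match end) tuples."""
    result = []
    scan_start = 0
    for number, m in enumerate(re.finditer(regex, target), 1):
        result.append((number, scan_start, m.start(), m.end()))
        # Assumption: the next scan begins where this match ended.
        scan_start = m.end()
    return result

for number, scan_start, start, end in scans(r"\d+", "a1b22c333"):
    print(f"Scan #{number} from {scan_start}: match #{number} from {start} to {end}")
# Scan #1 from 0: match #1 from 1 to 2
# Scan #2 from 2: match #2 from 3 to 5
# Scan #3 from 5: match #3 from 6 to 9
```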

Narrowing the scan

By using the border buttons you can narrow the scan to a part of the target string. This effectively hides characters from the start and/or end of the target string from the regex engine. The characters which are masked in this way are covered with a dark grey color in the target pane. Note that the effect of the scan buttons is reset by the border buttons.
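One way to picture the effect is to match against a slice of the target and translate the result back to positions in the full string. This is a sketch under the assumption that hidden characters are truly invisible to the engine, so that anchors like ^ and $ apply at the borders; it uses Python's re module, and search_narrowed is a made-up name.

```python
import re

def search_narrowed(regex, target, left, right):
    """Search only target[left:right]; report positions in the full target."""
    visible = target[left:right]  # everything else is "masked"
    m = re.search(regex, visible)
    if m is None:
        return None
    return (m.start() + left, m.end() + left)

target = "aabbbcc"
print(search_narrowed(r"^b+", target, 0, 7))  # None: the string starts with 'a'
print(search_narrowed(r"^b+", target, 2, 7))  # (2, 5): the leading 'aa' is hidden
print(search_narrowed(r"c+$", target, 0, 6))  # (5, 6): the last 'c' is hidden
```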

The info pane

Choosing the "Info" tab will reveal the info pane which is an area where the application tries to explain what the regular expression is supposed to do in plain English. If you've selected a part of the regular expression only this part will be explained.

The parse tree

If you select the "Tree" tab you'll see a (simplified) graphical representation of the parse tree of the regular expression. This is how the regex engine "sees" the expression and it might help you to understand what's going on (or why the regular expression isn't interpreted as you intended it to be).

Replacing text

By choosing the "Replace" tab you'll open up an area with two panes. The first one includes a simple editor like the ones in the main panes. Here you can type a replacement string which acts like the second argument to Perl's s/// (substitution) operator. The second pane will show the result of the substitution. The contents of these panes are meaningless if the regular expression has syntactical errors.

Note that you'll have to use "\&", "\`", "\'" and "\n" instead of Perl's "$&", "$`", "$'" and "$n" - see the CL-PPCRE documentation for the gory details.
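To make the meaning of these tokens concrete, here is a small emulation in Python. It is a sketch only: Python's own replacement syntax is different, so the tokens are expanded by hand, and details such as escaping a literal backslash are ignored (see the CL-PPCRE documentation for the real rules). The names expand and replace and the parameter replace_all are invented for the example.

```python
import re

# \& = whole match, \` = text before the match, \' = text after it, \N = group N
_TOKEN = re.compile(r"\\(&|`|'|\d+)")

def expand(template, m):
    """Expand the replacement tokens in template for match object m."""
    def token(t):
        kind = t.group(1)
        if kind == "&":
            return m.group(0)
        if kind == "`":
            return m.string[:m.start()]
        if kind == "'":
            return m.string[m.end():]
        return m.group(int(kind)) or ""  # unmatched group -> empty string
    return _TOKEN.sub(token, template)

def replace(regex, target, template, replace_all=False):
    """Like s/regex/template/ (or s/regex/template/g if replace_all is true)."""
    count = 0 if replace_all else 1  # for re.sub, count=0 means "all"
    return re.sub(regex, lambda m: expand(template, m), target, count=count)

print(replace(r"(b+)", "aabbcc", r"[\1|\&]"))              # aa[bb|bb]cc
print(replace(r"b", "abab", r"<\`>"))                      # a<a>ab
print(replace(r"b", "abab", r"<\`>", replace_all=True))    # a<a>a<aba>
```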

Splitting text

The "Split" tab will reveal a pane which shows the result of applying Perl's split operator to the target string. As this result is usually an array of strings the elements of this array are visually divided by vertical lines the size of a space character. (This implies that two vertical lines in a row denote that there's a zero-length string between them. And it also follows that the array has only one element if there's no vertical line at all.)

You can use the radio buttons below the pane to select another divider if the vertical line happens to be a part of your target string. But note that choosing the "block" option might significantly slow down the program if your target strings are long.

You can type a non-negative integer into the "Limit" field. This corresponds to the optional third argument to Perl's split operator.
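The following sketch approximates Perl's split with Python's re.split, which behaves differently in two respects that matter here: Perl's limit is the maximum number of fields (Python's maxsplit is the maximum number of splits), and Perl drops trailing empty fields when the limit is omitted or zero. Other corner cases, such as leading empty fields and zero-length matches, are not handled and may differ. The name perl_like_split is hypothetical.

```python
import re

def perl_like_split(regex, target, limit=0):
    """Approximate Perl's split(/regex/, target, limit) for limit >= 0."""
    if limit == 1:
        return [target]  # one field: nothing is split off
    if limit > 1:
        # At most `limit` fields means at most `limit - 1` splits.
        return re.split(regex, target, maxsplit=limit - 1)
    fields = re.split(regex, target)
    while fields and fields[-1] == "":
        fields.pop()  # without a limit, trailing empty fields are removed
    return fields

print("|".join(perl_like_split(r",", "a,b,,c,,")))  # a|b||c
print("|".join(perl_like_split(r",", "a,b,,c", 2)))  # a|b,,c
```

In the first line of output the two adjacent vertical lines stand for the zero-length string between the second and third comma-separated fields.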

Single-stepping through the matching process

Finally, the "Step" tab will lead you to two panes which have the same content as the two main panes. However, here you can watch the regex engine "at work". This is best explained with an example, so see the corresponding part of the tutorial.

Note that many of the optimizations done by the CL-PPCRE engine are turned off here for pedagogical reasons. (For example, when trying to match the regex a*abc against the target string aaaabd the "real" engine wouldn't even start because it'll first use a Boyer-Moore-Horspool search to check if the constant string abc is somewhere in the target.) Some of them remain, however: The engine will only try to match from position 0 if the regex starts with .* and is in single-line mode. Also, as you'll see, the stepper tries to match constant strings as a whole (instead of single characters which would be quite boring).
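The idea behind the optimization mentioned in parentheses can be sketched like this: before running the regex at all, look for a constant string that every match must contain, and give up immediately if it is absent. The code below is an illustration in Python with a textbook Boyer-Moore-Horspool search; it is not CL-PPCRE's implementation, and the required constant is passed in by hand instead of being derived from the regex. The names horspool_find and match_with_prefilter are made up.

```python
import re

def horspool_find(needle, haystack):
    """Return the index of the first occurrence of needle, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # How far the window may shift, keyed by the character under its last cell.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        pos += shift.get(haystack[pos + m - 1], m)
    return -1

def match_with_prefilter(regex, required, target):
    """Run the regex only if the required constant string occurs in target."""
    if horspool_find(required, target) == -1:
        return None  # the "real" engine wouldn't even start
    return re.search(regex, target)

print(match_with_prefilter(r"a*abc", "abc", "aaaabd"))         # None
print(match_with_prefilter(r"a*abc", "abc", "aaaabc").span())  # (0, 6)
```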

Modifiers

Pressing one of the modifier checkboxes is equivalent to using the corresponding modifier character in Perl. For example, the "i" checkbox toggles between case-sensitive and case-insensitive matching. Note that the "g" ('global') modifier only affects the replacement operation - it has no effect on the match itself. If it's enabled other matches the engine would find are highlighted in green in the target pane, though.
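For readers coming from other toolkits, the checkboxes play the same role as flags passed at compile time. The sketch below maps Perl-style modifier letters to the flags of Python's re module; the set of letters (i, m, s, x) is an assumption based on Perl's usual modifiers, and "g" is deliberately ignored because it does not change how a single match is found. The names FLAGS and compile_with are hypothetical.

```python
import re

# Perl-style modifier letters and their closest Python equivalents.
FLAGS = {
    "i": re.IGNORECASE,  # case-insensitive matching
    "m": re.MULTILINE,   # ^ and $ also match at line boundaries
    "s": re.DOTALL,      # single-line mode: . also matches newline
    "x": re.VERBOSE,     # extended mode: whitespace and comments allowed
}

def compile_with(regex, modifiers=""):
    """Compile regex with the given modifier letters; 'g' is ignored."""
    flags = 0
    for letter in modifiers:
        if letter != "g":
            flags |= FLAGS[letter]
    return re.compile(regex, flags)

print(compile_with(r"abc", "i").search("xABC").span())  # (1, 4)
print(compile_with(r"a.c").search("a\nc"))              # None
print(compile_with(r"a.c", "s").search("a\nc").span())  # (0, 3)
```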

Resizing

You can resize the application window as usual by dragging the lower right corner. But you can also resize the panes relative to each other by dragging one of the resize dividers. These aren't visible in the Windows version but you'll note that the cursor changes if you position the mouse above them. There's also a resize divider between the two replacement panes. The Regex Coach will remember the size and position of its main window between two invocations.

Saving to and loading from files

If one of the two main panes has the focus you can - from the file menu - insert the contents of a text file into this pane or save the contents of this pane to disk. The latter can also be done by pressing Ctrl-s (or Ctrl-x Ctrl-s on Linux). The contents of these two panes will also remain persistent between two invocations of The Regex Coach.

Note: Due to the way Motif works, the file menu can't be used like this on Linux. Instead you can use the Emacs key sequences Ctrl-x Ctrl-w and Ctrl-x i.

Autoscroll

The Regex Coach has an Autoscroll feature which can be switched on and off via the corresponding menu. If Autoscroll is on, then each time the target string is parsed the scrollbar of the target pane will be moved such that the start (or end - depending on what you've chosen) of the match is visible more or less in the middle of the pane. If you've chosen to highlight specific parts of the match, then the scrollbar will move to the start or end of the highlighted region instead. This is of course only meaningful if the target string is too large to fit into the pane.

No automatic scrolling occurs while the target pane has the input focus.
 

Known bugs and limitations

The regex engine might give up with a stack overflow on relatively long regular expressions. (This will happen much earlier than with CL-PPCRE alone because the parsing process is interwoven with code specific to The Regex Coach.) Although it may seem counter-intuitive, it might help to add some non-capturing groups, i.e. "aa...abb...b" (with enough characters in between) might fail while "(?:aa...a)(?:bb...b)" doesn't.

Also, there seem to be problems with Eastern European versions of Windows, specifically with "character set 1250" or similar. Sorry, I currently don't have the time and resources to investigate this any further.

If you encounter any other bugs or problems please send them to the mailing list.
 

Technical information

The Regex Coach is written in Common Lisp and was developed using the LispWorks development environment. The regex engine used is CL-PPCRE.

It might be worthwhile to note that, due to the dynamic nature of Lisp, The Regex Coach could be written without changing a single line of code in the CL-PPCRE engine itself, although the application has to track information and query the engine while the regular expression is parsed and the scanners are built. All this could be done 'after the fact' by using facilities like defadvice and :around methods. Imagine writing this application in Perl without touching Perl's regex engine... :)
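For readers who don't know Lisp, the flavour of this 'after the fact' technique can be hinted at in Python. The following is only a loose analogy, not the application's code: a toy class stands in for the engine, and a wrapper is installed around one of its methods from the outside, so that calls can be recorded without editing the class. All names (ToyEngine, advise_around, trace, calls) are invented for the example.

```python
import functools

class ToyEngine:
    """Stand-in for a library we don't want to modify."""
    def scan(self, needle, target):
        return target.find(needle)

def advise_around(cls, name, around):
    """Install `around` as a wrapper for cls.name; return the original."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        # `around` decides when (and whether) to call the original.
        return around(original, self, *args, **kwargs)

    setattr(cls, name, wrapper)
    return original

calls = []

def trace(original, self, needle, target):
    calls.append((needle, target))         # track information...
    return original(self, needle, target)  # ...then let the engine proceed

advise_around(ToyEngine, "scan", trace)
result = ToyEngine().scan("bc", "abcd")
print(result, calls)  # 1 [('bc', 'abcd')]
```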

Also, thanks to LispWorks' cross-platform CAPI toolkit, the code for the Windows and Linux versions is nearly identical without any platform-specific parts (except for some lines regarding different fonts and keybindings).
 

Compatibility with Perl

See the CL-PPCRE documentation.
 

Acknowledgements

The script to compile the Windows installer was kindly provided by Ian H. The icon for the Windows application was created by André Derouaux. The PNG included with the Linux distribution was contributed by John Troy Hurteau and is based on André's icon. The Lisp logo was designed by Manfred Spiller. Thanks to Alex Wood for RPM information. Thanks to Jim Prewett for FreeBSD info.

Brigitte Bovy from LispWorks ("Xanalys" at that time) support helped with the tricky interaction between the editor panes. I also got a couple of helpful tips from the LispWorks mailing list, specifically from Jeff Caldwell, John DeSoi, David Fox, and Nick Levine.

Thanks to the guys at "Café Olé" in Hamburg where I wrote most of the code.

Development of The Regex Coach has been supported by Euphemismen.de.
 

 
