Regular Expressions — Beware Of (+)^✻

Two levels of optional repetition yield a power set of possibilities.

Brief Explanation

If a regular expression (RE) engine uses backtracking to emulate parallelism (an NFA in the jargon) it can suffer from exponential slowdown. This page graphs the exhaustive nesting that can make a perfectly logical calculation seem to crash.

Try It Yourself

The regular expression we use is very reasonable: a list of comma-separated numbers. How hard can it be? The problem arises with an unexpected non-comma, here a semicolon.

If there is a match, the black line just goes right across. If there is no match, the line shows the path a regex engine might take, with backtracks in red.

Regular Expression (it’s fixed for now)

([0-9]+|,)^✻

Click a or type some text here and hit the tab key.

Particular values

A single digit match.
A multiple-digit match.
A multiple-digit match with comma.
A non-match.
A longer non-match.
A compound non-match.
Variation.
Another variation.
More structure.
A long path to failure.
A longer path to failure.

Remarks

The problem arises with the nesting of two (or more) optional repetitions. Some RE engines (see below) will — quite reasonably — explore all the implications of this nesting. This results in examining all the possible ways a sequence (not set) can be divided into subsequences. For a sequence of N elements, there are 2^N-1 subsequences. For this reason, it may take longer than the known lifetime of the Universe to apply such an RE to a non-matching real-world source text.

This is an example more completely dissected in Jeffrey Friedl’s articles in The Perl Journal and in his monumental, magisterial book Mastering Regular Expressions. The behavior of the tool (e.g. awk, sed, Perl, Python, etc.) when confronted with such a RE can help you distinguish between three types of RE engines:

Fast on success, slow on failure: Traditional NFA (expect, grep, perl, python, sed)
Fast on both: DFA (egrep, awk, lex, flex)
Slow on both: POSIX NFA (mawk, MKS egrep, MKS lex, MKS awk)

Regular Expressions — Beware Of (+)✻

Brief Explanation

Try It Yourself

Particular values

Remarks

See Also

Regular Expressions — Beware Of (+)^✻