How to Match Up Until First Occurrence of Regex Pattern
Let’s see how we can match up until the first occurrence of a pattern in a regular expression.
Suppose we’re working with the text below.
abc! def!
Problem: greedy dot-star regex (.*)
Suppose we want to match abc!, so we naturally test the following regular expression:
(.*)!
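But this matches the entire line, because the greedy .* consumes as much text as it can and the ! ends up matching the last exclamation mark.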
abc! def!
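To see the greedy behavior in practice, here is a minimal sketch using Python's re module. The variable names are ours, chosen only for illustration.

```python
import re

text = "abc! def!"

# Greedy: .* grabs as much as it can, then backs off only far enough
# for the trailing "!" to match, which is the LAST "!" on the line.
greedy = re.search(r"(.*)!", text)

print(greedy.group(0))  # abc! def!
print(greedy.group(1))  # abc! def
```

The capture group holds everything up to the last exclamation mark, not the first.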
Solution 1: non-greedy dot-star regex (.*?)
In order to stop matching after the first occurrence of a pattern, we need to make our regular expression lazy, or non-greedy.
Inside the capture, we can add a question mark ? right after the quantifier.
(.*?)!
This will match until the first occurrence of the succeeding pattern.
abc!
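As a quick check, the lazy pattern can be tried in Python, one of the engines that supports it. This is only a sketch and the variable names are ours.

```python
import re

text = "abc! def!"

# Lazy: .*? expands one character at a time and stops as soon as
# the following "!" can match, which is the FIRST "!" on the line.
lazy = re.search(r"(.*?)!", text)

print(lazy.group(0))  # abc!
print(lazy.group(1))  # abc

# findall returns the captured group for every non-overlapping match.
pieces = re.findall(r"(.*?)!", text)
print(pieces)  # ['abc', ' def']
```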
Adding a ? on any quantifier (?, * or +) will make it non-greedy. Keep in mind that this ? is only available in regex engines that implement Perl 5 extensions (e.g. Java, Python, Ruby). In traditional engines like awk, sed, and grep without -P, we’ll have to resort to the next method.
Solution 2: match all but exclude ([^abc]*)
Another way to avoid matching after the first occurrence is to exclude characters in the capture.
We can do this using the caret ^ inside a set of brackets [].
For instance, [^abc] will match any character except for a, b, and c.
Naturally, [^abc]* will match any number of characters excluding a, b, and c.
Inside the capture, we can exclude the succeeding pattern, which is the exclamation mark ! in this case.
([^!]*)!
This will match only the first occurrence of the pattern.
abc!
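The same check for the negated character class, again as a small Python sketch with illustrative variable names:

```python
import re

text = "abc! def!"

# [^!]* can never run past a "!", so the match ends at the first one
# even though the quantifier itself is still greedy.
negated = re.search(r"([^!]*)!", text)

print(negated.group(0))  # abc!
print(negated.group(1))  # abc
```

Because the pattern uses no lazy quantifier, it also works in engines that lack the Perl-style ? modifier.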
With most regex engines, [^!]* is likely faster than .*? since it does not need to check, after every character, whether the rest of the pattern matches yet. That being said, .*? is a more generic pattern that can be applied to any regular expression, including ones where the succeeding pattern is longer than a single character.
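To illustrate why .*? is the more generic of the two, here is a sketch with a multi-character terminator. The sample string and the END marker are hypothetical, made up for this example.

```python
import re

# Hypothetical input: we want everything before the first "END".
sample = "aNbENDc"

# Lazy dot-star handles a multi-character terminator directly.
lazy = re.search(r"(.*?)END", sample)
print(lazy.group(1))  # aNb

# [^END] excludes the single characters E, N and D, not the sequence "END",
# so the capture cannot contain the lone "N" and the match starts later.
negated = re.search(r"([^END]*)END", sample)
print(negated.group(1))  # b
```

The negated class is a good fit when the terminator is one character. For longer terminators it excludes each character individually, which is usually not what we want.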