How to Match Up Until First Occurrence of Regex Pattern
Let’s see how we can match up until the first occurrence of a pattern in a regular expression.
Suppose we’re working with the text below.
abc! def!
Problem: greedy dot-star regex (.*)
Suppose we want to match abc!, so we naturally test the following regular expression:
(.*)!
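But this matches the entire line, because .* is greedy: it consumes as many characters as possible and only gives back enough for the final ! to match.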
abc! def!
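The greedy behavior can be reproduced with a minimal sketch in Python. The choice of Python and its standard re module is an assumption made for illustration; the article itself is engine-agnostic.

```python
import re

text = "abc! def!"

# Greedy: .* first consumes the whole line, then backtracks
# just far enough for the trailing "!" to match (the LAST one).
greedy = re.search(r"(.*)!", text)

print(greedy.group(0))  # whole match: abc! def!
print(greedy.group(1))  # captured group: abc! def
```

The capture group ends up holding everything before the last exclamation mark, not the first.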
Solution 1: non-greedy dot-star regex (.*?)
In order to stop matching after the first occurrence of a pattern, we need to make our regular expression lazy, or non-greedy.
Inside the capture, we can add a question mark ? right after the quantifier *.
(.*?)!
This will match until the first occurrence of the succeeding pattern.
abc!
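As a rough check, here is the lazy pattern in Python's re module. Python is an assumed choice here; any engine with Perl-style lazy quantifiers should behave the same way.

```python
import re

text = "abc! def!"

# Lazy: .*? consumes as few characters as possible,
# so the match stops at the FIRST "!".
lazy = re.search(r"(.*?)!", text)

print(lazy.group(0))  # whole match: abc!
print(lazy.group(1))  # captured group: abc

# findall returns the captured group of each successive match.
segments = re.findall(r"(.*?)!", text)
print(segments)  # ['abc', ' def']
```

Because each match now ends at the nearest exclamation mark, repeated matching splits the line into its individual segments.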
Adding a ? on any quantifier (?, * or +) will make it non-greedy. Keep in mind that this ? is only available in regex engines that implement Perl 5 extensions (e.g. Java, Python, Ruby). In traditional engines like awk, sed, and grep without -P, we’ll have to resort to the next method.
Solution 2: match all but exclude ([^abc]*)
Another way to avoid matching after the first occurrence is to exclude characters in the capture.
We can do this using the caret ^ inside a set of brackets [].
For instance, [^abc] will match any character except for a, b, and c.
Naturally, [^abc]* will match any number of characters excluding a, b, and c.
Inside the capture, we can exclude the succeeding pattern, which is the exclamation mark ! in this case.
([^!]*)!
This will match only up to the first occurrence of the exclamation mark.
abc!
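The negated set can be tried the same way. This is a sketch in Python's re module (an assumed choice of engine), and the multi-line sample string is made up for illustration.

```python
import re

text = "abc! def!"

# [^!]* matches any run of characters that are not "!",
# so the capture cannot run past the first "!".
negated = re.search(r"([^!]*)!", text)

print(negated.group(0))  # whole match: abc!
print(negated.group(1))  # captured group: abc

# One difference worth knowing: [^!] also matches newlines,
# while "." does not (unless a dot-all flag is set).
multiline = "ab\ncd! ef!"
print(repr(re.search(r"([^!]*)!", multiline).group(1)))  # 'ab\ncd'
print(repr(re.search(r"(.*?)!", multiline).group(1)))    # 'cd'
```

On single-line input both solutions give the same result; they only diverge when the text before the delimiter contains a line break.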
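With most regex engines, [^!]* is likely faster than .*? since it does not need to check, after every character it consumes, whether the rest of the pattern matches yet. That being said, .*? is a more generic pattern that works with any succeeding pattern, including multi-character ones, whereas a negated set can only exclude single characters.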
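To illustrate why .*? is the more general tool, consider a multi-character delimiter. The sketch below uses Python's re module; the sample string and the --> delimiter are hypothetical and do not come from the article.

```python
import re

# Hypothetical input: we want everything before the first "-->".
text = "a-b-->c"

# Lazy dot-star handles a multi-character delimiter directly.
lazy = re.search(r"(.*?)-->", text)
print(lazy.group(1))  # a-b

# A negated set can only exclude single characters, so [^-]*
# refuses to cross the lone "-" and captures too little.
negated = re.search(r"([^-]*)-->", text)
print(negated.group(1))  # b
```

The negated set is a good fit when the delimiter is a single character, as with the exclamation mark above; for longer delimiters the lazy quantifier is usually the simpler choice.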