How to Match Up Until First Occurrence of Regex Pattern


Let’s see how we can match up until the first occurrence of a pattern in a regular expression.

Suppose we’re working with the text below.

abc! def!

Problem: greedy dot-star regex (.*)

Suppose we want to match abc!, so we naturally test the following regular expression:

(.*)!

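But this matches the entire line, because .* is greedy: it consumes as much text as it can and gives back only what is needed for the final ! to match.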

abc! def!
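
To see the greedy behavior concretely, here is a minimal sketch using Python's re module. The choice of Python and the variable names are assumptions made for illustration; any engine with greedy quantifiers should behave the same way.

```python
import re

text = "abc! def!"

# Greedy: .* runs to the end of the line, then backtracks to the LAST "!".
greedy = re.search(r"(.*)!", text)

print(greedy.group(0))  # abc! def!
print(greedy.group(1))  # abc! def
```

The whole match spans the entire line, and the capture group holds everything before the last exclamation mark.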

Solution 1: non-greedy dot-star regex (.*?)

In order to stop matching after the first occurrence of a pattern, we need to make our regular expression lazy, or non-greedy.

Inside the capture, we can include a question mark ?.

(.*?)!

This will match until the first occurrence of the succeeding pattern.

abc!
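
The same text can be run through the lazy pattern. This is a small sketch in Python (an assumed language; the names are illustrative only):

```python
import re

text = "abc! def!"

# Lazy: .*? expands one character at a time and stops at the FIRST "!".
lazy = re.search(r"(.*?)!", text)

print(lazy.group(0))  # abc!
print(lazy.group(1))  # abc

# findall returns the capture for each successive match.
pieces = re.findall(r"(.*?)!", text)
print(pieces)  # ['abc', ' def']
```

Each match now ends at the nearest exclamation mark instead of the last one.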

Adding a ? after any quantifier (?, *, + or {n,m}) will make it non-greedy. Keep in mind that lazy quantifiers are only available in regex engines that implement Perl 5 extensions (e.g. Java, Python, Ruby). In traditional engines like awk, sed, and grep without -P, we’ll have to resort to the next method.

Solution 2: match all but exclude ([^abc]*)

Another way to avoid matching after the first occurrence is to exclude characters in the capture.

We can do this using the caret ^ inside a set of brackets [].

For instance, [^abc] will match any character except for a, b, and c.

Naturally, [^abc]* will match any number of characters excluding a, b, and c.
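
As a quick illustration of a negated character class on its own, the following Python sketch (sample strings are made up for this example) shows where [^abc]* stops:

```python
import re

# [^abc]* consumes characters until it meets an a, b, or c (or the end of the text).
prefix = re.match(r"[^abc]*", "xyz-abc").group(0)
print(prefix)  # xyz-

# Because * allows zero repetitions, the match can be empty
# when the text starts with an excluded character.
empty = re.match(r"[^abc]*", "abc! def!").group(0)
print(repr(empty))  # ''
```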

Inside the capture, we can exclude the succeeding pattern, which is the exclamation mark ! in this case.

([^!]*)!

This will match only up to and including the first exclamation mark.

abc!
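
A minimal Python sketch of the negated-class approach, again assuming Python's re module purely for illustration:

```python
import re

text = "abc! def!"

# [^!]* can never run past a "!", so the match ends at the first one.
negated = re.search(r"([^!]*)!", text)

print(negated.group(0))  # abc!
print(negated.group(1))  # abc
```

This pattern uses only a bracket expression and a plain star, so it does not depend on lazy quantifiers.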

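With most regex engines, [^!]* is likely faster than .*? because the lazy version has to check, after every character it consumes, whether the rest of the pattern matches, while the negated class simply consumes characters until it reaches a !. That being said, .*? is the more general technique: it works with any succeeding pattern, including multi-character ones, whereas a negated character class can only exclude single characters.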
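
The difference in generality shows up when the succeeding pattern is longer than one character. The sketch below uses a hypothetical two-character delimiter, --, and a made-up sample string; neither comes from the example above.

```python
import re

text = "a-b--c--d"

# Lazy dot-star stops at the first "--", even though a single "-" appears earlier.
lazy_multi = re.search(r"(.*?)--", text)
print(lazy_multi.group(1))  # a-b

# A negated class can only exclude single characters, so [^-]* cannot
# cross the lone "-" and the match starts later in the text.
negated_multi = re.search(r"([^-]*)--", text)
print(negated_multi.group(1))  # b
```

Here the negated class silently captures a shorter piece of text, so it is only a drop-in replacement when the delimiter is a single character.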