How to Check if a String Contains a Substring in Python


How can we check if a string contains some substring in Python?

For instance, how can we check if the string 'shih' exists in the string 'shih tzu'?

Let’s go over five methods of checking for substrings in Python.

Method 1: the in operator

This is probably the simplest method of checking for substrings.

'shih' in 'shih tzu' # True
'corgi' in 'shih tzu' # False

The in operator checks for membership in a collection (in this case, the string) by calling the object’s __contains__() method. We are essentially running the following:

'shih tzu'.__contains__('shih') # True
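To see that dispatch outside of strings, here is a minimal sketch. The Kennel class is hypothetical (it is not from any library) and exists only to show that in calls whatever __contains__() the object defines.

```python
class Kennel:
    """Hypothetical container used only to illustrate how `in` dispatches."""

    def __init__(self, dogs):
        self.dogs = list(dogs)

    def __contains__(self, name):
        # Called by `name in kennel`; here we choose to ignore case.
        return name.lower() in [dog.lower() for dog in self.dogs]


kennel = Kennel(['Shih Tzu', 'Corgi'])
has_corgi = 'corgi' in kennel    # calls kennel.__contains__('corgi')
has_poodle = 'poodle' in kennel  # calls kennel.__contains__('poodle')

print(has_corgi)   # True
print(has_poodle)  # False
```

In everyday code we would still write substring in string and never call __contains__() directly.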

We can use this in an if statement as follows:

string = 'shih tzu'
substring = 'shih'
if substring in string:
    print(f'{substring} exists')

Method 2: the find method

The find method looks like this:

y.find(x)

This method will return an integer that indicates the index at which the first occurrence of substring x begins in string y.

If x does not exist in y, then the method will return -1.

dog = 'shih tzu'
dog.find('shih')  # 0
dog.find('tzu')   # 5
dog.find('corgi') # -1

You can also limit your search to part of the entire string using this structure:

y.find(x, start_index, end_index)

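This restricts the search to the part of the string between start_index (inclusive) and end_index (exclusive), just like the slice y[start_index:end_index]. No new string is created, and the index returned is still relative to the start of the full string.

Both are optional parameters. If not specified, the method searches the entire string.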

dog = 'shih tzu'
dog[0:5] # 'shih '
dog.find('tzu', 0, 5) # -1
dog = 'shih tzu'
dog[3:8] # 'h tzu'
dog.find('tzu', 3, 8) # 5
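One detail worth seeing side by side: passing bounds to find is not the same as slicing first and then searching. A small sketch (the variable names bounded and sliced are just for illustration):

```python
dog = 'shih tzu'

# Searching within bounds: the result is an index into the full string.
bounded = dog.find('tzu', 3, 8)

# Searching a slice: the result is an index into the slice instead.
sliced = dog[3:8].find('tzu')

print(bounded)  # 5
print(sliced)   # 2
```

If we need the position in the original string, the bounded form saves us from adding the offset back ourselves.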

If we’re just checking for the existence of a substring in another string, then we can just check for a -1:

string = 'shih tzu'
substring = 'shih'
if string.find(substring) != -1:
    print(f'{substring} exists')
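The explicit comparison with -1 matters. It is tempting to write if string.find(substring): on its own, but the sketch below shows why that gives the wrong answer in both directions (the variable names are only for illustration):

```python
string = 'shih tzu'

# find() returns 0 for a match at the very start, and 0 is falsy.
match_at_start = string.find('shih')
# find() returns -1 for no match, and -1 is truthy.
no_match = string.find('corgi')

print(bool(match_at_start))  # False, even though 'shih' is present
print(bool(no_match))        # True, even though 'corgi' is absent

# Comparing against -1 gives the intended answers.
print(match_at_start != -1)  # True
print(no_match != -1)        # False
```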

Method 3: the index method

This method works exactly like find (same parameters), but instead of returning -1 when the substring is missing, it raises a ValueError, so you need to call it inside a try-except block (optionally with an else clause).

string = 'shih tzu'
substring = 'shih'
try:
    string.index(substring)
except ValueError:
    print(f'{substring} does not exist')
else:
    print(f'{substring} exists')
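If we use index in several places, the try-except can be wrapped once. This is only a sketch: index_or_none is a hypothetical helper name, and returning None for "not found" is one possible design choice among several.

```python
def index_or_none(string, substring, start=0, end=None):
    """Return the index of substring in string, or None if it is absent.

    Hypothetical helper: wraps str.index so callers need no try-except.
    """
    if end is None:
        end = len(string)
    try:
        return string.index(substring, start, end)
    except ValueError:
        return None


print(index_or_none('shih tzu', 'tzu'))        # 5
print(index_or_none('shih tzu', 'corgi'))      # None
print(index_or_none('shih tzu', 'tzu', 0, 5))  # None
```

Since 0 is a valid index, callers should test the result with is not None and not with plain truthiness.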

Method 4: the count method

The count method can be used to determine the number of times a substring appears in a string.

string = 'shih tzu'
substring = 'shih'
string.count(substring) # 1
string2 = 'shih shih tzu'
string2.count(substring) # 2

For existence, we can simply check that the number of occurrences is greater than zero.

string = 'shih tzu'
substring = 'shih'
if string.count(substring) > 0:
    print(f'{substring} exists')
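One thing to keep in mind is that count only counts non-overlapping occurrences. That makes no difference for an existence check, but it does if we care about the number itself. A small sketch, using a made-up string where the matches overlap:

```python
text = 'nanana'
sub = 'nana'

# count() skips past each match, so the overlapping one is missed.
non_overlapping = text.count(sub)

# Counting overlapping matches by stepping find() forward one index at a time.
overlapping = 0
start = text.find(sub)
while start != -1:
    overlapping += 1
    start = text.find(sub, start + 1)

print(non_overlapping)  # 1
print(overlapping)      # 2
```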

Method 5: regex

The last method I’m going to present to you uses regular expressions.

Regex can do much more than check for the existence of a substring. With the right pattern and flags, it can match patterns rather than fixed text, ignore case, and much more.

Python’s standard library has a module just for regular expressions called re. This module contains a function called search that we can use for simple substring matching. While this works, it is far slower than the other methods for this use case, so save regex for when you actually need pattern matching, and read up on it to understand its capabilities.

from re import search
string = 'shih tzu'
substring = 'shih'
if search(substring, string):
    print(f'{substring} exists')
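One caveat with search is that it treats the substring as a pattern, not as literal text. If the substring may contain characters that are special in regex (such as . or $), re.escape makes them literal. A short sketch, with a made-up substring chosen to trigger the problem:

```python
import re

string = 'shih tzu'
substring = 'shih.tzu'  # contains '.', which is a regex wildcard

# The in operator compares characters literally, so this is False.
literal_check = substring in string

# Unescaped, '.' matches the space in 'shih tzu', giving a false positive.
unescaped_match = re.search(substring, string)

# re.escape() backslash-escapes special characters for a literal match.
escaped_match = re.search(re.escape(substring), string)

# Flags such as re.IGNORECASE are where regex goes beyond the other methods.
ignore_case_match = re.search('SHIH', string, re.IGNORECASE)

print(literal_check)                  # False
print(unescaped_match is not None)    # True
print(escaped_match is not None)      # False
print(ignore_case_match is not None)  # True
```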

Don’t Forget Empty Strings

Empty strings are also considered substrings of any string, so every method will indicate existence:

'' in 'shih tzu' # True
'shih tzu'.find('') # 0
'shih tzu'.index('') # 0
'shih tzu'.count('') # 9 = 1 + len(string)
search('', 'shih tzu') # <re.Match object; span=(0, 0), match=''>
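If an empty substring should not count as a match in our program (for example, when the substring comes from user input), we have to guard against it ourselves. A minimal sketch, where contains_nonempty is a hypothetical helper name:

```python
def contains_nonempty(string, substring):
    """Hypothetical helper: treat an empty substring as 'not found'."""
    return bool(substring) and substring in string


print(contains_nonempty('shih tzu', 'shih'))   # True
print(contains_nonempty('shih tzu', 'corgi'))  # False
print(contains_nonempty('shih tzu', ''))       # False
```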

Why You Should Always Use in

Even though you’ve just learned all these different methods, you should always use the in operator when checking for the existence of a string within another string.

It is significantly faster at checking for existence than all the other operations.

We can use the timeit module to compare the efficiency of each method. The module is fairly straightforward to use. For example, suppose we wanted to time this statement:

'shih' in 'shih tzu' # True

We can import the timeit module and then pass the statement we want to time as a string. The output gives us the time, in seconds, it takes to run the given statement one million times (the default number of runs).

import timeit
timeit.timeit("'shih' in 'shih tzu'") # 0.18620452799950726

In our case, we want to run other lines first (e.g. declare variables, import search for regex), but we don’t want timeit to time that setup. That’s when we can pass a statement as the second parameter (setup), which runs once before timeit starts timing the first statement. We will use the triple quotation mark notation to denote multi-line code.

setup_strings = """
from re import search
string = 'shih tzu'
substring = 'shih'
"""
import timeit
# 0.095994764000352
timeit.timeit('substring in string', setup_strings) # in
# 0.5236705630013603
timeit.timeit('string.find(substring)', setup_strings) # find
# 0.47110080900165485
timeit.timeit('string.index(substring)', setup_strings) # index
# 0.49605825699836714
timeit.timeit('string.count(substring)', setup_strings) # count
# 4.052273493995017
timeit.timeit('search(substring, string)', setup_strings) # regex
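Single timings like these vary between runs and machines, so your numbers will differ. One way to make the comparison a bit steadier is to repeat each measurement and keep the fastest run. The sketch below does that with timeit.repeat; the time_statements helper, the lower run count of 100,000, and the three repeats are choices made for this example, not part of the measurements above.

```python
import timeit

setup_strings = """
from re import search
string = 'shih tzu'
substring = 'shih'
"""

statements = {
    'in': 'substring in string',
    'find': 'string.find(substring)',
    'index': 'string.index(substring)',
    'count': 'string.count(substring)',
    'regex': 'search(substring, string)',
}


def time_statements(statements, setup, number=100_000, repeat=3):
    """Hypothetical helper: best-of-`repeat` time in seconds per statement."""
    results = {}
    for label, stmt in statements.items():
        runs = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
        # The minimum is the run least disturbed by other activity.
        results[label] = min(runs)
    return results


results = time_statements(statements, setup_strings)
for label, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f'{label:>6}: {seconds:.4f} s')
```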

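Over a million iterations, the in operator outperforms the other four methods. In this run it is roughly five times faster than find, index, and count, and roughly forty times faster than regex.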

Conclusion

So if you’re just checking if a string exists in another string, be sure to use the in operator.

That is all the in operator is good for, though. It won’t return indexes as find and index do.

It won’t return the number of occurrences as count does.

And it certainly is not as powerful as regex.

Just know your use-case and adapt.

Simple enough 🙂