How to Check if a String Contains a Substring in Python
How can we check if a string contains a given substring in Python?
For instance, how can we check whether the string 'shih' exists in the string 'shih tzu'?
Let’s go over five methods of checking for substrings in Python.
Method 1: the in operator
This is probably the simplest method of checking for substrings.
'shih' in 'shih tzu' # True
'corgi' in 'shih tzu' # False
The in operator checks for membership in a collection (in this case, the string) by calling the object’s __contains__() method. We are essentially running the following:
'shih tzu'.__contains__('shih') # True
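To make the dispatch to __contains__ concrete, here is a minimal sketch using a hypothetical Kennel class (an illustration only, not something from the standard library). Any object that defines __contains__ can appear on the right-hand side of in:

```python
class Kennel:
    """Hypothetical container used only to illustrate the in operator."""

    def __init__(self, dogs):
        self.dogs = list(dogs)

    def __contains__(self, name):
        # Python calls this for `name in kennel` and converts the result to a bool.
        return name.lower() in (dog.lower() for dog in self.dogs)


kennel = Kennel(['Shih Tzu', 'Corgi'])
has_corgi = 'corgi' in kennel    # True
has_pug = 'pug' in kennel        # False
```

Strings implement the same hook, which is why the in operator works on them directly.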
We can use this in an if statement as follows:
string = 'shih tzu'
substring = 'shih'
if substring in string:
    print(f'{substring} exists')
Method 2: the find method
The find method looks like this:
y.find(x)
This method returns an integer: the index at which the first occurrence of substring x begins in string y.
If x does not exist in y, the method returns -1.
dog = 'shih tzu'
dog.find('shih') # 0
dog.find('tzu') # 5
dog.find('corgi') # -1
You can also limit your search to part of the entire string using this structure:
y.find(x, start_index, end_index)
This restricts the search to the slice y[start_index:end_index], but the returned index is still relative to the start of the original string.
Both parameters are optional. If they are not specified, the method searches the entire string.
dog = 'shih tzu'
dog[0:5] # 'shih '
dog.find('tzu', 0, 5) # -1
dog = 'shih tzu'
dog[3:8] # 'h tzu'
dog.find('tzu', 3, 8) # 5
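One detail worth checking for yourself: the bounded form of find reports a position in the original string, whereas slicing first and then searching reports a position inside the slice. A small sketch of the difference, reusing the dog variable from above:

```python
dog = 'shih tzu'

# find with bounds: the result is an index into the original string.
bounded = dog.find('tzu', 3, 8)    # 5

# Slicing first: the result is an index into the slice 'h tzu'.
sliced = dog[3:8].find('tzu')      # 2

# Adding the start offset back recovers the original index.
recovered = sliced + 3             # 5
```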
If we’re just checking for the existence of a substring in another string, we can simply check that the result is not -1:
string = 'shih tzu'
substring = 'shih'
if string.find(substring) != -1:
    print(f'{substring} exists')
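The explicit comparison with -1 matters. A short sketch of what can go wrong if the result of find is used directly as a truth value (the variable names here are only illustrative):

```python
string = 'shih tzu'
substring = 'shih'

position = string.find(substring)    # 0, because the match starts at index 0

# Pitfall: 0 is falsy, so a bare truth test wrongly reports "missing".
naive_check = bool(position)         # False

# Comparing against -1 gives the right answer.
correct_check = position != -1       # True

# Conversely, a failed search returns -1, which is truthy.
missing_is_truthy = bool(string.find('corgi'))    # True
```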
Method 3: the index method
This method functions exactly the same as find (same parameters), but instead of returning -1 when the substring is missing, it raises a ValueError, so you would need to call it inside a try-except-else block.
string = 'shih tzu'
substring = 'shih'
try:
    string.index(substring)
except ValueError:
    print(f'{substring} does not exist')
else:
    print(f'{substring} exists')
Method 4: the count method
The count method can be used to determine the number of times a substring appears in a string.
string = 'shih tzu'
substring = 'shih'
string.count(substring) # 1
string2 = 'shih shih tzu'
string2.count(substring) # 2
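Note that count only counts non-overlapping occurrences. The sketch below contrasts it with a hypothetical helper, count_overlapping (not a built-in, written here purely for illustration), that steps through the string with find:

```python
def count_overlapping(string, substring):
    """Hypothetical helper: count matches, allowing them to overlap."""
    if not substring:
        raise ValueError('substring must be non-empty')
    total = 0
    start = string.find(substring)
    while start != -1:
        total += 1
        # Resume the search one character after the start of the last match.
        start = string.find(substring, start + 1)
    return total


non_overlapping = 'aaaa'.count('aa')             # 2 (matches at index 0 and 2)
overlapping = count_overlapping('aaaa', 'aa')    # 3 (matches at index 0, 1 and 2)
```

For a plain existence check the distinction does not matter, since both results are greater than zero whenever the substring is present.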
For existence, we can simply check that the number of occurrences is greater than zero.
string = 'shih tzu'
substring = 'shih'
if string.count(substring) > 0:
    print(f'{substring} exists')
Method 5: regex
The last method I’m going to present to you uses regular expressions.
Regex can do much more than check for the existence of a substring. Using complex matching functions, it can perform pattern checking, case-insensitive matching, and many more things.
Python has a built-in module just for regular expressions called re. This module contains a function called search that we can use for simple substring matching. While this works, it is by far the slowest option for this use case, so reserve it for situations that actually need pattern matching, and be sure to read up on regex to understand its capabilities.
from re import search
string = 'shih tzu'
substring = 'shih'
if search(substring, string):
    print(f'{substring} exists')
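One caveat with this approach: search treats its first argument as a pattern, not as literal text, so a substring containing regex metacharacters may not match the way you expect. A minimal sketch, assuming you want a literal match, uses re.escape. The example strings are made up for illustration:

```python
import re

string = 'shih tzu (small)'
substring = '(small)'

# As a pattern, '(small)' is a capture group that matches the text 'small',
# not the parentheses themselves.
as_pattern = re.search(substring, string)
pattern_match = as_pattern.group()     # 'small'

# re.escape makes every character in the substring match literally.
literal = re.search(re.escape(substring), string)
literal_match = literal.group()        # '(small)'

# The case-insensitive matching mentioned above is available via a flag.
ignore_case = re.search(re.escape('SHIH'), string, re.IGNORECASE)
ignore_case_match = ignore_case.group()    # 'shih'
```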
Don’t Forget Empty Strings
Empty strings are also considered substrings of any string, so every method will indicate existence:
'' in 'shih tzu' # True
'shih tzu'.find('') # 0
'shih tzu'.index('') # 0
'shih tzu'.count('') # 9 = 1 + len(string)
search('', 'shih tzu') # <re.Match object; span=(0, 0), match=''>
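If an empty substring should not count as a match in your code, one option is to guard against it explicitly. The helper below, contains_nonempty, is a hypothetical name used only for this sketch:

```python
def contains_nonempty(string, substring):
    """Hypothetical helper: True only if substring is non-empty and found in string."""
    return bool(substring) and substring in string


empty_result = contains_nonempty('shih tzu', '')         # False
found_result = contains_nonempty('shih tzu', 'shih')     # True
missing_result = contains_nonempty('shih tzu', 'corgi')  # False
```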
Why You Should Always Use in
Even though you’ve just learned all these different methods, you should always use the in operator when checking for the existence of a string within another string.
It is significantly faster at checking for existence than all the other operations.
We can use the timeit module to compare the efficiency of each method. The module is fairly straightforward to use. For example, suppose we wanted to time this statement:
'shih' in 'shih tzu' # True
We can import the timeit module and then run the statement we want to time inside quotes. The output gives us the time, in seconds, it takes to run the given statement one million times.
import timeit
timeit.timeit("'shih' in 'shih tzu'") # 0.18620452799950726
In our case, we want to run other lines first (e.g. declare variables, import search for regex), but we don’t want timeit to time that setup. For that, we can pass a setup statement as the second parameter, which will run before timeit starts timing the first statement. We will use the triple quotation mark notation to denote multi-line code.
setup_strings = """
from re import search
string = 'shih tzu'
substring = 'shih'
"""
import timeit
# 0.095994764000352
timeit.timeit('substring in string', setup_strings) # in
# 0.5236705630013603
timeit.timeit('string.find(substring)', setup_strings) # find
# 0.47110080900165485
timeit.timeit('string.index(substring)', setup_strings) # index
# 0.49605825699836714
timeit.timeit('string.count(substring)', setup_strings) # count
# 4.052273493995017
timeit.timeit('search(substring, string)', setup_strings) # regex
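Over a million iterations, the in operator outperforms the other four methods: in this run it was roughly five times faster than find, index, and count, and about forty times faster than regex.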
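Timings like these vary from machine to machine and from run to run. As a rough sketch of how to reproduce the comparison more robustly, timeit.repeat can run each statement several times so that we can keep the fastest run. The repeat and number values below are arbitrary choices for illustration, not the ones used for the figures above:

```python
import timeit

setup_strings = """
from re import search
string = 'shih tzu'
substring = 'shih'
"""

statements = {
    'in': 'substring in string',
    'find': 'string.find(substring)',
    'index': 'string.index(substring)',
    'count': 'string.count(substring)',
    'regex': 'search(substring, string)',
}

results = {}
for name, statement in statements.items():
    # Each entry in runs is the total time in seconds for 100,000 executions.
    runs = timeit.repeat(statement, setup_strings, repeat=3, number=100_000)
    # The minimum is the run least disturbed by other activity on the machine.
    results[name] = min(runs)

# Print the methods from fastest to slowest.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f'{name:>5}: {seconds:.4f} s per 100,000 runs')
```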
Conclusion
So if you’re just checking if a string exists in another string, be sure to use the in operator.
That is all the in operator is good for, though. It won’t return indexes as find and index do.
It won’t return the number of occurrences as count does.
And it certainly is not as powerful as regex.
Just know your use-case and adapt.
Simple enough 🙂