Check if a Substring Exists in a String in Python
Five ways to check if a substring exists in a Python string: in operator, find(), index(), count(), and re.search(). Edge cases and KMP included.
Python’s in operator checks substring existence in one expression and handles most real-world use cases.
The in Operator: Python’s Default Substring Check
The in keyword is the idiomatic way to test whether a substring appears inside a string. It returns a boolean directly.
text = "Welcome to Python Programming"
substring = "Python"
if substring in text:
    print("Found")
else:
    print("Not found")
Output:
Found
Three things worth noting:
- The check is case-sensitive. "python" in text returns False because the capital P does not match.
- An empty substring always returns True: "" in "anything" evaluates to True. This is consistent with the formal definition that every string contains the empty string at every position.
- Internally, CPython delegates in to str.__contains__(), which calls the fastsearch implementation using a blend of Boyer-Moore and Horspool heuristics. For typical inputs, treat the cost as O(n) where n is the length of the haystack.
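The three notes above can be checked directly in an interpreter. The snippet below is a small sketch that reuses the `text` value from the earlier example; the variable names are illustrative only.

```python
text = "Welcome to Python Programming"

# Case-sensitive: the lowercase "python" does not match "Python".
exact = "Python" in text          # True
wrong_case = "python" in text     # False

# The empty string is a substring of every string, including itself.
empty_in_text = "" in text        # True
empty_in_empty = "" in ""         # True

# `in` dispatches to __contains__ on the right-hand operand.
via_dunder = text.__contains__("Python")  # True

# `not in` is the negated form and reads better than `not (x in y)`.
missing = "Java" not in text      # True

print(exact, wrong_case, empty_in_text, empty_in_empty, via_dunder, missing)
# True False True True True True
```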
For a broader set of Python fundamentals tested in placement rounds, see the Python basic programs collection.
str.find() and str.index(): Getting the Position
When you need the position of the first occurrence, not just a yes/no answer, use str.find() or str.index().
text = "Welcome to Python Programming"
pos = text.find("Python")
print(pos) # 11
pos_missing = text.find("Java")
print(pos_missing) # -1
| Method | On match | On miss | Use when |
|---|---|---|---|
| str.find(sub) | Returns lowest index | Returns -1 | Missing substring is a normal branch |
| str.index(sub) | Returns lowest index | Raises ValueError | Missing substring is a bug |
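The difference in the "On miss" column shapes how the calling code looks. The sketch below wraps each method in a small helper; `locate_with_find` and `locate_with_index` are hypothetical names introduced here for illustration.

```python
text = "Welcome to Python Programming"

def locate_with_find(haystack, needle):
    """Treat a miss as a normal outcome: return None instead of -1."""
    pos = haystack.find(needle)
    return pos if pos != -1 else None

def locate_with_index(haystack, needle):
    """Treat a miss as an error: let ValueError propagate to the caller."""
    return haystack.index(needle)

print(locate_with_find(text, "Python"))   # 11
print(locate_with_find(text, "Java"))     # None

try:
    locate_with_index(text, "Java")
except ValueError as exc:
    print(f"index() raised ValueError: {exc}")
```

One pitfall to keep in mind: `if text.find(sub):` is not a valid existence test, because -1 (a miss) is truthy and 0 (a match at the start) is falsy. Compare against -1 explicitly, or use `in`.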
Both accept optional start and end parameters to restrict the search window:
# Search only within positions 5 through 20
text.find("to", 5, 20) # 8
A common placement-test pattern asks you to find all occurrences of a substring. Loop with find():
def find_all(text, sub):
    positions = []
    start = 0
    while True:
        idx = text.find(sub, start)
        if idx == -1:
            break
        positions.append(idx)
        start = idx + 1  # move one character on, so overlapping matches are found too
    return positions
print(find_all("abcabcabc", "abc")) # [0, 3, 6]
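Because the loop above advances by one character after each hit, it also reports overlapping matches. If a question asks for non-overlapping positions instead, a possible variant (a sketch, with the hypothetical name `find_all_non_overlapping`) jumps past the whole match. It uses the walrus operator, so it assumes Python 3.8 or later.

```python
def find_all_non_overlapping(text, sub):
    """Positions of sub in text, skipping past each full match."""
    if not sub:
        # An empty substring would never advance the cursor.
        raise ValueError("sub must be non-empty")
    positions = []
    start = 0
    while (idx := text.find(sub, start)) != -1:
        positions.append(idx)
        start = idx + len(sub)  # jump past the whole match, not just one character
    return positions

print(find_all_non_overlapping("aaaa", "aa"))        # [0, 2]
print(find_all_non_overlapping("abcabcabc", "abc"))  # [0, 3, 6]
```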
str.count(): Counting Occurrences
str.count(sub) returns the number of non-overlapping occurrences of sub in the string.
text = "banana"
print(text.count("ana")) # 1, not 2
The result is 1 because count() does not detect overlapping matches. After finding “ana” at index 1, it resumes searching from index 4, missing the overlapping “ana” at index 3.
Counting overlapping occurrences
Advancing the search start by one character after each hit, rather than past the whole match, handles overlaps:
def count_overlapping(text, sub):
    count = 0
    start = 0
    while True:
        idx = text.find(sub, start)
        if idx == -1:
            break
        count += 1
        start = idx + 1
    return count
print(count_overlapping("banana", "ana")) # 2
This is functionally identical to the find_all loop above, just returning a count instead of positions. For character-level classification problems (digits, uppercase, special characters), see character classification in Python.
Regex with re.search(): Pattern-Based Substring Detection
When the substring is not a fixed literal but a pattern (wildcards, character classes, optional segments), the re module is the right tool.
import re
text = "Order number: ORD-2026-4471"
# Check if text contains a pattern like ORD-YYYY-NNNN
match = re.search(r"ORD-\d{4}-\d{4}", text)
if match:
    print(f"Found: {match.group()}")  # Found: ORD-2026-4471
Case-insensitive search
text = "Python is great"
if re.search(r"python", text, re.IGNORECASE):
    print("Match")  # Match
Finding all overlapping matches
Use a lookahead inside re.finditer():
text = "banana"
overlaps = [m.start() for m in re.finditer(r"(?=ana)", text)]
print(overlaps) # [1, 3]
The lookahead (?=ana) asserts “ana” follows at this position without consuming characters, so the engine advances one character at a time and catches overlaps.
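Since a lookahead consumes nothing, `m.group()` is an empty string for each hit. One way to recover the matched text, sketched below, is to put a capture group inside the lookahead. The helper name `overlapping_positions` is hypothetical; it uses `re.escape()` so that a literal substring supplied at runtime is not interpreted as a pattern.

```python
import re

text = "banana"

# The capture group inside the lookahead exposes the text at each position.
hits = [(m.start(), m.group(1)) for m in re.finditer(r"(?=(a.a))", text)]
print(hits)  # [(1, 'ana'), (3, 'ana')]

def overlapping_positions(haystack, needle):
    """Start indices of every (possibly overlapping) literal occurrence."""
    pattern = f"(?={re.escape(needle)})"
    return [m.start() for m in re.finditer(pattern, haystack)]

print(overlapping_positions("aaaa", "aa"))  # [0, 1, 2]
```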
When regex is overkill
For fixed-literal checks, in is faster by a constant factor (no regex compilation overhead). Reserve re.search() for patterns that actually vary. For string ordering operations, see sorting a string alphabetically in Python.
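Speed is not the only reason to prefer `in` for literals. A literal that happens to contain regex metacharacters changes meaning when passed to `re.search()` unescaped. The example below is a small illustration with made-up sample text.

```python
import re

price_line = "Total: 1+1 items at $5.00 (approx.)"
literal = "1+1"

# `in` and find() always treat the substring literally.
print(literal in price_line)                            # True

# Passed straight to re.search(), "+" is a quantifier, so the pattern
# means "one or more '1' followed by '1'" and no longer matches "1+1".
print(bool(re.search(literal, price_line)))             # False

# re.escape() makes the regex behave like the literal check again.
print(bool(re.search(re.escape(literal), price_line)))  # True
```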
KMP Algorithm: Interview-Level Substring Search
Placement interviews at product companies sometimes ask you to implement substring search without using built-in methods. The Knuth-Morris-Pratt (KMP) algorithm is the standard answer.
Why KMP exists
Naive substring search compares the pattern against every position in the text. Worst case: O(n * m) comparisons for text length n and pattern length m (think searching “aaaaab” inside “aaaaaaaaaaab”). KMP preprocesses the pattern into a prefix table (also called the failure function) that tells the algorithm how far to skip on a mismatch, guaranteeing O(n + m) worst-case.
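To make the O(n * m) behaviour concrete, here is a sketch of the naive approach with a comparison counter added. The function name and the counter are illustrative additions, not part of any library.

```python
def naive_search(text, pattern):
    """Return (index, comparisons); index is -1 when pattern is absent."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j += 1
        if j == m:
            return start, comparisons
    return -1, comparisons

# 11 'a' characters followed by 'b', searched for 5 'a' characters followed by 'b'.
print(naive_search("a" * 11 + "b", "a" * 5 + "b"))  # (6, 42)
```

Each of the six failed alignments re-compares five characters that were already known to match, which is exactly the repeated work the prefix table removes.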
Minimal implementation
def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    if m == 0:
        return 0  # empty pattern matches at position 0
    # Build prefix table
    lps = [0] * m
    length = 0
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]
        else:
            lps[i] = 0
            i += 1
    # Search
    i = j = 0
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
        if j == m:
            return i - j  # match found
        elif i < n and text[i] != pattern[j]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1
    return -1  # no match
print(kmp_search("abxabcabcaby", "abcaby")) # 6
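Interviewers often ask you to trace the prefix table by hand. The standalone sketch below builds the same table as the first half of the function above, written with a `for` loop for compactness; `build_prefix_table` is a name chosen here for illustration.

```python
def build_prefix_table(pattern):
    """lps[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    lps = [0] * len(pattern)
    length = 0
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until one can be extended.
        while length and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    return lps

print(build_prefix_table("abcaby"))  # [0, 0, 0, 1, 2, 0]
print(build_prefix_table("aaaaab"))  # [0, 1, 2, 3, 4, 0]
print(build_prefix_table("abab"))    # [0, 0, 1, 2]
```

For "abcaby", the 2 at index 4 says that after matching "abcab" and then failing, the search can resume as if "ab" had already been matched, without moving backwards in the text.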
When interviewers expect KMP
- The question explicitly says “do not use built-in string methods.”
- The question asks for O(n + m) time complexity.
- Pattern matching on very large inputs where worst-case matters (competitive programming).
For everyday Python scripts and placement coding rounds that allow built-ins, in or find() is the correct choice. KMP is an algorithm demonstration, not a production tool in Python. Knowing when to use each method is the real skill: in for existence checks, find() for position, re.search() for patterns, and KMP for interview whiteboard problems.
Edge Cases Worth Memorising
| Scenario | Behaviour |
|---|---|
| Empty substring ("" in text) | Always True |
| Empty text ("abc" in "") | Always False (unless substring is also empty) |
| Case mismatch ("Py" in "python") | False (use .lower() on both for case-insensitive) |
| None as input | Raises TypeError (guard with if text is not None) |
| Unicode ("cafe\u0301" vs "caf\u00e9") | Not equal by default; use unicodedata.normalize() before comparing |
| Overlapping matches ("aaa".count("aa")) | Returns 1, not 2 (non-overlapping by design) |
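The None and Unicode rows can be combined into one defensive helper. The sketch below is one possible shape, not a standard function: `contains` is a hypothetical name, returning False for None is a design choice rather than a rule, and NFC is assumed as the normalisation form.

```python
import unicodedata

def contains(text, sub, *, ignore_case=False):
    """None-safe, normalisation-aware substring check (hypothetical helper)."""
    if text is None or sub is None:
        return False
    # Bring both strings to the same composed form before comparing.
    text = unicodedata.normalize("NFC", text)
    sub = unicodedata.normalize("NFC", sub)
    if ignore_case:
        text, sub = text.casefold(), sub.casefold()
    return sub in text

decomposed = "cafe\u0301 au lait"   # 'e' followed by a combining acute accent
composed = "caf\u00e9"              # single precomposed character

print(composed in decomposed)                      # False
print(contains(decomposed, composed))              # True
print(contains(None, "abc"))                       # False
print(contains("Python", "py", ignore_case=True))  # True
```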
Substring Detection in LLM Output Parsing
Every method covered above maps directly to a real task in LLM application code. Parsing structured output from a language model (checking whether a response contains a JSON block, extracting a tag via regex, validating that a safety prefix exists) is substring detection applied to model outputs. The re.search() pattern from the regex section is especially common for extracting structured data from free-text completions.
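As a rough illustration of that mapping, the sketch below runs an `in` check and a `re.search()` extraction against a hard-coded string standing in for a model response. The response text, the refusal phrase, and the `<json>` tag convention are all assumptions made for this example, not the format of any particular API.

```python
import json
import re

# Hypothetical model response; the <json> tag convention is an assumption.
response = (
    'Sure, here is the result. '
    '<json>{"status": "ok", "items": 3}</json> '
    'Let me know if you need more.'
)

# 1. Existence check with `in`.
has_refusal = "I cannot help with that" in response

# 2. Extract a tagged payload with re.search(); DOTALL lets "." span newlines.
match = re.search(r"<json>(.*?)</json>", response, re.DOTALL)
payload = json.loads(match.group(1)) if match else None

print(has_refusal)  # False
print(payload)      # {'status': 'ok', 'items': 3}
```

In real code the `json.loads()` call would usually sit inside a try/except for `json.JSONDecodeError`, since model output is not guaranteed to be valid JSON.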
Frequently asked questions
Does the in operator work with lists or only strings?
The in operator works with any container or iterable in Python, including lists, tuples, sets, and dictionaries (where it checks keys). For lists and other containers it tests whether an element equals the left operand. For strings it always tests substring containment; a single character is simply a substring of length one.
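A few one-liners show how the same operator behaves across types. The values are arbitrary examples.

```python
print(3 in [1, 2, 3])                 # True: list membership, compared element by element
print("b" in {"a": 1, "b": 2})        # True: dict membership tests keys
print(2 in {"a": 1, "b": 2})          # False: values are not searched
print("ab" in ["abc", "abd"])         # False: a list holds whole elements, not substrings
print("ab" in "abc")                  # True: str tests substring containment
print(any("ab" in s for s in ["abc", "abd"]))  # True: substring check per element
```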
What happens if the substring is an empty string?
An empty string is always considered present in any string. Both 'abc'.__contains__('') and '' in 'abc' return True. This follows from the formal definition: every string contains the empty string at every position.
How do I do a case-insensitive substring check in Python?
The simplest approach is to lowercase both strings before checking: 'python' in text.lower(). For pattern-based checks, use re.search(pattern, text, re.IGNORECASE). The lower() approach is faster for literal substrings; regex is better when you need wildcards or character classes alongside case insensitivity.
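For text that may contain non-ASCII letters, `str.casefold()` is a more aggressive alternative to `lower()`. The sketch below uses a German sample sentence chosen for illustration.

```python
import re

text = "Die Straße ist lang"
query = "STRASSE"

print(query.lower() in text.lower())        # False: lower() keeps "ß" as it is
print(query.casefold() in text.casefold())  # True: casefold() maps "ß" to "ss"

# re.IGNORECASE handles simple case differences such as "s" versus "S".
print(bool(re.search("straße", text, re.IGNORECASE)))  # True
```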
What is the time complexity of Python's in operator for strings?
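CPython uses a mix of Boyer-Moore and Horspool optimisations internally (implemented in fastsearch.h). Average case is around O(n/m) for typical text. The worst case was O(n*m) for pathological inputs in older releases; since CPython 3.10, searches on long strings can switch to the Crochemore-Perrin two-way algorithm, which avoids that quadratic behaviour. For most placement-test inputs, treat it as O(n).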
When should I use find() instead of index()?
Use find() when a missing substring is a normal case you handle with an if-check (find returns -1). Use index() when a missing substring is an error that should raise an exception. In placement coding rounds, find() is more common because the question typically asks you to print a message rather than crash.
Can re.search() find overlapping matches?
re.search() finds only the first match. To find all overlapping matches, use re.finditer() with a lookahead pattern: re.finditer(r'(?=pattern)', text). Each match object gives the start position of an overlapping occurrence.