Check if a Substring Exists in a String in Python

Five ways to check if a substring exists in a Python string: in operator, find(), index(), count(), and re.search(). Edge cases and KMP included.

By FACE Prep Team

Python’s in operator checks substring existence in one expression and handles most real-world use cases.

The in Operator: Python’s Default Substring Check

The in keyword is the idiomatic way to test whether a substring appears inside a string. It returns a boolean directly.

text = "Welcome to Python Programming"
substring = "Python"

if substring in text:
    print("Found")
else:
    print("Not found")

Output: Found

Three things worth noting:

  • The check is case-sensitive. "python" in text returns False because the capital P does not match.
  • An empty substring always returns True: "" in "anything" evaluates to True. This is consistent with the formal definition that every string contains the empty string at every position.
  • Internally, CPython delegates in to str.__contains__(), which calls the fastsearch implementation using a blend of Boyer-Moore and Horspool heuristics (since Python 3.10 it may also switch to the Two-Way algorithm for long patterns). For typical inputs, treat the cost as O(n) where n is the length of the haystack.
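
The three points above can be checked directly in a REPL. This is a minimal sketch; the variable names are illustrative only:

```python
text = "Welcome to Python Programming"

# Case-sensitive: lowercase "python" does not match "Python"
case_sensitive_hit = "python" in text

# The empty string is contained in every string
empty_hit = "" in text

# `in` on a str ends up calling str.__contains__
dunder_hit = text.__contains__("Python")

print(case_sensitive_hit, empty_hit, dunder_hit)  # False True True
```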

For a broader set of Python fundamentals tested in placement rounds, see the Python basic programs collection.

str.find() and str.index(): Getting the Position

When you need the position of the first occurrence, not just a yes/no answer, use str.find() or str.index().

text = "Welcome to Python Programming"

pos = text.find("Python")
print(pos)  # 11

pos_missing = text.find("Java")
print(pos_missing)  # -1

| Method | On match | On miss | Use when |
| --- | --- | --- | --- |
| str.find(sub) | Returns lowest index | Returns -1 | Missing substring is a normal branch |
| str.index(sub) | Returns lowest index | Raises ValueError | Missing substring is a bug |
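
The difference on a miss is easiest to see side by side. A small sketch, assuming "Java" is absent from the text; find_result and index_result are illustrative names:

```python
text = "Welcome to Python Programming"

# find(): a miss is ordinary control flow, signalled by -1
pos = text.find("Java")
if pos == -1:
    find_result = "not found"
else:
    find_result = f"found at {pos}"

# index(): a miss raises ValueError, so catch it or let it propagate
try:
    index_result = text.index("Java")
except ValueError:
    index_result = None

print(find_result)   # not found
print(index_result)  # None
```

Always compare the result of find() against -1 explicitly. Writing `if text.find(sub):` is a common bug, because a match at index 0 is falsy and a miss (-1) is truthy.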

Both accept optional start and end parameters to restrict the search window:

# Search only within indices 5 through 19 (the end index, 20, is exclusive)
text.find("to", 5, 20)  # 8

A common placement-test pattern asks you to find all occurrences of a substring. Loop with find():

def find_all(text, sub):
    positions = []
    start = 0
    while True:
        idx = text.find(sub, start)
        if idx == -1:
            break
        positions.append(idx)
        start = idx + 1  # advance one character so overlapping matches are found too
    return positions

print(find_all("abcabcabc", "abc"))  # [0, 3, 6]
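
Because the find_all loop advances by one character, it also reports overlapping matches (for "aaaa" and "aa" it returns [0, 1, 2]). If the question wants non-overlapping matches instead, one possible variant skips the whole match each time. The function name and the empty-pattern guard below are assumptions for illustration:

```python
def find_all_non_overlapping(text, sub):
    """Return start indices of non-overlapping matches, scanning left to right."""
    if not sub:
        # Illustrative guard: an empty pattern would never advance `start`
        raise ValueError("sub must be non-empty")
    positions = []
    start = 0
    while True:
        idx = text.find(sub, start)
        if idx == -1:
            break
        positions.append(idx)
        start = idx + len(sub)  # skip the whole match
    return positions

print(find_all_non_overlapping("aaaa", "aa"))       # [0, 2]
print(find_all_non_overlapping("abcabcabc", "abc"))  # [0, 3, 6]
```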

str.count(): Counting Occurrences

str.count(sub) returns the number of non-overlapping occurrences of sub in the string.

text = "banana"
print(text.count("ana"))  # 1, not 2

The result is 1 because count() does not detect overlapping matches. After finding “ana” at index 1, it resumes searching from index 4, missing the overlapping “ana” at index 3.

Counting overlapping occurrences

Restarting the search one character after each hit handles overlaps:

def count_overlapping(text, sub):
    count = 0
    start = 0
    while True:
        idx = text.find(sub, start)
        if idx == -1:
            break
        count += 1
        start = idx + 1
    return count

print(count_overlapping("banana", "ana"))  # 2

This is functionally identical to the find_all loop above, just returning a count instead of positions. For character-level classification problems (digits, uppercase, special characters), see character classification in Python.

Regex with re.search(): Pattern-Based Substring Detection

When the substring is not a fixed literal but a pattern (wildcards, character classes, optional segments), the re module is the right tool.

import re

text = "Order number: ORD-2026-4471"

# Check if text contains a pattern like ORD-YYYY-NNNN
match = re.search(r"ORD-\d{4}-\d{4}", text)
if match:
    print(f"Found: {match.group()}")  # Found: ORD-2026-4471

For a case-insensitive check, pass the re.IGNORECASE flag:

text = "Python is great"
if re.search(r"python", text, re.IGNORECASE):
    print("Match")  # Match

Finding all overlapping matches

Use a lookahead inside re.finditer():

text = "banana"
overlaps = [m.start() for m in re.finditer(r"(?=ana)", text)]
print(overlaps)  # [1, 3]

The lookahead (?=ana) asserts “ana” follows at this position without consuming characters, so the engine advances one character at a time and catches overlaps.
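
The same idea extends from a fixed literal to a pattern. Since a lookahead matches an empty string, one way to recover the overlapping text is to wrap the pattern in a capturing group. A short sketch with an illustrative two-digit pattern:

```python
import re

text = "12345"

# The lookahead consumes nothing, so the capturing group inside it
# is what carries the text that was asserted at each position.
pairs = [(m.start(), m.group(1)) for m in re.finditer(r"(?=(\d\d))", text)]
print(pairs)  # [(0, '12'), (1, '23'), (2, '34'), (3, '45')]
```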

When regex is overkill

For fixed-literal checks, in is faster by a constant factor (no regex compilation overhead). Reserve re.search() for patterns that actually vary. For string ordering operations, see sorting a string alphabetically in Python.
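
Speed is not the only reason to prefer in for literals. A literal that happens to contain regex metacharacters can produce a false positive unless it is escaped. The example below is a sketch with made-up text:

```python
import re

text = "version 1x5 released"
needle = "1.5"

literal_hit = needle in text                            # "1.5" is absent
unescaped_hit = bool(re.search(needle, text))           # "." matches the "x"
escaped_hit = bool(re.search(re.escape(needle), text))  # "." treated literally

print(literal_hit, unescaped_hit, escaped_hit)  # False True False
```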

KMP Algorithm: Substring Search Without Built-ins

Placement interviews at product companies sometimes ask you to implement substring search without using built-in methods. The Knuth-Morris-Pratt (KMP) algorithm is the standard answer.

Why KMP exists

Naive substring search compares the pattern against every position in the text. Worst case: O(n * m) comparisons for text length n and pattern length m (think searching “aaaaab” inside “aaaaaaaaaaab”). KMP preprocesses the pattern into a prefix table (also called the failure function) that tells the algorithm how far to skip on a mismatch, guaranteeing O(n + m) worst-case.
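
To make the O(n * m) behaviour concrete, here is a sketch of the naive approach with a comparison counter added. The counter and the function name are illustrative, not part of any standard API:

```python
def naive_search(text, pattern):
    """Return (index of first match or -1, number of character comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j += 1
        if j == m:
            return start, comparisons
    return -1, comparisons

# 11 a's followed by "b", searched for 5 a's followed by "b"
print(naive_search("a" * 11 + "b", "a" * 5 + "b"))  # (6, 42)
```

Every failed alignment re-reads five matching characters before hitting the mismatch, which is exactly the repeated work the prefix table avoids.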

Minimal implementation

def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    if m == 0:
        return 0  # empty pattern matches at position 0

    # Build prefix table
    lps = [0] * m
    length = 0
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]
        else:
            lps[i] = 0
            i += 1

    # Search
    i = j = 0
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
        if j == m:
            return i - j  # match found
        elif i < n and text[i] != pattern[j]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1
    return -1  # no match

print(kmp_search("abxabcabcaby", "abcaby"))  # 6
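
Interviewers often ask you to trace the prefix table by hand. Pulling the first loop out into its own function (build_lps is an illustrative name) makes it easy to inspect:

```python
def build_lps(pattern):
    """lps[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    lps = [0] * len(pattern)
    length = 0  # length of the currently matched prefix
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]  # fall back without advancing i
        else:
            i += 1  # lps[i] stays 0
    return lps

print(build_lps("abcaby"))  # [0, 0, 0, 1, 2, 0]
print(build_lps("aaaa"))    # [0, 1, 2, 3]
```

For "abcaby", the 2 at index 4 says that after matching "abcab" and then mismatching, the search can resume with "ab" already matched instead of starting over.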

When interviewers expect KMP

  • The question explicitly says “do not use built-in string methods.”
  • The question asks for O(n + m) time complexity.
  • Pattern matching on very large inputs where worst-case matters (competitive programming).

For everyday Python scripts and placement coding rounds that allow built-ins, in or find() is the correct choice. KMP is an algorithm demonstration, not a production tool in Python. Knowing when to use each method is the real skill: in for existence checks, find() for position, re.search() for patterns, and KMP for interview whiteboard problems.

Edge Cases Worth Memorising

| Scenario | Behaviour |
| --- | --- |
| Empty substring ("" in text) | Always True |
| Empty text ("abc" in "") | Always False (unless substring is also empty) |
| Case mismatch ("Py" in "python") | False (use .lower() on both for case-insensitive) |
| None as input | Raises TypeError (guard with if text is not None) |
| Unicode ("cafe\u0301" vs "caf\u00e9") | Not equal by default; use unicodedata.normalize() before comparing |
| Overlapping matches ("aaa".count("aa")) | Returns 1, not 2 (non-overlapping by design) |
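
The None and Unicode rows can be combined into one defensive helper. This is a sketch only: safe_contains is a hypothetical name, and returning False for None (rather than raising) is a design choice you may not want in every codebase:

```python
import unicodedata

def safe_contains(text, sub):
    """Hypothetical helper: None-safe, Unicode-normalised substring check."""
    if text is None or sub is None:
        return False
    # NFC composes "e" + combining acute (U+0301) into the single code point U+00E9
    text_nfc = unicodedata.normalize("NFC", text)
    sub_nfc = unicodedata.normalize("NFC", sub)
    return sub_nfc in text_nfc

decomposed = "cafe\u0301"   # 5 code points
precomposed = "caf\u00e9"   # 4 code points

print(precomposed in decomposed)               # False
print(safe_contains(decomposed, precomposed))  # True
print(safe_contains(None, "abc"))              # False
```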

Substring Detection in LLM Output Parsing

Every method covered above maps directly to a real task in LLM application code. Parsing structured output from a language model (checking whether a response contains a JSON block, extracting a tag via regex, validating that a safety prefix exists) is substring detection applied to model outputs. The re.search() pattern from the regex section is especially common for extracting structured data from free-text completions.
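
As a rough illustration of that mapping, the sketch below checks for a tag with in and extracts its contents with re.search(). The response string, the <answer> tag, and the JSON payload are all invented for the example:

```python
import json
import re

# Hypothetical model response; the tag name and layout are assumptions
response = 'Sure. <answer>{"status": "ok", "score": 7}</answer> Anything else?'

# Existence check with `in`
has_answer_tag = "<answer>" in response

# Extraction with re.search; re.DOTALL lets "." span newlines in multi-line output
match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
payload = json.loads(match.group(1)) if match else None

print(has_answer_tag)  # True
print(payload)         # {'status': 'ok', 'score': 7}
```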

TinkerLLM’s exercises include prompt-output parsing tasks where you write exactly this kind of re.search() and in-based detection code against live model responses:

  • Entry price: ₹299
  • Format: browser-based exercises against a live LLM API

Frequently asked questions

Does the in operator work with lists or only strings?

The in operator works with any container or iterable in Python, including lists, tuples, sets, and dictionaries (where it checks keys). For strings, it always tests substring containment: the left operand must itself be a string, and a single character is simply a substring of length one.
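
A quick comparison of the two behaviours, using an arbitrary example word:

```python
text = "placement"

print("lace" in text)            # True: substring containment
print("l" in text)               # True: a one-character substring
print("lace" in list(text))      # False: list membership compares whole elements
print("l" in list(text))         # True: "l" is one of the list's elements
print("a" in {"a": 1, "b": 2})   # True: dict membership checks keys
print(1 in {"a": 1, "b": 2})     # False: values are not searched
```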

What happens if the substring is an empty string?

An empty string is always considered present in any string. Both 'abc'.__contains__('') and '' in 'abc' return True. This follows from the formal definition: every string contains the empty string at every position.

How do I do a case-insensitive substring check in Python?

The simplest approach is to lowercase both strings before checking: sub.lower() in text.lower(). For pattern-based checks, use re.search(pattern, text, re.IGNORECASE). The lower() approach is faster for literal substrings; regex is better when you need wildcards or character classes alongside case insensitivity.
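
For text that may contain non-English characters, str.casefold() is a stricter alternative to lower(). It is not mentioned above, so treat this as an optional refinement; contains_ci is an illustrative helper name:

```python
def contains_ci(text, sub):
    """Case-insensitive literal check; casefold() is a more aggressive lower()."""
    return sub.casefold() in text.casefold()

print(contains_ci("Python is great", "PYTHON"))  # True
print("straße".lower() in "STRASSE".lower())     # False: lower() leaves "ß" unchanged
print(contains_ci("STRASSE", "straße"))          # True: casefold() maps "ß" to "ss"
```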

What is the time complexity of Python's in operator for strings?

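CPython uses a mix of Boyer-Moore and Horspool optimisations internally (implemented in fastsearch.h). On typical text it can run as fast as O(n/m), and the classic worst case is O(n*m) for pathological inputs. Since Python 3.10, CPython may switch to the Two-Way algorithm for longer patterns, which avoids that quadratic behaviour. For most placement-test inputs, treat it as O(n).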

When should I use find() instead of index()?

Use find() when a missing substring is a normal case you handle with an if-check (find returns -1). Use index() when a missing substring is an error that should raise an exception. In placement coding rounds, find() is more common because the question typically asks you to print a message rather than crash.

Can re.search() find overlapping matches?

re.search() finds only the first match. To find all overlapping matches, use re.finditer() with a lookahead pattern: re.finditer(r'(?=pattern)', text). Each match object gives the start position of an overlapping occurrence.
