Regular Expressions in Python: Complete Guide

Python's re module explained: metacharacters, special sequences, and pattern-matching methods. Includes worked examples for phone validation and email extraction.

By FACE Prep Team 5 min read

Python’s re module turns what would be 15 lines of character-by-character loop logic into a single expression, and pattern matching shows up consistently in placement coding rounds at TCS, Infosys, and Wipro.

Before writing any code, the mechanics: a regular expression is a string that describes a pattern. The re module is part of Python’s standard library, available without any additional installation. It gives you functions to search, extract, and replace text based on that pattern. You’ll realistically use five of them.

What Metacharacters Do

Every regex pattern is made of two kinds of characters: literals (the letter a matches only a) and metacharacters (special symbols that match broader classes). The table below covers the ones that appear most often in placement problems.

Symbol   Matches                                        Example
.        Any character except newline                   h.llo matches hello
^        Start of string                                ^hello matches hello world
$        End of string                                  world$ matches hello world
*        0 or more repetitions                          ab* matches a, ab, abb
+        1 or more repetitions                          ab+ matches ab, abb
?        0 or 1 repetition                              ab? matches a or ab
[]       Character class, any one inside brackets       [aeiou] matches any vowel
|        Either/or                                      cat|dog matches cat or dog
()       Grouping for capture or alternation            (ab)+ matches ab, abab

A common placement-round question uses this exact structure: find names that start with h, end with i, and are exactly six characters long. The pattern r'^h....i$' handles it. The ^ anchors to the start, four dots match any middle four characters, and i$ locks the end.
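A minimal sketch of that question, assuming a made-up list of names (the sample data is hypothetical and only there to show which entries pass):

```python
import re

# Hypothetical sample data, invented for this illustration.
names = ['harini', 'hari', 'shalini', 'himani', 'haripriya']

# ^h anchors the first letter, .... matches any four characters, i$ anchors the last.
pattern = r'^h....i$'

matches = [name for name in names if re.match(pattern, name)]
print(matches)  # ['harini', 'himani']
```

'hari' is too short, 'shalini' starts with the wrong letter, and 'haripriya' is too long and ends in a, so only the two six-letter names survive.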

Special Sequences

Special sequences begin with a backslash and match an entire category of characters. These are the ones worth knowing before any placement round.

Sequence   Matches                                          Example
\d         Any digit 0-9                                    \d+ matches 123
\D         Any non-digit                                    \D+ matches abc
\w         Letters, digits, and _                           \w+ matches hello_123
\W         Any character that is not a letter, digit, or _  \W+ matches @#
\s         Whitespace (space, tab, newline)                 \s+ matches a run of spaces or tabs
\S         Non-whitespace                                   \S+ matches hello
\b         Word boundary                                    \bword\b matches word but not sword

The Python re module documentation lists the full set of supported sequences. The table above covers what placement-level problems actually test.
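As a quick illustration, here is a sketch that runs several of these sequences against one made-up line of text (the sample string is an assumption, not taken from any real problem):

```python
import re

# Made-up sample line for illustration.
line = 'Roll 42: asha_k scored 91 marks'

digits = re.findall(r'\d+', line)
words = re.findall(r'\w+', line)
print(digits)  # ['42', '91']
print(words)   # ['Roll', '42', 'asha_k', 'scored', '91', 'marks']

# \b only matches where a word character meets a non-word character.
standalone = bool(re.search(r'\bword\b', 'a word here'))
embedded = bool(re.search(r'\bword\b', 'a sword here'))
print(standalone)  # True
print(embedded)    # False
```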

Raw Strings

Always write patterns as raw strings: r'\d+' rather than '\\d+'. Without the r prefix, Python’s string parser processes backslash escapes before the re module ever sees the pattern. A recognized escape such as \b silently becomes a backspace character instead of a word boundary, while an unrecognized one such as \d happens to survive but triggers a warning in recent Python versions. Raw strings pass the backslashes through unchanged. Every pattern in this article uses the r prefix.
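The difference is easiest to see with \b, which is a valid string escape (backspace) as well as a regex sequence (word boundary). A small sketch, using an invented sample sentence:

```python
import re

plain = '\bcat\b'   # each \b becomes a single backspace character
raw = r'\bcat\b'    # each \b stays as backslash + b, a regex word boundary

print(len(plain))  # 5
print(len(raw))    # 7

plain_result = re.search(plain, 'a cat sat')
raw_result = re.search(raw, 'a cat sat')
print(plain_result)         # None, the text contains no backspace characters
print(raw_result.group())   # 'cat'
```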

Five re Module Methods

Python’s re module ships with several pattern-matching functions. Five carry most of the work.

re.match()

re.match() checks whether the pattern matches at the beginning of the string only. If the string starts differently, it returns None.

import re

pattern = r'^h....i$'
print(re.match(pattern, 'harini'))   # Match object
print(re.match(pattern, 'shalini'))  # None

re.search()

re.search() scans from left to right and returns the first match anywhere in the string, not just the start. Use it when you don’t care where in the string the match sits.

import re

result = re.search(r'\d+', 'Roll number: 2024001')
if result:
    print(result.group())  # '2024001'
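To see the contrast with re.match(), and what else the returned Match object exposes, here is a short sketch on the same string:

```python
import re

text = 'Roll number: 2024001'

at_start = re.match(r'\d+', text)
print(at_start)  # None, because the string starts with 'R', not a digit

m = re.search(r'\d+', text)
print(m.group())  # '2024001'
print(m.start())  # 13
print(m.span())   # (13, 20)
```

start() and span() report where the match sits, which helps when you need to slice the original string around it.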

re.findall()

re.findall() returns a list of every non-overlapping match. If the pattern contains one capturing group, the list holds that group’s text for each match; with two or more groups, each item is a tuple of the groups.

import re

text = 'Exam scores: 78, 92, 65, 88'
scores = re.findall(r'\d+', text)
print(scores)  # ['78', '92', '65', '88']
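How groups change the shape of the result is worth seeing once. A sketch with invented names and scores:

```python
import re

# Invented sample data for illustration.
text = 'Asha: 78, Ravi: 92'

no_groups = re.findall(r'\w+: \d+', text)
one_group = re.findall(r'\w+: (\d+)', text)
two_groups = re.findall(r'(\w+): (\d+)', text)

print(no_groups)   # ['Asha: 78', 'Ravi: 92']
print(one_group)   # ['78', '92']
print(two_groups)  # [('Asha', '78'), ('Ravi', '92')]
```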

re.split()

re.split() divides a string wherever the pattern matches, similar to str.split() but using a regex instead of a fixed delimiter.

import re

result = re.split(r'\d+', 'FACE10Prep3Python')
print(result)  # ['FACE', 'Prep', 'Python']
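Two variations come up often enough to sketch here: wrapping the pattern in a capturing group keeps the delimiters in the output, and maxsplit caps the number of splits. The last line uses an invented input to show splitting on several delimiters at once.

```python
import re

s = 'FACE10Prep3Python'

with_delims = re.split(r'(\d+)', s)
first_only = re.split(r'\d+', s, maxsplit=1)
mixed = re.split(r'[,;\s]+', 'a, b;c  d')

print(with_delims)  # ['FACE', '10', 'Prep', '3', 'Python']
print(first_only)   # ['FACE', 'Prep3Python']
print(mixed)        # ['a', 'b', 'c', 'd']
```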

re.sub()

re.sub() replaces every match of the pattern with a replacement string. Useful for cleaning messy input.

import re

cleaned = re.sub(r'\s+', ' ', 'too   many    spaces')
print(cleaned)  # 'too many spaces'
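The replacement does not have to be a fixed string. The sketch below, on invented inputs, shows three options re.sub() supports: backreferences in the replacement, a count limit, and a function that computes each replacement (the helper name double is hypothetical).

```python
import re

# Backreferences: \1 and \2 refer to the captured groups, here swapped.
swapped = re.sub(r'(\d{2})-(\d{2})', r'\2/\1', 'Due 25-12')
print(swapped)  # 'Due 12/25'

# count limits how many matches are replaced.
masked = re.sub(r'\d', '#', 'PIN 4821', count=2)
print(masked)  # 'PIN ##21'

# A function receives each Match object and returns the replacement text.
def double(m):
    return str(int(m.group()) * 2)

doubled = re.sub(r'\d+', double, 'scores 10 and 25')
print(doubled)  # 'scores 20 and 50'
```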

Placement-Ready Worked Examples

These three patterns appear in variations across placement coding rounds.

Validate a 10-Digit Phone Number

An Indian mobile number starts with 6, 7, 8, or 9, followed by nine more digits.

import re

def is_valid_phone(number):
    pattern = r'^[6-9]\d{9}$'
    return bool(re.match(pattern, number))

print(is_valid_phone('9876543210'))  # True
print(is_valid_phone('1234567890'))  # False

[6-9] matches the first digit, \d{9} matches exactly nine more, and $ anchors the end of the string so no extra digits can follow. The bool() call converts the match object (or None) to True/False.
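One edge case: $ also matches just before a newline at the very end of the string, so input read from a file or stdin with its trailing '\n' still passes. If that matters, re.fullmatch() is one way to tighten the check. The function name below is a hypothetical variant, not part of the original example.

```python
import re

def is_valid_phone_strict(number):
    # Hypothetical stricter variant: fullmatch must consume the entire string.
    return re.fullmatch(r'[6-9]\d{9}', number) is not None

lenient = bool(re.match(r'^[6-9]\d{9}$', '9876543210\n'))
print(lenient)                                 # True, $ tolerates the trailing newline
print(is_valid_phone_strict('9876543210\n'))   # False
print(is_valid_phone_strict('9876543210'))     # True
```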

Extract All Email Addresses from Text

import re

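# Placeholder addresses on reserved example domains, used here for illustration.
text = 'Contact us at support@example.com or careers@example.org'
emails = re.findall(r'[\w.-]+@[\w.-]+\.\w{2,4}', text)
print(emails)  # ['support@example.com', 'careers@example.org']

[\w.-]+ matches the local part and the domain, and \.\w{2,4} requires a dot followed by a two-to-four-character suffix, so addresses with longer top-level domains are cut short.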

Check for Repeated Consecutive Characters

import re

def has_repeated_char(s):
    return bool(re.search(r'(.)\1', s))

print(has_repeated_char('aabbcc'))  # True
print(has_repeated_char('abcdef'))  # False

(.) captures any single character into group 1; \1 is a backreference that must match the same character immediately after. If any two identical characters sit side by side, the pattern matches.
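If the question asks which characters repeat rather than whether any do, the same pattern works with re.finditer(). A sketch, with a hypothetical helper name:

```python
import re

def repeated_chars(s):
    # Hypothetical helper: collect each character that appears twice in a row.
    return [m.group(1) for m in re.finditer(r'(.)\1', s)]

print(repeated_chars('aabbcc'))      # ['a', 'b', 'c']
print(repeated_chars('bookkeeper'))  # ['o', 'k', 'e']
print(repeated_chars('abcdef'))      # []
```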

Compiled Patterns

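If the same pattern runs inside a loop, call re.compile() once outside the loop and reuse the compiled object. According to the Python HOWTO for regular expressions, pre-compiling a regex that is used within a loop saves a few function calls. The re module also caches recently used patterns, so the module-level functions do not re-parse the pattern on every call; what the compiled object skips is the cache lookup.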

import re

phone_re = re.compile(r'^[6-9]\d{9}$')

numbers = ['9876543210', '8765432109', '1234567890']
valid = [n for n in numbers if phone_re.match(n)]
print(valid)  # ['9876543210', '8765432109']

For a script that runs once against a handful of strings, re.compile() adds nothing visible. For a data-processing pipeline that validates thousands of records, the per-call saving can add up to something measurable.
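Rather than take that on faith, you can time both styles yourself. The sketch below uses timeit on an invented workload; the list size and repeat count are arbitrary assumptions, and the numbers will differ between machines and Python versions.

```python
import re
import timeit

# Arbitrary made-up workload: 3,000 strings to validate.
numbers = ['9876543210', '8765432109', '1234567890'] * 1000
phone_re = re.compile(r'^[6-9]\d{9}$')

def with_module_function():
    # Looks the pattern up in re's internal cache on every call.
    return [n for n in numbers if re.match(r'^[6-9]\d{9}$', n)]

def with_compiled():
    # Calls the method on the pre-compiled object directly.
    return [n for n in numbers if phone_re.match(n)]

print('module-level:', timeit.timeit(with_module_function, number=20))
print('compiled:    ', timeit.timeit(with_compiled, number=20))
```

Both functions return the same list; only the time taken differs.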

When Plain String Methods Work Better

A common mistake in placement coding: reaching for regex when a simpler built-in does the job faster and reads more clearly.

  • Checking whether a string starts with a prefix: s.startswith('http') is cleaner than re.match(r'^http', s).
  • Splitting on a fixed character: 'a,b,c'.split(',') outperforms re.split(r',', 'a,b,c').
  • Checking substring membership: 'abc' in s beats re.search(r'abc', s) on both clarity and speed.
  • Testing whether a single character is a digit, letter, or symbol: Python’s built-in isdigit(), isupper(), and isalpha() cover it without a pattern. The article on checking character type in Python walks through these methods alongside their regex equivalents.
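The pairs in the list above give the same answers on simple inputs, which a short sketch can confirm (the sample strings are invented):

```python
import re

url = 'http://example.com'
row = 'a,b,c'

prefix_pair = (url.startswith('http'), bool(re.match(r'^http', url)))
split_pair = (row.split(','), re.split(r',', row))
member_pair = ('abc' in 'xxabcxx', bool(re.search(r'abc', 'xxabcxx')))
digit_pair = ('7'.isdigit(), bool(re.fullmatch(r'\d', '7')))

print(prefix_pair)  # (True, True)
print(split_pair)   # (['a', 'b', 'c'], ['a', 'b', 'c'])
print(member_pair)  # (True, True)
print(digit_pair)   # (True, True)
```

Since the results match, the plain string method is usually the better choice in these cases.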

Use regex when the pattern can’t be expressed as a fixed string, when multiple conditions need to compose (first digit must be 6-9, followed by exactly nine more digits), or when you need to capture groups from text. For sorting a string alphabetically in Python, sorted() requires no pattern at all. If you’re still building the foundational string-manipulation skills that make regex approachable, the Python basic programs collection is a practical starting point.

The pattern-extraction thinking you’ve built here (isolating structured signals from noisy text using formal rules) transfers directly to how LLMs parse and respond to prompts. Regex says “match \d{10} at position 0”; a well-structured prompt says “extract the 10-digit number from this input.” If you want to push that connection further, TinkerLLM at ₹299 gives you a hands-on environment where that bridge is concrete rather than theoretical.

Frequently asked questions

What is the difference between re.match() and re.search() in Python?

re.match() checks for a pattern only at the beginning of a string and returns None if the string does not start with the pattern. re.search() scans the entire string and returns the first match found anywhere in it.

Why do Python regex patterns use raw strings like r'\d+'?

Raw strings tell Python not to interpret backslashes as escape sequences, so the pattern reaches the re module exactly as typed: \b stays as two characters (backslash + b) rather than becoming a backspace. Without the r prefix you would have to double every backslash: '\\d+'.

How do I match one or more digits in a Python string?

Use the pattern r'\d+' with re.search() or re.findall(). For example, re.findall(r'\d+', 'Order 12 or 345') returns ['12', '345'].

What does re.compile() do in Python?

re.compile() converts a pattern string into a compiled regex object. Reusing that object skips the per-call cache lookup that the module-level functions perform, which can matter when the same pattern is applied to thousands of strings.

Can I validate an email address using Python regex?

A basic validation pattern checks the structural format of an email address. For production systems, dedicated email validation libraries handle more edge cases than a custom pattern.
