Remove All Non-Alphabet Characters from a String in Python
Three Python methods to strip digits, spaces, and symbols from a string, keeping only letters. Covers isalpha(), regex, and translate() with worked examples.
Python’s str.isalpha() method, combined with a generator expression and join(), keeps only the alphabetic characters from a string in one line of code, with no imports.
That is the baseline. This article covers three approaches: isalpha() for readability, re.sub() for compactness, and str.translate() when input strings are large. All three produce the same output on standard ASCII text. They differ in Unicode handling and performance on long inputs, which matters more than most tutorials mention.
Why strip non-alphabet characters
String manipulation is a standard topic in placement coding tests. CoCubes, AMCAT, and on-campus written rounds include tasks where you receive a mixed string, letters alongside digits and symbols, and must extract only the alphabetic portion. The problem tests whether you know Python’s built-in string methods, whether you can write a correct list comprehension, and whether you understand the difference between isalpha(), isalnum(), and isdigit().
Outside placement tests, the same pattern appears in data preprocessing and input validation. A form field that should accept only a name, a text column in a dataset that contains stale punctuation, a file header with embedded version numbers: stripping non-alphabet characters is a routine first step in all three scenarios.
Before writing the filter, it helps to understand the underlying classification. The article on checking whether a character is uppercase, lowercase, a digit, or a special character covers the building blocks that isalpha() uses internally. Reading that first makes the filter logic clearer.
Method 1: isalpha() and a generator expression
str.isalpha() returns True if the string is non-empty and every character in it is alphabetic, and False otherwise. Called on a single character, it is the cleanest test for “is this a letter.”
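A quick check of how isalpha() behaves on whole strings versus single characters may help here. The inputs below are illustrative and not taken from any particular test.

```python
# str.isalpha() evaluates the whole string, not just its first character.
checks = {
    "a": "a".isalpha(),        # single letter
    "abc": "abc".isalpha(),    # every character is a letter
    "abc1": "abc1".isalpha(),  # one digit makes the whole string fail
    " ": " ".isalpha(),        # a space is not alphabetic
    "": "".isalpha(),          # the empty string is not alphabetic either
}

for text, result in checks.items():
    print(repr(text), result)
# 'a' True
# 'abc' True
# 'abc1' False
# ' ' False
# '' False
```

Because one non-letter makes the whole call return False, the filter has to test characters one at a time instead of calling isalpha() on the full string.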
def keep_only_alpha(s):
    return ''.join(char for char in s if char.isalpha())

user_input = "asdfg1326%^$hjk"
print(keep_only_alpha(user_input))
# Output: asdfghjk
The generator expression inside join() iterates over every character in s and passes only the alphabetic ones through. In CPython, join() collects the items into an internal sequence before concatenating them, so the generator does not save memory compared with a list comprehension; the choice between the two is mainly stylistic.
One detail worth knowing: Python’s isalpha() is Unicode-aware. The character é returns True from isalpha(), as does ñ. If your string could contain accented characters or non-Latin scripts, isalpha() includes them by default. The regex approach in Method 2 does not.
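The Unicode behaviour is easy to confirm with a short sketch. The sample string is made up for illustration, and the accented letters are written as escapes so that the precomposed code points are unambiguous.

```python
def keep_only_alpha(s):
    return ''.join(char for char in s if char.isalpha())

# é (U+00E9), ñ (U+00F1), Cyrillic я (U+044F), a digit, an underscore
samples = ["\u00e9", "\u00f1", "\u044f", "5", "_"]
flags = [ch.isalpha() for ch in samples]
print(flags)
# [True, True, True, False, False]

mixed = "Caf\u00e9 Se\u00f1or 42"   # "Café Señor 42"
print(keep_only_alpha(mixed))
# CaféSeñor
```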
For placement test submissions, this is the expected approach. Interviewers recognise it as idiomatic Python, and it handles edge cases such as empty strings and single-character inputs correctly without added checks.
Keeping spaces alongside letters
The standard filter removes spaces because a space is not alphabetic. If the task asks you to keep spaces, extend the condition:
def keep_alpha_and_spaces(s):
    return ''.join(char for char in s if char.isalpha() or char == ' ')

print(keep_alpha_and_spaces("Hello, World 123"))
# Output: Hello World
This is a common variant in placement rounds and data cleaning tasks.
Method 2: re.sub() with a regex pattern
For ASCII-only strings, re.sub() with the pattern [^a-zA-Z] is the most compact version. It replaces every non-letter character with an empty string in a single call.
import re

def keep_only_alpha_regex(s):
    return re.sub(r'[^a-zA-Z]', '', s)

user_input = "Hello, World 123"
print(keep_only_alpha_regex(user_input))
# Output: HelloWorld
The pattern [^a-zA-Z] means “any character that is NOT an ASCII letter.” The re module is part of Python’s standard library, so no third-party package is needed.
The trade-off: [^a-zA-Z] is ASCII-only. The character é would be removed because it falls outside the a-z and A-Z ranges. For English-only data, this is rarely a problem. For multilingual text, isalpha() from Method 1 is the better choice.
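To make the trade-off concrete, the sketch below runs the ASCII pattern and the isalpha() filter on the same accented input. The names ASCII_NON_LETTERS and keep_only_ascii_letters are hypothetical, and precompiling the pattern is an optional choice, not something the method requires.

```python
import re

# Precompiled version of the same ASCII-only pattern (hypothetical name).
ASCII_NON_LETTERS = re.compile(r'[^a-zA-Z]')

def keep_only_ascii_letters(s):
    return ASCII_NON_LETTERS.sub('', s)

mixed = "Caf\u00e9 Se\u00f1or 42"   # "Café Señor 42"

ascii_result = keep_only_ascii_letters(mixed)
unicode_result = ''.join(ch for ch in mixed if ch.isalpha())

print(ascii_result)
# CafSeor
print(unicode_result)
# CaféSeñor
```

The regex silently drops é and ñ, which is why it suits English-only data but can damage names in multilingual text.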
In coding rounds that specify “ASCII input only,” the regex approach is fully acceptable. In data preprocessing contexts, verify your data range before committing to this method.
Method 3: str.translate() for large strings
str.translate() works by building a deletion table once and then running the lookup at C-level iteration speed. For strings of tens of thousands of characters, it is measurably faster than a Python-level character-by-character loop, because the inner loop runs in compiled C rather than interpreted Python bytecode.
import string

def keep_only_alpha_translate(s):
    # Build deletion table: every printable ASCII char that is not a letter
    delete_chars = ''.join(
        set(string.printable) - set(string.ascii_letters)
    )
    table = str.maketrans('', '', delete_chars)
    return s.translate(table)

user_input = "asdfg1326%^$hjk"
print(keep_only_alpha_translate(user_input))
# Output: asdfghjk
str.maketrans('', '', delete_chars) creates a translation table where the third argument lists characters to delete. str.translate() applies that table to the input string. The table is built once per function call; if you call the function in a tight loop, move the table construction outside for maximum speed.
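One way to hoist the table out of the function is a module-level constant, sketched below. The names _NON_LETTER_TABLE and keep_only_alpha_translate_cached are hypothetical, and the input list is illustrative.

```python
import string

# Built once when the module is imported, then reused by every call.
_NON_LETTER_TABLE = str.maketrans(
    '', '', ''.join(set(string.printable) - set(string.ascii_letters))
)

def keep_only_alpha_translate_cached(s):
    return s.translate(_NON_LETTER_TABLE)

lines = ["asdfg1326%^$hjk", "Hello, World 123", ""]
cleaned = [keep_only_alpha_translate_cached(line) for line in lines]
print(cleaned)
# ['asdfghjk', 'HelloWorld', '']
```

Each call now does only the translate() lookup, so the cost of building the table is paid once no matter how many strings are processed.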
Like the regex method, this approach targets ASCII-printable characters. Characters outside string.printable pass through unchanged. For placement tests with well-defined ASCII input, that is not a concern. For production text-processing pipelines, isalpha() is the safer default unless you have measured a real performance bottleneck.
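The pass-through behaviour has a consequence worth seeing: non-ASCII symbols survive as well as non-ASCII letters. The sketch below uses a made-up input containing a euro sign and an accented letter, both written as escapes.

```python
import string

def keep_only_alpha_translate(s):
    # Same deletion table as Method 3: printable ASCII minus ASCII letters.
    delete_chars = ''.join(set(string.printable) - set(string.ascii_letters))
    return s.translate(str.maketrans('', '', delete_chars))

sample = "a1\u20acb2\u00e9"   # a, 1, € (euro sign), b, 2, é
result = keep_only_alpha_translate(sample)

print(result)
# a€bé
print(result.isalpha())
# False, because the euro sign was not in the deletion table
```

Unlike the regex method, which removes everything outside a-z and A-Z, translate() leaves both é and € in place, so its output is only guaranteed to be letters when the input is pure ASCII.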
Comparing the three methods
| Method | Import | Unicode-aware | Best for |
|---|---|---|---|
| isalpha() + generator | None | Yes | General use, placement tests, multilingual data |
| re.sub('[^a-zA-Z]', ...) | re | No | Compact one-liner, ASCII-only inputs |
| str.translate() | string | No | Large strings where iteration speed matters |
All three produce “asdfghjk” from “asdfg1326%^$hjk”. The choice depends on two questions: does your input contain non-ASCII letters, and how large are the strings?
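For the size question, a rough timing sketch with the standard timeit module lets you measure the three methods on your own data. The function names, the 30,000-character sample, and the run count are arbitrary assumptions; absolute numbers will vary by machine and Python version, so no expected timings are shown.

```python
import re
import string
import timeit

_PATTERN = re.compile(r'[^a-zA-Z]')
_TABLE = str.maketrans(
    '', '', ''.join(set(string.printable) - set(string.ascii_letters))
)

def with_isalpha(s):
    return ''.join(ch for ch in s if ch.isalpha())

def with_regex(s):
    return _PATTERN.sub('', s)

def with_translate(s):
    return s.translate(_TABLE)

sample = "asdfg1326%^$hjk" * 2000   # 30,000 ASCII characters

candidates = {
    "isalpha": with_isalpha,
    "re.sub": with_regex,
    "translate": with_translate,
}

# Confirm that all three agree on this ASCII input before timing them.
results = {name: fn(sample) for name, fn in candidates.items()}
all_equal = len(set(results.values())) == 1
print("same output:", all_equal)

for name, fn in candidates.items():
    seconds = timeit.timeit(lambda: fn(sample), number=20)
    print(f"{name:<10} {seconds:.4f} s for 20 runs")
```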
For practice problems in placement rounds, isalpha() with a generator expression is the first choice. It is readable, correct for all Unicode input, and requires no import. Once you have a clean alphabetic string, the natural follow-up is to sort it: the article on sorting a string alphabetically in Python covers that next step directly.
For a broader set of string and array problems that appear across placement coding rounds, Python basic programs for practice covers the range of tasks you will encounter in CoCubes and AMCAT tests.
Frequently asked questions
Does isalpha() keep spaces in the filtered string?
No. A space is not an alphabetic character, so str.isalpha() returns False for it. To keep spaces alongside letters, add a check for the space character inside your generator expression.
How do I keep alphabets and spaces but remove everything else in Python?
Use a generator expression with two conditions: check isalpha() OR check for a space character. Digits, punctuation, and symbols are excluded, while letters and spaces pass through.
What is the difference between isalpha() and re.sub() for character filtering?
isalpha() is Unicode-aware and keeps accented letters like é and ñ. The ASCII regex pattern keeps only a-z and A-Z and strips accented characters. Choose based on whether your input is English-only or multilingual.
Does str.isalpha() accept Unicode characters like accented letters?
Yes. Python's str.isalpha() returns True for any Unicode alphabetic character, including accented letters from Latin, Greek, Cyrillic, and other scripts. This is standard Python 3 behaviour.
How do I remove special characters but keep digits in Python?
Use str.isalnum() instead of str.isalpha(). The isalnum() method returns True for both letters and digits, so only symbols, spaces, and punctuation are removed.
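A minimal sketch of that variant, using a hypothetical helper name and a made-up input:

```python
def keep_alnum(s):
    # Letters and digits pass; spaces, punctuation, and symbols are dropped.
    return ''.join(ch for ch in s if ch.isalnum())

print(keep_alnum("Room 42, Block-B!"))
# Room42BlockB
```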
Which approach is fastest for filtering non-alphabet characters from a string?
str.translate() is typically fastest for large strings because the character lookup table is built once and iteration runs at the C layer. For strings under a few thousand characters, the performance difference between all three methods is negligible.