Python Program to Remove Duplicate Characters from a String
Three Python methods to remove duplicate characters from a string: simple loop, set-based O(n), and dict.fromkeys(). Covers verified outputs and time complexity.
Three Python methods can remove duplicate characters from a string, each trading code simplicity for speed. This article covers all three, with verified outputs and time complexity for each.
Understanding the Problem
Removing duplicates from a string means keeping the first occurrence of each character and discarding every subsequent repeat. The order of the first-seen characters stays intact.
Two verified examples:
- Input:
"hello"— h, e, l, l, o. The secondlis a duplicate. Output:"helo" - Input:
"programming"— p, r, o, g, r, a, m, m, i, n, g. Tracing character by character: p(keep), r(keep), o(keep), g(keep), r(skip), a(keep), m(keep), m(skip), i(keep), n(keep), g(skip). Output:"progamin"
Note: some older guides show "programming" as the output for that second example. That is a transcription error in the original. Re-deriving from first principles gives "progamin" (8 unique characters), not the full 11-character original string.
This problem appears in placement coding rounds at TCS, Infosys, and Wipro, typically phrased as “return the string with only the first occurrence of each character.” It is also a building block for anagram detection and frequency-analysis problems. Pair it with the guide on sorting a string alphabetically to cover the full set of basic string-manipulation patterns tested in coding rounds.
Method 1: Loop with Membership Check
The straightforward approach iterates over each character and builds a result string by adding only characters not yet seen:
def remove_duplicates(s):
result = ""
for char in s:
if char not in result:
result += char
return result
print(remove_duplicates("hello")) # helo
print(remove_duplicates("programming")) # progamin
print(remove_duplicates("FACEFACE")) # FACE
Output verification:
"hello": h(keep), e(keep), l(keep), l(skip), o(keep) →"helo"✓"programming": trace from the Understanding section →"progamin"✓"FACEFACE": F(keep), A(keep), C(keep), E(keep), F(skip), A(skip), C(skip), E(skip) →"FACE"✓
Why this is O(n²)
The char not in result check scans the entire result string from left to right each time. On a string of n characters, the result grows up to n characters, so each membership test takes up to O(n) time. Multiplied across n characters, the total is O(n²). Performance at different scales:
- On a 10-character string: negligible. All three methods feel instant.
- On a 10,000-character string: this method executes up to 100 million character comparisons in the worst case.
For a placement coding test, Method 1 is easy to write and reason about. In any production context, prefer Method 2 or 3.
Method 2: Set-and-List Approach
The O(n) approach uses a set to track seen characters. Set membership in Python is O(1) average, based on hash lookup:
def remove_duplicates_fast(s):
seen = set()
result = []
for char in s:
if char not in seen:
seen.add(char)
result.append(char)
return "".join(result)
print(remove_duplicates_fast("hello")) # helo
print(remove_duplicates_fast("programming")) # progamin
Per the Python set type documentation, average-case lookup and insertion for sets are both O(1). This means the inner check char not in seen no longer scans the entire accumulated result on every iteration. The full loop runs in O(n) time.
Two design choices here are worth noting:
- A
listis used forresultinstead of a string. Appending to a list is O(1); string concatenation (result += char) creates a new string object each time, adding overhead on every step. "".join(result)converts the list to a string at the end in a single O(n) pass.
The set-and-list pattern is what to reach for when the function is called repeatedly or when the input can be arbitrarily long.
Method 3: One-liner with dict.fromkeys()
Python 3.7 and later maintain dictionary insertion order. That property makes dict.fromkeys() a clean one-liner for deduplication:
def remove_duplicates_dict(s):
return "".join(dict.fromkeys(s))
print(remove_duplicates_dict("hello")) # helo
print(remove_duplicates_dict("programming")) # progamin
print(remove_duplicates_dict("FACEFACE")) # FACE
According to the Python dict.fromkeys() documentation, dict.fromkeys(iterable) creates a dictionary with keys taken from the iterable in iteration order, each mapped to a default value of None. Passing a string as the iterable produces one key per unique character in the order they first appear. "".join() then converts the keys back to a string.
Time complexity: O(n) average. Space complexity: O(n) for the dict and the output string.
This is the idiomatic Python approach. One caveat: it relies on dict insertion order being preserved, which is guaranteed from Python 3.7. On Python 3.6 (CPython only, an implementation detail) or earlier, the order is not guaranteed. All current placement test platforms use Python 3.x, so this is safe in practice.
Comparing the Three Methods
| Method | Time Complexity | Preserves Order | Code Length |
|---|---|---|---|
Loop with in check | O(n²) | Yes | 4 lines |
| Set-and-list | O(n) avg | Yes | 6 lines |
dict.fromkeys() | O(n) avg | Yes (Python 3.7+) | 1 line |
All three produce identical output for typical inputs. The performance gap is irrelevant on short strings, but grows with input size:
- On strings up to a few hundred characters: all three methods run fast enough that the difference is not measurable.
- On a 50,000-character corpus: Method 1 executes far more comparisons than Methods 2 or 3.
For a placement coding test: Method 3 is the fastest to write and least likely to contain a bug. Method 2 is better if an interviewer asks you to explain the algorithm step by step. Method 1 is useful if asked to write the solution without any built-in data structures other than a basic string.
Edge Cases Worth Knowing
Before a timed test, check these manually:
- Empty string: all three methods return
"". No error is raised. - Single character:
remove_duplicates("a")returns"a". Works correctly. - All-duplicate string:
remove_duplicates("aaaa")returns"a". The first occurrence is kept; all subsequent copies are dropped. - Case sensitivity:
"A"and"a"are treated as different characters.remove_duplicates("AaAa")returns"Aa", not"A". Uses.lower()before deduplication if the task requires case-insensitive results. - Digits and special characters: all three methods treat digits, spaces, and punctuation as distinct characters.
remove_duplicates("a1 a1")returns"a1 "with the space kept as a unique character. - Unicode characters: Python 3 handles Unicode strings natively. All three methods work on non-ASCII characters without modification.
The character type check guide covers how Python distinguishes uppercase, lowercase, digit, and special characters. That context matters when the task adds a filter such as “remove duplicate letters only, keeping digits as-is.”
For broader string-manipulation practice, the Python practice programs collection covers the categories most commonly tested in placement coding rounds, including I/O handling, loops, and string operations.
Both Method 2 and Method 3 rely on the same principle: hash the characters already seen so each lookup costs O(1) rather than O(n). That hash-table approach is also how deduplication works in LLM token-processing pipelines when cleaning text before passing it to a model. TinkerLLM is the entry point at ₹299 to apply Python string-handling ideas at the scale of real model inputs, rather than on toy strings of 10 characters.
Primary sources
Frequently asked questions
What does removing duplicate characters mean in Python?
Keep only the first occurrence of each character and discard every repeat. Input 'hello' gives 'helo'; input 'programming' gives 'progamin'.
Which method preserves character order when removing duplicates?
The loop-with-set approach and dict.fromkeys() both preserve insertion order. Using set() alone on a string loses the original order because sets are unordered.
What is the time complexity of dict.fromkeys() for deduplication?
O(n) average case. Each character is inserted as a dict key once, and dict key lookup and insertion are both O(1) average.
What is the difference between set(s) and dict.fromkeys(s) for this task?
set(s) removes duplicates but does not preserve the original character order. dict.fromkeys(s) preserves insertion order in Python 3.7 and later.
How does this problem appear in placement coding tests?
Usually phrased as 'return the string with only the first occurrence of each character kept.' TCS, Infosys, and Wipro online tests include string-deduplication as a standard coding problem.
Does this work for strings with spaces and digits?
Yes. All three methods treat spaces and digit characters as distinct characters. The string 'a1 a1' deduplicates to 'a1 ' with each unique character kept once.
A self-paced playground for building with LLMs.
TinkerLLM is FACE Prep's sister property. A guided environment for shipping real LLM applications, the kind of project that earns a paragraph on your resume, not a line.
Try TinkerLLM (₹299 launch)