Placement Prep

Program to Remove All Non-Alphabet Characters from a String

Step-by-step algorithm and verified C, C++, Java, and Python code to remove all non-alphabet characters from a string, with edge cases and O(n) complexity analysis.

By FACE Prep Team 5 min read
string-manipulation character-filtering python java cpp placement-prep data-structures

A single pass through a string is all it takes to strip every non-alphabet character. No sorting, no extra data structures, just a character test at each position.

What the Problem Asks

Given a string that can contain uppercase letters, lowercase letters, digits, and symbols, produce a new string that contains only the alphabet characters, in their original order. Everything else is dropped.

Canonical example from placement papers:

  • Input: pro$#&gra7m
  • Expected output: program

Tracing it manually: p, r, o are alphabets and pass through. $, #, & are symbols and are dropped. g, r, a pass through. 7 is a digit and is dropped. m passes through. Result: program.

This problem appears in the string-manipulation section of service-tier placement screens, including AMCAT Automata and CoCubes coding rounds. It tests whether a candidate knows the standard character-classification functions in their language rather than reinventing the test with ASCII range comparisons.

Algorithm: Single-Pass Character Filter

The algorithm is four steps, executed once:

  • Step 1: Read the input string.
  • Step 2: Create an empty result string (or buffer).
  • Step 3: Visit each character left to right. If it is an alphabet letter, append it to the result; otherwise skip it.
  • Step 4: Output the result string.

The key decision in Step 3 is how to test “is this character alphabetic?” The correct answer for each language:

  • C / C++: isalpha() from <ctype.h> / <cctype>. Returns non-zero for A through Z and a through z.
  • Java: Character.isLetter(). Returns true for any Unicode letter; use Character.isAlphabetic() if you want the broader Unicode-aware definition.
  • Python: str.isalpha() on a single character. Returns True for any Unicode letter.

Do not substitute an ASCII range comparison like (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'). It works for ASCII-only inputs but breaks silently for any character outside that range. The standard library functions are the right tool.

C and C++ Implementations

C Implementation

#include <stdio.h>
#include <ctype.h>

void remove_non_alpha(const char *str, char *result) {
    int j = 0;
    for (int i = 0; str[i] != '\0'; i++) {
        if (isalpha((unsigned char)str[i])) {
            result[j++] = str[i];
        }
    }
    result[j] = '\0';
}

int main(void) {
    char str[201];
    char result[201];
    scanf("%200s", str);
    remove_non_alpha(str, result);
    printf("%s\n", result);
    return 0;
}

The (unsigned char) cast before isalpha is not decorative. On platforms where char is signed, passing a negative value directly to isalpha triggers undefined behaviour; the cast keeps the value in the 0 to 255 range that <ctype.h> expects. See the C isalpha() reference at cppreference.com for the full specification.

Verified trace on pro$#&gra7m

  • Positions 0 to 2: p, r, oisalpha returns true — appended.
  • Positions 3 to 5: $, #, &isalpha returns 0 — skipped.
  • Positions 6 to 8: g, r, a — appended.
  • Position 9: 7isalpha returns 0 — skipped.
  • Position 10: m — appended.
  • Result: program

C++ Implementation

#include <iostream>
#include <string>
#include <cctype>

std::string remove_non_alpha(const std::string &str) {
    std::string result;
    for (char c : str) {
        if (std::isalpha(static_cast<unsigned char>(c))) {
            result += c;
        }
    }
    return result;
}

int main() {
    std::string str;
    std::cin >> str;
    std::cout << remove_non_alpha(str) << std::endl;
    return 0;
}

The range-based for loop over std::string iterates one char at a time, same traversal order as the C version. std::string’s += operator appends a single character. The same (unsigned char) cast applies for the same reason.

Java and Python Implementations

Java Implementation

import java.util.Scanner;

public class RemoveNonAlpha {
    public static String removeNonAlpha(String str) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            if (Character.isLetter(str.charAt(i))) {
                result.append(str.charAt(i));
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String str = sc.next();
        System.out.println(removeNonAlpha(str));
    }
}

Using StringBuilder instead of String concatenation inside the loop matters here. Each + on a String creates a new object in memory; StringBuilder.append() operates in-place. For a 200-character string the difference is negligible, but the pattern matters in interviews: an interviewer who sees String result = result + c inside a loop will note the inefficiency.

Character.isLetter() returns true for any Unicode letter. For strictly ASCII-only letter checking, use (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'), but isLetter() is the idiomatic Java approach and what most interviewers expect.

Trace on pro$#&gra7m: identical to the C trace above. Output is program. ✓

Python Implementation

def remove_non_alpha(s):
    return ''.join(c for c in s if c.isalpha())

s = input()
print(remove_non_alpha(s))

The generator expression (c for c in s if c.isalpha()) iterates the string once and yields only alphabetic characters. ''.join(...) assembles the result. This is the idiomatic Python one-liner; most Python interviewers will accept it for placement rounds.

Python’s str.isalpha() returns True for any Unicode letter, not just ASCII. If the requirement is ASCII-only letters, combine both tests:

def remove_non_alpha_ascii(s):
    return ''.join(c for c in s if c.isalpha() and c.isascii())

A re.sub alternative also works and is worth knowing for interviews that ask for a regex approach:

import re

def remove_non_alpha_regex(s):
    return re.sub(r'[^a-zA-Z]', '', s)

[^a-zA-Z] matches any character that is not an uppercase or lowercase ASCII letter, and re.sub replaces every match with an empty string. The ^ inside the character class means “not these characters.”

Edge Cases

Four inputs worth testing before submitting in a coding round:

InputExpected outputReason
pro$#&gra7mprogramCanonical example: mix of letters, symbols, digit
HelloWorldHelloWorldAll alphabets: unchanged, case preserved
12345!@#(empty)No alphabets: result is empty string
He77o W0rldHeoWrldSpaces and digits both removed

Empty-string input produces empty-string output in all four implementations; the loop body never executes when the string length is zero.

Time and Space Complexity

  • Time: O(n). One pass through the string; each of the n characters is visited once and either appended or skipped. The isalpha / isLetter / isalpha() call is O(1) per character.
  • Space: O(n) for the result buffer. Worst case is when every character is alphabetic, making the result the same length as the input. No hash table, no sorting, no recursion stack — just the output buffer.

This is one of the more straightforward space-complexity derivations in placement prep. For a deeper treatment of how to calculate and reason about auxiliary space in algorithm questions, see the space complexity of algorithms guide.

The same single-pass traversal pattern applies when the problem is integer-based rather than string-based. For example, the task of replacing all 0s with 1s in a given integer uses the same digit-by-digit iteration logic.


Character-level filtering is more than a placement round warm-up. Every large language model tokenizer runs a normalisation pass that applies this exact filter: stripping control characters, normalising Unicode, removing characters outside its vocabulary. That pattern is isalpha at production scale. TinkerLLM lets you inspect live tokenisation pipelines and run your own variations for ₹299; the distance between a string-filter exercise and a shipped tokenizer is shorter than most placement guides suggest.

Primary sources

Frequently asked questions

Does isalpha() return true for digits?

No. isalpha() returns a non-zero value only for characters classified as alphabetic — A through Z and a through z in the C locale. Digits (0 to 9), spaces, and all symbols return 0. This is why the single-pass filter keeps only letters.

Will spaces be removed from the string?

Yes. A space character is not an alphabet letter, so the isalpha() check returns false for it and the character is not appended to the result. The input 'Hello World 2026' becomes 'HelloWorld'.

Does the program preserve uppercase and lowercase letters?

Yes. The filter tests only whether the character is alphabetic; it does not alter the character's case. Input 'ProGraM123' produces 'ProGraM' exactly.

What is the time complexity of this program?

O(n), where n is the length of the input string. Each character is visited exactly once in a single left-to-right traversal. There is no nested loop and no sorting step.

What is the space complexity?

O(n) for the result buffer. In the worst case, where every character is alphabetic, the result is the same length as the input. No additional auxiliary data structure is needed.

What happens with Unicode input in Python?

Python's str.isalpha() follows Unicode definitions. It returns True for any Unicode letter, not just ASCII A to Z. The Arabic letter alef, the Devanagari letter ka, and the Greek letter alpha all pass the check. If you want strictly ASCII letters only, use c.isascii() and c.isalpha() together.

Build AI projects

A self-paced playground for building with LLMs.

TinkerLLM is FACE Prep's sister property. A guided environment for shipping real LLM applications, the kind of project that earns a paragraph on your resume, not a line.

Try TinkerLLM (₹299 launch)
Free AI Roadmap PDF