Placement Prep

Remove Duplicate Elements from an Array: Python, C++, Java

Remove array duplicates: two-pointer for sorted arrays (O(n), O(1) space); HashSet for unsorted (O(n), O(n) space). Code in Python, C++, and Java.

By FACE Prep Team 6 min read
arrays hash-set two-pointer data-structures coding-interview python cpp

Removing duplicate elements from an array produces a version containing each value exactly once, and the right algorithm depends on whether the array is sorted.

Sorted arrays keep duplicates adjacent, which enables an O(1)-space in-place scan. Unsorted arrays scatter duplicates across arbitrary positions, which requires a HashSet to track what has already been seen. Both approaches run in O(n) time; the space cost is where they differ.

The Two Cases: Sorted and Unsorted

In a sorted array, equal values are neighbours because sorting places them together. The input [1, 2, 3, 4, 4, 4, 5] reduces to [1, 2, 3, 4, 5]: the three consecutive 4s collapse to one, and no element needs to be looked up in a separate data structure.

In an unsorted array, duplicates can appear at any distance from each other. The input [9, 2, 7, 4, 7, 9] reduces to [9, 2, 7, 4]: the second 7 and second 9 are removed while preserving the order of first occurrences.

Sorting status determines the algorithm. Both run in O(n) time; the choice is between O(1) extra space (sorted) and O(n) extra space (unsorted).

Approach 1: Two-Pointer for Sorted Arrays

The two-pointer technique uses a slow pointer j (write position) and a fast pointer i (read position). The slow pointer advances only when the fast pointer finds a value that differs from what the slow pointer currently holds. That different value is then written at the next write position.

Algorithm steps

  • If the array is empty, return 0.
  • Set j = 0. The element at index 0 is unconditionally unique.
  • For i from 1 to n - 1:
    • If arr[i] != arr[j], increment j and set arr[j] = arr[i].
    • Otherwise, skip: i advances but j stays.
  • Return j + 1 as the count of unique elements.

Step-by-step trace on [1, 2, 3, 4, 4, 4, 5]

iarr[i]jarr[j]Action
1201Different: j=1, arr[1]=2
2312Different: j=2, arr[2]=3
3423Different: j=3, arr[3]=4
4434Same: no change
5434Same: no change
6534Different: j=4, arr[4]=5

Return j + 1 = 5. The first five positions of arr are [1, 2, 3, 4, 5]. Verified.

Python

def remove_duplicates_sorted(arr):
    if not arr:
        return 0
    j = 0
    for i in range(1, len(arr)):
        if arr[i] != arr[j]:
            j += 1
            arr[j] = arr[i]
    return j + 1

arr = [1, 2, 3, 4, 4, 4, 5]
new_len = remove_duplicates_sorted(arr)
print(arr[:new_len])  # [1, 2, 3, 4, 5]

C++

#include <iostream>
#include <vector>
using namespace std;

int removeDuplicatesSorted(vector<int>& arr) {
    if (arr.empty()) return 0;
    int j = 0;
    for (int i = 1; i < (int)arr.size(); i++) {
        if (arr[i] != arr[j]) {
            j++;
            arr[j] = arr[i];
        }
    }
    return j + 1;
}

int main() {
    vector<int> arr = {1, 2, 3, 4, 4, 4, 5};
    int newLen = removeDuplicatesSorted(arr);
    for (int i = 0; i < newLen; i++)
        cout << arr[i] << " ";  // 1 2 3 4 5
    return 0;
}

Java

public class RemoveDuplicates {
    static int removeDuplicatesSorted(int[] arr) {
        if (arr.length == 0) return 0;
        int j = 0;
        for (int i = 1; i < arr.length; i++) {
            if (arr[i] != arr[j]) {
                j++;
                arr[j] = arr[i];
            }
        }
        return j + 1;
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3, 4, 4, 4, 5};
        int newLen = removeDuplicatesSorted(arr);
        for (int i = 0; i < newLen; i++)
            System.out.print(arr[i] + " ");  // 1 2 3 4 5
    }
}

The two-pointer method is O(n) time and O(1) space. No auxiliary data structure is allocated; the scan touches each element once and writes unique values forward in the same array. A placement interviewer asking for the space-efficient solution for sorted input expects this answer. For a deeper treatment of when O(1)-space algorithms are preferable to their O(n) counterparts, see space complexity of algorithms with worked examples.

Approach 2: HashSet for Unsorted Arrays

The adjacency assumption from Approach 1 does not hold for unsorted arrays. In [9, 2, 7, 4, 7, 9], the two 7s are separated by one element. No single-comparison scan can detect them as duplicates without remembering that 7 was already seen.

A HashSet records each element on first encounter. When a new element arrives, the set membership check tells in O(1) average time whether it is a duplicate.

Algorithm steps

  • Create an empty set seen and an empty list result.
  • For each element x in the array:
    • If x is not in seen, add x to both seen and result.
    • If x is already in seen, skip it.
  • Return result.

Step-by-step trace on [9, 2, 7, 4, 7, 9]

xIn seen?Action
9Noadd to seen and result
2Noadd to seen and result
7Noadd to seen and result
4Noadd to seen and result
7Yesskip
9Yesskip

Result: [9, 2, 7, 4]. Insertion order preserved.

Python

Python’s built-in set provides O(1) average membership testing and insertion:

def remove_duplicates_unsorted(arr):
    seen = set()
    result = []
    for x in arr:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

arr = [9, 2, 7, 4, 7, 9]
print(remove_duplicates_unsorted(arr))  # [9, 2, 7, 4]

C++

C++ provides unordered_set for O(1) average lookup and insertion:

#include <iostream>
#include <vector>
#include <unordered_set>
using namespace std;

vector<int> removeDuplicatesUnsorted(const vector<int>& arr) {
    unordered_set<int> seen;
    vector<int> result;
    for (int x : arr) {
        if (seen.find(x) == seen.end()) {
            seen.insert(x);
            result.push_back(x);
        }
    }
    return result;
}

int main() {
    vector<int> arr = {9, 2, 7, 4, 7, 9};
    vector<int> res = removeDuplicatesUnsorted(arr);
    for (int x : res)
        cout << x << " ";  // 9 2 7 4
    return 0;
}

Java

Java’s LinkedHashSet maintains insertion order while providing O(1) average lookups:

import java.util.*;

public class RemoveDuplicates {
    static int[] removeDuplicatesUnsorted(int[] arr) {
        LinkedHashSet<Integer> seen = new LinkedHashSet<>();
        for (int x : arr) seen.add(x);
        return seen.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] arr = {9, 2, 7, 4, 7, 9};
        System.out.println(Arrays.toString(removeDuplicatesUnsorted(arr)));
        // [9, 2, 7, 4]
    }
}

The HashSet approach is O(n) time and O(n) space. The set can hold up to n entries in the worst case, which occurs when the input has no duplicates at all. The same HashSet pattern appears in finding all symmetric pairs in an array, where a HashMap records the first element of each pair and checks for its symmetric counterpart on arrival.

Complexity Comparison

ApproachInput typeTimeSpaceModifies original?
Two-pointerSorted onlyO(n)O(1)Yes, in-place
HashSetAnyO(n)O(n)No, returns new list
Brute force (nested loops)AnyO(n^2)O(1)Yes

For a placement coding round:

  • Sorted input: two-pointer is the expected answer because of O(1) space.
  • Unsorted input: HashSet is the expected answer because of O(n) time.
  • If asked why not sort-then-two-pointer: sorting costs O(n log n) time and O(log n) or O(n) space depending on the sort algorithm; the HashSet achieves O(n) time without a sort step.

The distinction between in-place modification (two-pointer) and returning a new structure (HashSet) is worth stating explicitly when answering in a technical round. Interviewers often follow up with “what if the caller needs the original array preserved?”

Edge Cases

Four inputs are worth testing before submitting:

  • Empty array (arr = []): The two-pointer version checks if not arr and returns 0 immediately. The HashSet version returns an empty list. Both handle this correctly without an explicit guard in the loop body.
  • All duplicates (arr = [5, 5, 5, 5]): Two-pointer: the fast pointer never finds a new value, j stays at 0, the function returns 1. HashSet: only the first 5 enters the set; result is [5].
  • No duplicates (arr = [1, 2, 3, 4]): Two-pointer: j advances with every step, returns the original length. HashSet: every element enters the set, result is identical to the input.
  • Single element (arr = [7]): Two-pointer returns 1 (the inner loop never executes). HashSet returns [7]. Both work without a special case.

The boundary-guard pattern here connects to the explicit empty-array check in equilibrium index of an array, where the same discipline of testing index-0 and last-index behaviour prevents silent wrong-output bugs.


The HashSet approach (insert on first encounter, skip on repeat) is the same building block behind semantic deduplication in LLM data pipelines, where duplicate or near-duplicate text chunks are filtered before embedding to prevent inflated retrieval scores. TinkerLLM at ₹299 is where you can apply that membership-check logic to real LLM API calls, starting from the set-lookup intuition this problem builds.

Primary sources

Frequently asked questions

What is the difference between removing duplicates from a sorted vs unsorted array?

In a sorted array, duplicates are always adjacent, so a two-pointer scan removes them in one pass with O(1) extra space. In an unsorted array, duplicates can appear anywhere, requiring a HashSet to track seen values — which trades O(n) space for the same O(n) time bound.

Which approach should I present in a placement interview?

Start with the two-pointer method for the sorted case to show you know the O(1) space solution. For unsorted arrays, present the HashSet approach as the O(n) solution, then mention that sorting first costs O(n log n) time but brings space down to O(1). Stating that trade-off directly strengthens the answer.

Does the two-pointer method modify the original array?

Yes. The two-pointer approach is in-place: unique elements are copied forward in the same array, and the function returns a new length. Elements at indices beyond that length are not cleared; the returned length tells the caller how many positions are valid.

What does Java's LinkedHashSet add over a plain HashSet?

LinkedHashSet maintains insertion order while providing O(1) average lookups, identical to a plain HashSet. For deduplication, this means the output array preserves the order in which elements first appeared in the input, which is the expected behaviour for most placement test problems.

Can the HashSet approach be used on a sorted array?

Yes, it works correctly on both sorted and unsorted input. For a sorted array specifically, the two-pointer approach achieves the same O(n) time with O(1) space by exploiting adjacency. The HashSet always costs O(n) extra memory regardless of whether the input is sorted.

What is the time complexity of the brute-force approach for unsorted arrays?

A nested-loop comparison of every element against every other element runs in O(n^2) time. For n = 1,000 elements that means up to 999,000 comparisons. The HashSet approach reduces this to a single O(n) pass by recording seen elements as it goes.

Build AI projects

A self-paced playground for building with LLMs.

TinkerLLM is FACE Prep's sister property. A guided environment for shipping real LLM applications, the kind of project that earns a paragraph on your resume, not a line.

Try TinkerLLM (₹299 launch)
Free AI Roadmap PDF