Find Duplicate Elements in an Array: 5 Approaches
Five approaches to find duplicates in an array: O(n²) nested loops, sort-and-scan, hash set, XOR trick, and Floyd's cycle detection, with C, C++, and Python code.
Finding duplicate elements in an array is a standard placement coding question with five distinct solutions, each trading off time, space, and input constraints differently.
Two problem variants appear in tests. The general version has no range restriction: the array may hold any integer values, and any number of elements may repeat. The constrained version specifies that the array has n+1 elements with values in [1..n], which enables the O(n) O(1) solutions in Approaches 4 and 5. Approaches 1 through 3 work on both; Approaches 4 and 5 work only on the constrained version.
Approach 1: Nested-Loop Brute Force (C)
For each element, scan every element to its right. A match means the element is a duplicate.
Algorithm
- Traverse the array from index 0 to n-2 (outer loop, index
i). - For each
arr[i], scan from indexi+1to n-1 (inner loop, indexj). - If
arr[i] == arr[j], printarr[i]and break the inner loop to avoid printing the same duplicate twice.
C Code
#include <stdio.h>
void findDuplicates(int arr[], int n) {
printf("Repeating elements:\n");
for (int i = 0; i < n - 1; i++) {
for (int j = i + 1; j < n; j++) {
if (arr[i] == arr[j]) {
printf("%d ", arr[i]);
break;
}
}
}
printf("\n");
}
int main() {
int arr[] = {4, 3, 2, 7, 8, 3, 4};
int n = sizeof(arr) / sizeof(arr[0]);
findDuplicates(arr, n);
return 0;
}
Output: Repeating elements: 4 3
Trace on {4, 3, 2, 7, 8, 3, 4}
- i=0, arr[i]=4: check j=1(3), j=2(2), j=3(7), j=4(8), j=5(3), j=6(4) — match at j=6, print 4
- i=1, arr[i]=3: check j=2(2), j=3(7), j=4(8), j=5(3) — match at j=5, print 3
- i=2 through i=5: no matching element to the right
Complexity
- Time: O(n²) — the inner loop runs up to n-1 iterations per outer step.
- Space: O(1) — no auxiliary data structure beyond the input array.
This is the correct baseline. An interviewer who asks for a more efficient solution is testing whether you know the hash-set or sorted approach. Present this first, then improve.
Approach 2: Sort and Scan (C++)
Sort the array. Adjacent duplicates end up adjacent in sorted order and are found in a single scan.
Algorithm
- Sort the array with std::sort.
- Scan from index 1 to n-1.
- If
arr[i] == arr[i-1], printarr[i].
C++ Code
#include <iostream>
#include <algorithm>
using namespace std;
void findDuplicates(int arr[], int n) {
sort(arr, arr + n);
cout << "Repeating elements:\n";
for (int i = 1; i < n; i++) {
if (arr[i] == arr[i - 1]) {
cout << arr[i] << " ";
}
}
cout << "\n";
}
int main() {
int arr[] = {4, 3, 2, 7, 8, 3, 4};
int n = sizeof(arr) / sizeof(arr[0]);
findDuplicates(arr, n);
return 0;
}
Sorted array: {2, 3, 3, 4, 4, 7, 8}. Scan finds 3 (at index 2) and 4 (at index 4).
Complexity
- Time: O(n log n) — dominated by
std::sort. - Space: O(1) — sorting is in-place.
std::sort modifies the original array. If the rest of the program needs the original order preserved, copy the array before sorting or use the hash-set approach instead.
Approach 3: Hash Set (Python)
A set tracks which elements have already been seen. On each iteration, if the current element is already in the set, it is a duplicate.
Algorithm
- Initialise an empty set
seen. - For each element in the array:
- If the element is in
seen, record it as a duplicate. - Otherwise, add it to
seen.
- If the element is in
Python Code
def find_duplicates(arr):
seen = set()
duplicates = []
for val in arr:
if val in seen:
duplicates.append(val)
else:
seen.add(val)
return duplicates
arr = [4, 3, 2, 7, 8, 3, 4]
print("Repeating elements:", find_duplicates(arr))
Output: Repeating elements: [3, 4]
Python’s set type uses a hash table internally, so membership tests and insertions run in O(1) average time.
Trace on {4, 3, 2, 7, 8, 3, 4}
- val=4: not in seen, add. seen=4
- val=3: not in seen, add. seen=3
- val=2: not in seen, add. seen=2
- val=7: not in seen, add. seen=7
- val=8: not in seen, add. seen=8
- val=3: in seen, append 3
- val=4: in seen, append 4
Output: [3, 4]
Complexity
- Time: O(n) — one pass, O(1) average per element for set operations.
- Space: O(n) — the set holds up to n distinct elements.
This is the default answer for the general problem. The trade-off is space: the set grows proportionally with the number of distinct values in the array.
Approach 4: XOR Trick (Constrained Problem Only)
Constraint: same constraint as Approach 5. The array has n+1 elements, all values are integers in [1..n], and exactly one value is repeated.
XOR of any value with itself equals 0. XOR of any value with 0 returns the value unchanged. XOR is commutative and associative.
If you XOR all array elements and then XOR the result with 1 through n, each number in [1..n] that appears exactly once cancels out. Only the repeated number remains, because it contributes one extra XOR that does not cancel.
Algorithm
- Compute
arr_xoras the XOR of all elements in the array. - Compute
range_xoras the XOR of all integers from 1 to n (where n is the maximum value in [1..n]). - The duplicate equals
arr_xor XOR range_xor.
C Code
#include <stdio.h>
int findDuplicate(int arr[], int n) {
int arr_xor = 0, range_xor = 0;
for (int i = 0; i < n; i++) arr_xor ^= arr[i];
for (int i = 1; i < n; i++) range_xor ^= i;
return arr_xor ^ range_xor;
}
int main() {
/* n+1 = 5 elements, values in [1..4], duplicate = 2 */
int arr[] = {1, 3, 4, 2, 2};
int n = 5;
printf("Duplicate: %d\n", findDuplicate(arr, n));
return 0;
}
Output: Duplicate: 2
Trace on {1, 3, 4, 2, 2} (n=4, array has 5 elements)
- arr_xor: 1 XOR 3 XOR 4 XOR 2 XOR 2 = 6
- range_xor: 1 XOR 2 XOR 3 XOR 4 = 4
- duplicate: 6 XOR 4 = 2
Binary check: 110 XOR 100 = 010 = 2. Confirmed.
Complexity
- Time: O(n) — two linear passes.
- Space: O(1) — two integer variables.
The XOR trick returns only one duplicate value and only works under the [1..n] constraint. For the general problem, use the hash set.
Approach 5: Floyd’s Cycle Detection
Constraint: the array has n+1 elements, values in [1..n], exactly one duplicate.
Map the array as an implicit linked list: treat index i as a node whose next pointer is arr[i]. Because all values fall in [1..n] and the array has n+1 entries, at least two indices point to the same next node, forming a cycle. The entry point of that cycle is the duplicate.
Floyd’s algorithm finds the cycle in two phases. Phase 1 moves a slow pointer (one step per iteration) and a fast pointer (two steps per iteration) until they meet inside the cycle. Phase 2 resets the slow pointer to index 0 and advances both pointers one step at a time; their new meeting point is the cycle entry, which is the duplicate.
Python Code
def find_duplicate_floyd(arr):
slow = arr[0]
fast = arr[arr[0]]
# Phase 1: find meeting point inside the cycle
while slow != fast:
slow = arr[slow]
fast = arr[arr[fast]]
# Phase 2: find cycle entry = duplicate
slow = 0
while slow != fast:
slow = arr[slow]
fast = arr[fast]
return slow
arr = [1, 3, 4, 2, 2]
print("Duplicate:", find_duplicate_floyd(arr))
Output: Duplicate: 2
Trace on {1, 3, 4, 2, 2}
Mapping from index to next: 0 to 1, 1 to 3, 2 to 4, 3 to 2, 4 to 2. The path from node 0: 0, 1, 3, 2, 4, 2, 4, … (cycle: nodes 2 and 4).
Phase 1 (slow starts at arr[0]=1, fast starts at arr[arr[0]]=3):
- Step 1: slow=arr[1]=3, fast=arr[arr[3]]=arr[2]=4
- Step 2: slow=arr[3]=2, fast=arr[arr[4]]=arr[2]=4
- Step 3: slow=arr[2]=4, fast=arr[arr[4]]=arr[2]=4, slow equals fast at 4
Phase 2 (slow reset to 0, fast stays at 4):
- Step 1: slow=arr[0]=1, fast=arr[4]=2
- Step 2: slow=arr[1]=3, fast=arr[2]=4
- Step 3: slow=arr[3]=2, fast=arr[4]=2, slow equals fast at 2
Duplicate is 2. Confirmed.
Complexity
- Time: O(n) — both phases are linear.
- Space: O(1) — two pointer variables.
Floyd’s is the optimal approach for the constrained problem: O(n) time, O(1) space, and it does not modify the array. The sort approach (Approach 2) also uses O(1) space but costs O(n log n) and mutates the array.
Complexity Comparison
| Approach | Time | Space | Finds All Duplicates | Input Constraint |
|---|---|---|---|---|
| Brute force (C) | O(n²) | O(1) | Yes | None |
| Sort and scan (C++) | O(n log n) | O(1) | Yes | Mutates array |
| Hash set (Python) | O(n) | O(n) | Yes | None |
| XOR trick (C) | O(n) | O(1) | No — single only | Values in [1..n], one duplicate |
| Floyd’s cycle (Python) | O(n) | O(1) | No — single only | Values in [1..n], one duplicate |
For the general problem, the hash set is the correct O(n) solution. For the constrained problem, Floyd’s is optimal: O(n) time, O(1) space, non-destructive.
Placement Interview Tips
Duplicate-detection problems appear in three forms in Indian placement tests.
The first is a direct coding question: write a function that returns all duplicate elements. The expected answer is Approach 3 (hash set). Approach 1 (brute force) is an acceptable starting point, but the interviewer will follow up asking for a more efficient solution.
The second form adds a space constraint: find the duplicate using O(1) extra memory. This is the cue for sorting (Approach 2) or, if the values are confirmed to be in [1..n], for Floyd’s (Approach 5).
The third form is a reasoning question: given an array where every value is in [1..n] and exactly one appears twice, which algorithm is most efficient? The answer is Floyd’s: O(n) time and O(1) space without modifying the array.
For the full range of array and data-structure problems that appear across these tests, see data structures interview questions. The single-pass update pattern in Approach 3 also underlies the O(n) scan in finding the smallest and largest element: initialise a candidate at index 0, walk once, update on each comparison.
The O(n) hash-set pass in Approach 3 is the same deduplication loop used in retrieval-augmented generation pipelines: retrieve candidate documents, scan once with a seen set, discard any already encountered. TinkerLLM at ₹299 builds that connection through exercises that extend single-pass array logic toward context-window management in LLM inference. The palindrome-check pattern is another O(n) single-pass variant worth reviewing alongside this article.
Primary sources
Frequently asked questions
What is the most efficient approach to find all duplicates in an array?
The hash-set approach finds all duplicates in O(n) time and O(n) space with a single pass. Use it as the default unless the problem explicitly requires O(1) extra space.
When does the XOR trick work for finding duplicates?
The XOR trick works when the array has exactly n+1 elements with values in the range [1..n] and exactly one element is repeated. For arrays with multiple duplicates or arbitrary value ranges, use a hash set instead.
Does sort-and-scan modify the original array?
Yes. std::sort in C++ is in-place and overwrites the original element order. If the original order must be preserved, copy the array before sorting or use the hash-set approach instead.
Can Floyd's cycle detection find multiple duplicates in one pass?
No. Floyd's algorithm identifies the entry point of one cycle, which corresponds to one duplicate value. If the array has multiple duplicates, only one is returned.
Why does Floyd's cycle detection use O(1) space?
The algorithm uses only two pointer variables regardless of array size. A hash set grows with n because it stores every unique element seen so far.
What should I say first when asked to find duplicates in a placement interview?
Ask whether extra space is allowed. If yes, use a hash set in O(n) time and O(n) space. If no, sort-and-scan in O(n log n) or Floyd's if the constrained n+1 range applies.
A self-paced playground for building with LLMs.
TinkerLLM is FACE Prep's sister property. A guided environment for shipping real LLM applications, the kind of project that earns a paragraph on your resume, not a line.
Try TinkerLLM (₹299 launch)