In programming, arrays are widely used to store multiple values of the same type. However, sometimes arrays may contain duplicate elements, which can cause problems in calculations, data analysis, or other operations. Removing duplicates from an array ensures that every element appears only once, which is often necessary in tasks such as searching, sorting, and reporting unique values.
In this tutorial, we will write several complete C programs that remove duplicates from an array. By the end of this guide, you will understand how each approach works, which ones preserve the order of the remaining values, and how they compare in speed and memory use.
Understanding the Problem
The goal is to take an input array and produce an array that contains only unique elements. For example, if the array is [1, 2, 3, 2, 4, 1], the resulting array should be [1, 2, 3, 4]. This task introduces array traversal, comparison, and conditional logic. A simple approach involves checking each element against previously encountered elements before adding it to the result array.
Program 1: Using a Temporary Array
A straightforward way to remove duplicates is to use a temporary array to store unique elements as you traverse the original array.
    #include <stdio.h>

    int main(void) {
        int arr[] = {1, 2, 3, 2, 4, 1};
        int size = sizeof(arr) / sizeof(arr[0]);
        int temp[size];   /* holds the unique values found so far */
        int newSize = 0;  /* number of values stored in temp */

        for (int i = 0; i < size; i++) {
            int isDuplicate = 0;
            /* Check whether arr[i] has already been copied to temp. */
            for (int j = 0; j < newSize; j++) {
                if (arr[i] == temp[j]) {
                    isDuplicate = 1;
                    break;
                }
            }
            if (!isDuplicate) {
                temp[newSize] = arr[i];
                newSize++;
            }
        }

        printf("Array after removing duplicates: ");
        for (int i = 0; i < newSize; i++) {
            printf("%d ", temp[i]);
        }
        printf("\n");
        return 0;
    }
In this program, we first declare an array and calculate its size. We create a temporary array (a C99 variable-length array sized to match the original) to store unique elements. The outer loop traverses each element of the original array. The inner loop checks if the current element already exists in the temporary array. If it does not, we add it to the temporary array and increment newSize. Finally, we print the new array containing only unique elements.
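The same logic can be packaged as a reusable function. The following is a minimal sketch, not part of the original program: the name copy_unique and its signature are assumptions chosen for illustration, and the caller is assumed to provide an output buffer at least as large as the input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (name and signature are illustrative assumptions).
 * Copies the first occurrence of each value from src[0..n) into dst,
 * preserving order, and returns how many values were written.
 * dst must have room for n elements and must not overlap src. */
size_t copy_unique(const int *src, size_t n, int *dst) {
    assert(n == 0 || (src != NULL && dst != NULL));

    size_t count = 0;  /* number of unique values stored in dst */
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        /* Look for src[i] among the values already kept. */
        for (size_t j = 0; j < count; j++) {
            if (dst[j] == src[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            dst[count++] = src[i];
        }
    }
    return count;
}
```

Called on the example array {1, 2, 3, 2, 4, 1} with a six-element output buffer, this sketch should fill the buffer with 1 2 3 4 and return 4. Returning the count instead of printing keeps the function usable from any program.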
Program 2: In-Place Removal
You can also remove duplicates directly in the original array to save space, although this approach requires careful management of indices.
    #include <stdio.h>

    int main(void) {
        int arr[] = {1, 2, 3, 2, 4, 1};
        int size = sizeof(arr) / sizeof(arr[0]);
        int newSize = size;  /* logical length; shrinks as duplicates go */

        for (int i = 0; i < newSize; i++) {
            for (int j = i + 1; j < newSize; j++) {
                if (arr[i] == arr[j]) {
                    /* Close the gap by shifting the tail one slot left. */
                    for (int k = j; k < newSize - 1; k++) {
                        arr[k] = arr[k + 1];
                    }
                    newSize--;
                    j--;  /* re-check the element that moved into slot j */
                }
            }
        }

        printf("Array after removing duplicates: ");
        for (int i = 0; i < newSize; i++) {
            printf("%d ", arr[i]);
        }
        printf("\n");
        return 0;
    }
In this method, we traverse the array and compare each element with every element after it. When a duplicate is found, we shift all following elements one position to the left, decrease newSize, and step j back so that the element that just moved into position j is checked as well. This modifies the original array directly and needs no second array. However, it may be slower for large arrays due to repeated shifting operations.
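The shifting can be avoided while still working in place. The sketch below is one possible variant rather than the program above: it keeps a write index and copies each new value forward once. The function name dedupe_in_place is a hypothetical choice for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (illustrative name). Removes duplicates from
 * arr[0..n) in place, keeps first occurrences in their original order,
 * and returns the new logical length. No element is shifted more than once. */
size_t dedupe_in_place(int *arr, size_t n) {
    assert(n == 0 || arr != NULL);

    size_t kept = 0;  /* arr[0..kept) holds the unique values so far */
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        /* Compare against the values already kept at the front. */
        for (size_t j = 0; j < kept; j++) {
            if (arr[j] == arr[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            /* kept <= i always holds, so this never overwrites unread data. */
            arr[kept++] = arr[i];
        }
    }
    return kept;
}
```

The comparisons are still quadratic in the worst case, but each kept element is written once instead of being moved on every removal.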
Program 3: Using Sorting for Efficiency
If the array can be sorted, removing duplicates becomes easier because all duplicates will be adjacent.
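    #include <stdio.h>
    #include <stdlib.h>

    /* Comparison callback for qsort(). Comparing instead of subtracting
       avoids signed overflow when the two values are far apart. */
    int compare(const void *a, const void *b) {
        int x = *(const int *)a;
        int y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        int arr[] = {4, 2, 1, 3, 2, 1};
        int size = sizeof(arr) / sizeof(arr[0]);

        qsort(arr, size, sizeof(int), compare);

        int newSize = 1;  /* the first element is always kept */
        for (int i = 1; i < size; i++) {
            /* In a sorted array, a new value differs from its predecessor. */
            if (arr[i] != arr[i - 1]) {
                arr[newSize] = arr[i];
                newSize++;
            }
        }

        printf("Array after removing duplicates: ");
        for (int i = 0; i < newSize; i++) {
            printf("%d ", arr[i]);
        }
        printf("\n");
        return 0;
    }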
Here, we first sort the array using qsort(). Once the array is sorted, duplicates are next to each other. We then traverse the sorted array and copy only unique elements to the front. This method is efficient for large arrays but changes the original order of elements.
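If the original array must stay untouched, one option is to sort a copy instead. The following sketch assumes that heap allocation is acceptable; the names sorted_unique_copy and compare_ints are hypothetical, and the caller is responsible for freeing the returned buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() callback; compares without subtracting to avoid overflow. */
static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper (illustrative name). Returns a newly allocated,
 * sorted array holding the distinct values of src[0..n) and stores its
 * length in *out_len. Returns NULL for empty input or if allocation fails.
 * src itself is not modified. */
int *sorted_unique_copy(const int *src, size_t n, size_t *out_len) {
    assert(out_len != NULL);
    *out_len = 0;
    if (src == NULL || n == 0) {
        return NULL;
    }

    int *copy = malloc(n * sizeof *copy);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, compare_ints);

    /* Compact: keep a value only when it differs from the last kept one. */
    size_t kept = 1;
    for (size_t i = 1; i < n; i++) {
        if (copy[i] != copy[kept - 1]) {
            copy[kept++] = copy[i];
        }
    }
    *out_len = kept;
    return copy;
}
```

This trades O(n) extra memory for leaving the input in its original order. The result itself is still sorted, not in input order.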
Optimized Removal Using Hashing
For large arrays, nested loops can be slow, resulting in O(n²) time complexity. Using a hash table (or boolean array for small integer ranges) allows us to remove duplicates in O(n) time while using extra memory.
    #include <stdio.h>
    #include <stdbool.h>

    #define MAX 10 // maximum value in array for simplicity

    int main(void) {
        int arr[] = {4, 2, 1, 3, 2, 1, 5};
        int size = sizeof(arr) / sizeof(arr[0]);
        bool hash[MAX + 1] = {false}; // boolean array to track seen elements
        int newSize = 0;
        int result[size]; // store unique elements

        for (int i = 0; i < size; i++) {
            // Values are used as indices, so reject anything outside 0..MAX.
            if (arr[i] < 0 || arr[i] > MAX) {
                fprintf(stderr, "Value %d is outside 0..%d\n", arr[i], MAX);
                return 1;
            }
            if (!hash[arr[i]]) {
                result[newSize++] = arr[i];
                hash[arr[i]] = true;
            }
        }

        printf("Array after removing duplicates: ");
        for (int i = 0; i < newSize; i++) {
            printf("%d ", result[i]);
        }
        printf("\n");
        return 0;
    }
In this program, we use a boolean array hash to track whether each value has already been encountered. As we traverse the original array, we only add elements to the result if they haven’t been seen before. This method is extremely fast for arrays with a limited range of integer values, reducing the time complexity from O(n²) to O(n). Every value must lie between 0 and MAX, because it is used directly as an index into hash.
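Note: If the array contains large integers or negative numbers, you would need a more general hash table. The C standard library does not provide one, so this means writing your own (for example, with dynamically allocated storage and open addressing) or using a third-party library.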
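To illustrate what such a general approach might look like, here is a minimal sketch of a hash set with open addressing and linear probing. It is one possible design under stated assumptions, not a definitive implementation: the names slot_for and dedupe_hashed are hypothetical, the bit-mixing step is an arbitrary choice, and the table is sized to at least twice the input length so that probing always finds a free slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: mixes the bits of value into a table index.
 * capacity must be a power of two. */
static size_t slot_for(int value, size_t capacity) {
    uint32_t h = (uint32_t)value;  /* well defined for negative values */
    h ^= h >> 16;
    h *= 0x45d9f3bU;               /* arbitrary odd multiplier (assumption) */
    h ^= h >> 16;
    return (size_t)h & (capacity - 1);
}

/* Hypothetical helper. Removes duplicates from arr[0..n) in place,
 * keeping first occurrences in order. Works for any int values,
 * including negatives. Returns the new length, or SIZE_MAX on failure. */
size_t dedupe_hashed(int *arr, size_t n) {
    assert(n == 0 || arr != NULL);
    if (n == 0) {
        return 0;
    }
    if (n > SIZE_MAX / (8 * sizeof(int))) {
        return SIZE_MAX;  /* table size would overflow */
    }

    size_t capacity = 16;
    while (capacity < 2 * n) {
        capacity *= 2;    /* load factor stays at or below one half */
    }

    int *keys = malloc(capacity * sizeof *keys);
    bool *used = calloc(capacity, sizeof *used);
    if (keys == NULL || used == NULL) {
        free(keys);
        free(used);
        return SIZE_MAX;
    }

    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        size_t s = slot_for(arr[i], capacity);
        /* Linear probing: walk until we find the value or a free slot. */
        while (used[s] && keys[s] != arr[i]) {
            s = (s + 1) & (capacity - 1);
        }
        if (!used[s]) {
            used[s] = true;
            keys[s] = arr[i];
            arr[kept++] = arr[i];  /* first time this value is seen */
        }
    }

    free(keys);
    free(used);
    return kept;
}
```

With a reasonable spread of hash values, each lookup takes constant time on average, so the whole pass is expected to run in O(n). The worst case is still quadratic if many values collide.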
Performance Comparison
Method | Time Complexity | Space Complexity | Notes |
---|---|---|---|
Nested Loops | O(n²) | O(n) with a temporary array, O(1) in place | Simple to implement, but slow for large arrays. |
Sorting + Remove Duplicates | O(n log n) (typical for qsort()) | O(1) beyond whatever qsort() uses internally | Faster than nested loops on larger arrays; sorts the array, which may not always be desired. |
Hashing / Boolean Array | O(n) | O(k) (depends on value range) | Very fast for large arrays with a known range of elements; uses extra memory. |
The nested loop method is straightforward but inefficient for big arrays. Sorting first reduces time complexity but changes element order. Hashing is the fastest and best for large datasets, provided you can allocate memory for the hash table or boolean array.
Which Method Should You Use?
If you are learning, the nested loop method is easiest to understand. It clearly shows how to compare each element with all others to detect duplicates.
The sorting method is a good choice if the original order of the elements does not matter. It is faster than nested loops on larger arrays, but the result comes out sorted rather than in input order.
The hashing method is the best for efficiency and large datasets. It allows O(n) performance while preserving the order in which values first appear, though it requires extra memory for the hash table.
In practice, for small arrays, nested loops are fine. For medium arrays, sorting can be used. For large arrays or performance-critical applications, hashing is the most practical choice.
FAQs
1. Can these methods be used for floating-point numbers?
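Yes. You can adapt the comparison operations for float or double arrays, but be careful with floating-point precision: values that should be equal can differ in their last bits after arithmetic, so an exact == test may treat them as distinct. Comparing against a small tolerance is a common workaround. The boolean-array method does not carry over, since a floating-point value cannot be used as an array index.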
2. Can I remove duplicates from a sorted array more efficiently?
Yes. When the array is already sorted, you only need to compare each element with its previous element, so a single O(n) pass with no extra array is enough.
3. Does the in-place removal method preserve the original order?
Yes, the in-place method preserves the order while removing duplicates, unlike the sorting-based method.
4. Can I use C standard library functions for this?
Yes, functions like qsort() help in sorting arrays, which makes removing duplicates easier, especially for large datasets.
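As a follow-up to the floating-point question above, a tolerance-based comparison might look like the sketch below. The names nearly_equal and copy_unique_approx are hypothetical, and the right tolerance depends entirely on your data, so eps is left as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: true when a and b differ by at most eps. */
static int nearly_equal(double a, double b, double eps) {
    double diff = a - b;
    if (diff < 0) {
        diff = -diff;
    }
    return diff <= eps;
}

/* Hypothetical helper. Copies values from src[0..n) into dst, skipping
 * any value that lies within eps of one already kept. Returns the number
 * of values written. dst must have room for n elements. */
size_t copy_unique_approx(const double *src, size_t n, double *dst, double eps) {
    assert(n == 0 || (src != NULL && dst != NULL));
    assert(eps >= 0.0);

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < count; j++) {
            if (nearly_equal(dst[j], src[i], eps)) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            dst[count++] = src[i];
        }
    }
    return count;
}
```

Note that "equal within a tolerance" is not transitive, so the result can depend on the order of the input: a chain of values each within eps of its neighbor may or may not collapse to one value.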
Conclusion
Removing duplicates from an array is a common and useful task in C programming. You have learned multiple approaches: using a temporary array, removing duplicates in-place, sorting first so that duplicates become adjacent, and tracking seen values with a boolean array. Each method has advantages and trade-offs in terms of performance and memory usage. By practicing these techniques, you can confidently handle arrays with repeated elements and prepare for more advanced programming challenges.