Finding the union of two arrays is a common task in programming. The union of arrays contains all elements from both arrays but without duplicates. This concept is widely used in data processing, set operations, and algorithm problems. In this tutorial, we will explore multiple ways to find the union of two arrays in C, explain each approach in detail, and provide full code examples.
By the end of this tutorial, you will understand how to combine arrays, avoid duplicates, and write efficient C programs for union operations.
Understanding the Problem
The union of two arrays A and B is a new array that contains every element from A and B, but each element appears only once. For example, if:
A = {1, 2, 3}
B = {2, 3, 4}
Then the union will be:
Union = {1, 2, 3, 4}
The challenge is to efficiently combine arrays while removing duplicates. We will cover several approaches, from simple nested loops to using hashing for better performance.
Program 1: Using Nested Loops
This method is simple and easy to understand. We first copy all elements from the first array, then check each element from the second array and add it only if it is not already in the result.
#include <stdio.h>
int main() {
int A[] = {1, 2, 3, 4};
int B[] = {2, 3, 4, 5, 6};
int sizeA = sizeof(A) / sizeof(A[0]);
int sizeB = sizeof(B) / sizeof(B[0]);
int result[sizeA + sizeB]; // safe for any sizes
int sizeR = 0;
// Copy all elements from A
for(int i = 0; i < sizeA; i++) {
result[sizeR++] = A[i];
}
// Add elements from B if not already in result
for(int i = 0; i < sizeB; i++) {
int exists = 0;
for(int j = 0; j < sizeR; j++) {
if(B[i] == result[j]) {
exists = 1;
break;
}
}
if(!exists) {
result[sizeR++] = B[i];
}
}
printf("Union of two arrays: ");
for(int i = 0; i < sizeR; i++) {
printf("%d ", result[i]);
}
printf("\n");
return 0;
}
In this program, we first copy all elements from array A
into result
. Then, for each element in B
, we check if it already exists in result
. If it does not, we append it. This method is simple but can be slow for large arrays due to repeated comparisons.
Program 2: Using Sorting and Merge
We can sort both arrays and then merge them while skipping duplicates. This method is faster for larger arrays than nested loops because we avoid repeated comparisons.
#include <stdio.h>
#include <stdlib.h>
int compare(const void *a, const void *b) {
return (*(int*)a - *(int*)b);
}
int main() {
int A[] = {1, 2, 3, 4};
int B[] = {2, 3, 4, 5, 6};
int sizeA = sizeof(A) / sizeof(A[0]);
int sizeB = sizeof(B) / sizeof(B[0]);
int result[sizeA + sizeB]; // safe for any array sizes
int sizeR = 0;
qsort(A, sizeA, sizeof(int), compare);
qsort(B, sizeB, sizeof(int), compare);
int i = 0, j = 0;
while(i < sizeA && j < sizeB) {
if(A[i] < B[j]) result[sizeR++] = A[i++];
else if(B[j] < A[i]) result[sizeR++] = B[j++];
else { // equal elements
result[sizeR++] = A[i];
i++; j++;
}
}
while(i < sizeA) result[sizeR++] = A[i++];
while(j < sizeB) result[sizeR++] = B[j++];
printf("Union of two arrays: ");
for(i = 0; i < sizeR; i++) printf("%d ", result[i]);
printf("\n");
return 0;
}
Here, we first sort both arrays using qsort()
. Then, we traverse both arrays simultaneously, adding the smaller element or skipping duplicates. This is efficient for medium to large arrays but changes the original order.
Program 3: Using Hashing for Large Arrays
For large arrays or high-performance requirements, we can use a hash table or boolean array to keep track of elements already added. This reduces time complexity to O(n + m) where n and m are array sizes.
#include <stdio.h>
#include <stdbool.h>
#define MAX 100 // assuming all numbers are <= 100
int main() {
int A[] = {1, 2, 3, 4, 5};
int B[] = {2, 3, 4, 5, 6, 7, 8};
int sizeA = sizeof(A) / sizeof(A[0]);
int sizeB = sizeof(B) / sizeof(B[0]);
bool hash[MAX+1] = {false};
int result[sizeA + sizeB]; // large enough for all elements
int sizeR = 0;
// Add elements from A
for(int i = 0; i < sizeA; i++) {
if(!hash[A[i]]) {
result[sizeR++] = A[i];
hash[A[i]] = true;
}
}
// Add elements from B
for(int i = 0; i < sizeB; i++) {
if(!hash[B[i]]) {
result[sizeR++] = B[i];
hash[B[i]] = true;
}
}
printf("Union of two arrays: ");
for(int i = 0; i < sizeR; i++) {
printf("%d ", result[i]);
}
printf("\n");
return 0;
}
This approach uses extra memory but is very fast and suitable for arrays with a known range of values. It efficiently prevents duplicates without nested loops.
Performance Comparison
Method | Time Complexity | Space Complexity | Notes |
---|---|---|---|
Nested Loops | O(n*m) | O(n+m) | Easy to understand but slow for large arrays. |
Sorting + Merge | O(n log n + m log m) | O(n+m) | Efficient for sorted output; changes original order. |
Hashing | O(n+m) | O(k) | Very fast for large arrays with limited range; uses extra memory. |
Which Method Should You Use?
The nested loop method is best for beginners. It is simple and clearly demonstrates the logic of checking duplicates.
The sorting and merge method works well when arrays are medium-sized or when sorted output is desired.
The hashing method is ideal for large datasets and performance-critical tasks. It is very fast, though it requires extra memory for the hash table.
In practice, hashing is most commonly used for large arrays, while nested loops or sorting work fine for smaller arrays or learning exercises.
FAQs
Q1: Can I use memcpy()
to combine arrays for union?
Yes, you can use memcpy()
to copy arrays into a result array, but you still need to check for duplicates after copying.
Q2: Does the union method preserve order?
Nested loops preserve the order of the first array, but sorting changes the order. Hashing preserves insertion order if implemented carefully.
Q3: Can these methods handle negative numbers?
Yes, nested loops and sorting handle negative numbers naturally. Hashing requires adjustments since array-based hashes cannot have negative indices.
Conclusion
Finding the union of two arrays is a foundational skill in C programming. We explored nested loops, sorting with merge, and hashing methods. Beginners can start with nested loops, while hashing is best for large arrays and performance-critical tasks. Understanding these techniques builds confidence in array manipulation and problem-solving in C.
References & Additional Resources
A curated collection of textbooks and tutorials for learning arrays, array operations, and hashing techniques in C.
- Kernighan, Brian W., and Dennis M. Ritchie. The C Programming Language, 2nd Edition, Prentice Hall, 1988 – The foundational text covering arrays, pointers, functions, and core C programming concepts.
- GeeksforGeeks: C Arrays – Introduction to arrays in C, including declaration, initialization, traversal, and basic operations.
- GeeksforGeeks: Union of Two Arrays – Demonstrates how to find the union of two arrays using C programming examples.
- GeeksforGeeks: Hashing in Data Structure – Explains the concept of hashing, hash tables, and their applications for efficient data storage and retrieval.