Why is counting word occurrences significant in programming? Consider a scenario where you have a large body of text, and you want to analyze the frequency of each word. This analysis can unveil patterns, identify keywords, or even assist in building language models for natural language processing.
Moreover, counting word occurrences is a fundamental step in developing applications like search engines, spell checkers, and text summarizers. By understanding the frequency of words, programmers can enhance the efficiency and accuracy of these applications.
The Challenge at Hand
Let’s dive into the challenge of counting word occurrences. Imagine you have a C string containing a sentence or paragraph. The goal is to determine how many times each unique word appears in the string. To achieve this, we’ll employ C programming techniques, ensuring our code is both effective and understandable.
#include <stdio.h>
#include <string.h>  // declares strtok

int main(int argc, char* argv[]) {
    // strtok modifies its argument, so the text is stored in a writable array
    char inputString[] = "Counting word occurrences in C strings is a useful skill Word occurrences help analyze text effectively";
    // Tokenizing the input string
    char* token = strtok(inputString, " ");
    while (token != NULL) {
        // Printing each word
        printf("%s\n", token);
        // Move to the next word
        token = strtok(NULL, " ");
    }
    return 0;
}
In this example, we use the strtok function to tokenize the input string based on spaces. Tokenization is the process of breaking down a string into smaller parts, called tokens. In this case, each token represents a word in the sentence.
Counting Word Occurrences
Now that we have tokenized the string into individual words, the next step is to count the occurrences of each word. We’ll take a simple approach: an array of structures that stores each unique word along with its count.
#include <stdio.h>
#include <string.h>  // declares strtok and strcmp

#define MAX_WORDS 100
#define MAX_WORD_LEN 50

// Structure to store a word and its count
struct WordCount {
    char word[MAX_WORD_LEN];
    int count;
};

int main(int argc, char* argv[]) {
    char inputString[] = "Counting word occurrences in C strings is a useful skill Word occurrences help analyze text effectively";
    // Spaces and periods both separate words; every strtok call uses the same set
    const char* delimiters = " .";
    // Tokenizing the input string
    char* token = strtok(inputString, delimiters);
    // Array to store unique words and their counts
    struct WordCount wordCounts[MAX_WORDS];
    int count = 0;
    while (token != NULL) {
        // Check if the word is already in the array
        int found = 0;
        for (int i = 0; i < count; i++) {
            if (strcmp(wordCounts[i].word, token) == 0) {
                // Word found, increment its count
                wordCounts[i].count++;
                found = 1;
                break;
            }
        }
        // If the word is not found, add it to the array (while there is room)
        if (!found && count < MAX_WORDS) {
            // snprintf truncates overly long words instead of overflowing the buffer
            snprintf(wordCounts[count].word, sizeof wordCounts[count].word, "%s", token);
            wordCounts[count].count = 1;
            count++;
        }
        // Move to the next word
        token = strtok(NULL, delimiters);
    }
    // Displaying word occurrences
    for (int i = 0; i < count; i++) {
        printf("Word: %s, Occurrences: %d\n", wordCounts[i].word, wordCounts[i].count);
    }
    return 0;
}
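In this enhanced code, we introduce a structure WordCount to store each unique word and its corresponding count. The program iterates through the tokenized words, updating the count for existing words or adding new words to the array. Note that strcmp is case-sensitive, so "word" and "Word" are stored as separate entries; for this input, only "occurrences" reaches a count of 2.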
Conclusion
Counting word occurrences in C strings is a valuable skill for programmers working with textual data. By understanding the basics of C strings and employing techniques like tokenization and array manipulation, we can efficiently analyze and extract meaningful information from strings. This article provides a practical example of counting word occurrences in C, showcasing how simple programming concepts can be combined to solve real-world challenges.