C Program to Find the Frequency of Words in a String

Counting the frequency of words in a string is a fundamental task in text processing, useful in applications like word counters, text analytics, and data mining. In C, this task requires handling strings as arrays of characters, parsing words, and keeping track of their occurrences.

This tutorial walks through six ways to calculate word frequencies: strtok() with nested loops, pointer-based parsing, a reusable function, case-insensitive counting, recursion, and sorting with qsort().

Understanding the Problem

The goal is to find how many times each word appears in a given string. Words are sequences of characters separated by spaces or other delimiters like punctuation.

The basic approach involves:

  1. Splitting the string into individual words.
  2. Comparing each word with previously encountered words.
  3. Counting the occurrences.

This problem demonstrates string parsing, working with fixed-size arrays, and the use of C library functions for text processing.

Program 1: Using Nested Loops

A simple approach uses nested loops to compare each word with the others.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX_WORDS 100
#define MAX_LEN 50

int main() {

    char str[500];
    char words[MAX_WORDS][MAX_LEN];
    int freq[MAX_WORDS] = {0};
    int wordCount = 0;

    printf("Enter a string: ");
    fgets(str, sizeof(str), stdin);

    // Remove newline
    for (int i = 0; str[i] != '\0'; i++) {
        if (str[i] == '\n') str[i] = '\0';
    }

    // Split string into words
    char *token = strtok(str, " ");

    while (token != NULL) {

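        // Truncate over-long words so they fit in words[][] without overflow
        if (strlen(token) >= MAX_LEN) token[MAX_LEN - 1] = '\0';

        int found = 0;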

        for (int i = 0; i < wordCount; i++) {

            if (strcmp(words[i], token) == 0) {
                freq[i]++;
                found = 1;
                break;
            }

        }

        // Store a new word only while there is room in the table
        if (!found && wordCount < MAX_WORDS) {
            strcpy(words[wordCount], token);
            freq[wordCount] = 1;
            wordCount++;
        }

        token = strtok(NULL, " ");

    }

    printf("Word frequencies:\n");

    for (int i = 0; i < wordCount; i++) {
        printf("%s: %d\n", words[i], freq[i]);
    }

    return 0;

}

This program uses strtok() to split the string into words, stores each unique word in an array, and counts its frequency using a nested loop.
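
How strtok() splits the input depends entirely on the delimiter string it is given. Every character in that string acts as a separator, and a run of consecutive separators is treated as one. The sketch below illustrates this with a hypothetical helper, countTokens(), which is not part of the programs in this tutorial.

```c
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() finds in buf for a given
 * delimiter set. Note that strtok() modifies buf while scanning it. */
int countTokens(char *buf, const char *delims) {
    int n = 0;

    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims)) {
        n++;
    }

    return n;
}
```

For the input "one  two\tthree", calling countTokens() with " " finds two tokens, because the tab stays inside the second one. With " \t" it finds three. The double space does not create an empty token in either case.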

Program 2: Using Pointers and Arrays

For a slightly more manual approach, we can traverse the string with pointers and extract words without using strtok().

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX_WORDS 100
#define MAX_LEN 50

int main() {

    char str[500];
    char words[MAX_WORDS][MAX_LEN];
    int freq[MAX_WORDS] = {0};
    int wordCount = 0;

    printf("Enter a string: ");
    fgets(str, sizeof(str), stdin);

    // Remove newline
    for (char *p = str; *p != '\0'; p++) {
        if (*p == '\n') *p = '\0';
    }

    char *start = str;

    while (*start) {

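        // Skip whitespace; the cast keeps isspace() defined for negative char values
        while (*start && isspace((unsigned char)*start)) start++;

        if (*start == '\0') break;

        char word[MAX_LEN];
        int len = 0;

        while (*start && !isspace((unsigned char)*start) && len < MAX_LEN - 1) {
            word[len++] = *start;
            start++;
        }

        word[len] = '\0';

        // Discard the rest of an over-long word so it is not counted as a new word
        while (*start && !isspace((unsigned char)*start)) start++;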

        int found = 0;

        for (int i = 0; i < wordCount; i++) {

            if (strcmp(words[i], word) == 0) {
                freq[i]++;
                found = 1;
                break;
            }

        }

        // Store a new word only while there is room in the table
        if (!found && wordCount < MAX_WORDS) {
            strcpy(words[wordCount], word);
            freq[wordCount] = 1;
            wordCount++;
        }

    }

    printf("Word frequencies:\n");

    for (int i = 0; i < wordCount; i++) {
        printf("%s: %d\n", words[i], freq[i]);
    }

    return 0;

}

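Because it tests each character with isspace(), this pointer-based method skips any run of whitespace (spaces or tabs) between words. It also gives more control over tokenization: changing the scanning condition lets other characters act as delimiters.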
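
One possible variation of the pointer-based scan treats every character that is not a letter or digit as a delimiter, so punctuation never ends up inside a word. The sketch below assumes that definition of a word. The helper name next_word() and its parameters are hypothetical, and out_size is assumed to be at least 1.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies the next word from *cursor into out (at most
 * out_size - 1 characters) and advances *cursor past it. A "word" is assumed
 * to be a run of letters and digits. Returns 1 if a word was found, 0 at the
 * end of the string. */
int next_word(const char **cursor, char *out, size_t out_size) {
    const char *p = *cursor;
    size_t len = 0;

    /* Skip everything that cannot be part of a word */
    while (*p && !isalnum((unsigned char)*p)) p++;

    if (*p == '\0') {
        *cursor = p;
        return 0;
    }

    /* Copy the word; extra characters of an over-long word are dropped */
    while (*p && isalnum((unsigned char)*p)) {
        if (len + 1 < out_size) out[len++] = *p;
        p++;
    }
    out[len] = '\0';

    *cursor = p;
    return 1;
}
```

Calling next_word() in a loop until it returns 0 yields the words one at a time without modifying the input string. With this rule an apostrophe is also a delimiter, so "it's" is split into "it" and "s".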

Program 3: Using a Function for Modularity

To make the program cleaner and reusable, we can encapsulate the word frequency logic in a function.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX_WORDS 100
#define MAX_LEN 50

void wordFrequency(char *str) {

    char words[MAX_WORDS][MAX_LEN];
    int freq[MAX_WORDS] = {0};
    int wordCount = 0;

    char *token = strtok(str, " ");

    while (token != NULL) {

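        // Truncate over-long words so they fit in words[][] without overflow
        if (strlen(token) >= MAX_LEN) token[MAX_LEN - 1] = '\0';

        int found = 0;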

        for (int i = 0; i < wordCount; i++) {

            if (strcmp(words[i], token) == 0) {
                freq[i]++;
                found = 1;
                break;
            }

        }

        // Store a new word only while there is room in the table
        if (!found && wordCount < MAX_WORDS) {
            strcpy(words[wordCount], token);
            freq[wordCount] = 1;
            wordCount++;
        }

        token = strtok(NULL, " ");

    }

    printf("Word frequencies:\n");

    for (int i = 0; i < wordCount; i++) {
        printf("%s: %d\n", words[i], freq[i]);
    }

}

int main() {

    char str[500];

    printf("Enter a string: ");
    fgets(str, sizeof(str), stdin);

    for (int i = 0; str[i] != '\0'; i++) {
        if (str[i] == '\n') str[i] = '\0';
    }

    wordFrequency(str);

    return 0;

}

Using a function improves readability, maintains modularity, and allows reuse in other programs.
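
A function is easier to reuse when it returns its results instead of printing them. The following is a minimal sketch of that idea, not a drop-in replacement for Program 3. The WordCount type and the countWords() name are hypothetical, and the sketch assumes input of fewer than 500 characters, with longer text truncated.

```c
#include <string.h>

#define MAX_LEN 50

/* Hypothetical result type: one entry per distinct word */
typedef struct {
    char word[MAX_LEN];
    int count;
} WordCount;

/* Fills table with up to max_entries distinct words from text and returns
 * how many entries were used. The input is copied first because strtok()
 * modifies the buffer it scans. */
int countWords(const char *text, WordCount *table, int max_entries) {
    char buf[500];
    int used = 0;

    strncpy(buf, text, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (char *tok = strtok(buf, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        int i;

        /* Truncate over-long words so they fit in a WordCount entry */
        if (strlen(tok) >= MAX_LEN) tok[MAX_LEN - 1] = '\0';

        for (i = 0; i < used; i++) {
            if (strcmp(table[i].word, tok) == 0) {
                table[i].count++;
                break;
            }
        }

        /* i == used means the word was not found in the table */
        if (i == used && used < max_entries) {
            strcpy(table[used].word, tok);
            table[used].count = 1;
            used++;
        }
    }

    return used;
}
```

The caller owns the table and decides what to do with the counts: print them, sort them, or pass them on. Because the function works on a private copy, the caller's string is left unchanged.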

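Program 4: Case-Insensitive Counting with tolower()

strlwr() is a non-standard extension that only some compilers provide, so this program converts the input with the standard tolower() function, character by character. After the conversion, "Cat" and "cat" are counted as the same word.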

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX_WORDS 100
#define MAX_LEN 50

int main() {

    char str[500];
    char words[MAX_WORDS][MAX_LEN];
    int freq[MAX_WORDS] = {0};
    int wordCount = 0;

    printf("Enter a string: ");
    fgets(str, sizeof(str), stdin);

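    // Remove newline and convert to lowercase
    for (int i = 0; str[i] != '\0'; i++) {

        if (str[i] == '\n') {
            str[i] = '\0';
            break;
        }

        // Cast to unsigned char: tolower() is undefined for negative char values
        str[i] = (char)tolower((unsigned char)str[i]);

    }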

    char *token = strtok(str, " ");

    while (token != NULL) {

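        // Truncate over-long words so they fit in words[][] without overflow
        if (strlen(token) >= MAX_LEN) token[MAX_LEN - 1] = '\0';

        int found = 0;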

        for (int i = 0; i < wordCount; i++) {

            if (strcmp(words[i], token) == 0) {
                freq[i]++;
                found = 1;
                break;
            }

        }

        // Store a new word only while there is room in the table
        if (!found && wordCount < MAX_WORDS) {

            strcpy(words[wordCount], token);
            freq[wordCount] = 1;
            wordCount++;

        }

        token = strtok(NULL, " ");

    }

    printf("Word frequencies (case-insensitive):\n");

    for (int i = 0; i < wordCount; i++) {
        printf("%s: %d\n", words[i], freq[i]);
    }

    return 0;

}

This version ensures "Hello" and "hello" are treated as the same word.
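
An alternative to lowercasing the whole input is to compare words without regard to case, which keeps the original spelling of the first occurrence in the output. The helper below is a sketch of that approach. The name equalsIgnoreCase() is hypothetical, and it relies only on the standard tolower().

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if a and b are equal when letter case is
 * ignored, 0 otherwise. */
int equalsIgnoreCase(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }

    /* Equal only if both strings ended at the same position */
    return *a == '\0' && *b == '\0';
}
```

In the counting loop, the test strcmp(words[i], token) == 0 could then be replaced with equalsIgnoreCase(words[i], token).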

Program 5: Using Recursion

We can recursively process tokens and update frequencies.

#include <stdio.h>
#include <string.h>

#define MAX_WORDS 100
#define MAX_LEN 50

char words[MAX_WORDS][MAX_LEN];
int freq[MAX_WORDS];
int wordCount = 0;

void addWord(char *token) {

    if (token == NULL) return;

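    // Truncate over-long words so they fit in words[][] without overflow
    if (strlen(token) >= MAX_LEN) token[MAX_LEN - 1] = '\0';

    int found = 0;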

    for (int i = 0; i < wordCount; i++) {

        if (strcmp(words[i], token) == 0) {
            freq[i]++;
            found = 1;
            break;
        }

    }

    // Store a new word only while there is room in the table
    if (!found && wordCount < MAX_WORDS) {

        strcpy(words[wordCount], token);
        freq[wordCount] = 1;
        wordCount++;

    }

    addWord(strtok(NULL, " ")); // recursive call

}

int main() {

    char str[500];

    printf("Enter a string: ");
    fgets(str, sizeof(str), stdin);

    for (int i = 0; str[i] != '\0'; i++) {
        if (str[i] == '\n') str[i] = '\0';
    }

    char *token = strtok(str, " ");
    addWord(token);

    printf("Word frequencies (recursion):\n");

    for (int i = 0; i < wordCount; i++) {
        printf("%s: %d\n", words[i], freq[i]);
    }

    return 0;

}

This is more of a learning tool than a practical method, but it shows how recursion can manage token streams.

Program 6: Using qsort() for Sorting and Counting

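Sorting the words first places duplicates next to each other, so a single pass over the sorted array can count them. This replaces the word-by-word comparison of the nested-loop programs with one sort (typically O(n log n) for common qsort() implementations) followed by a linear scan.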

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

#define MAX_WORDS 100
#define MAX_LEN 50

// Comparison function for qsort: each element is one MAX_LEN-sized row of words[]
int cmpstr(const void *a, const void *b) {
    return strcmp((const char *)a, (const char *)b);
}

int main() {

    char str[500];
    char words[MAX_WORDS][MAX_LEN];
    int freq[MAX_WORDS] = {0};
    int wordCount = 0;

    printf("Enter a string: ");
    fgets(str, sizeof(str), stdin);

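    // Remove newline and convert to lowercase
    for (int i = 0; str[i] != '\0'; i++) {

        if (str[i] == '\n') {
            str[i] = '\0';
            break;
        }

        // Cast to unsigned char: tolower() is undefined for negative char values
        str[i] = (char)tolower((unsigned char)str[i]);

    }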

    // Split string into words
    char *token = strtok(str, " ");

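    // Stop when the table is full so words[] cannot overflow
    while (token != NULL && wordCount < MAX_WORDS) {

        // Truncate over-long words so they fit in a MAX_LEN row
        if (strlen(token) >= MAX_LEN) token[MAX_LEN - 1] = '\0';

        strcpy(words[wordCount], token);
        wordCount++;
        token = strtok(NULL, " ");

    }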

    // Sort words alphabetically
    qsort(words, wordCount, MAX_LEN, cmpstr);

    // Count frequencies
    int i = 0;

    while (i < wordCount) {

        int count = 1;

        while (i + count < wordCount && strcmp(words[i], words[i + count]) == 0) {
            count++;
        }

        printf("%s: %d\n", words[i], count);
        i += count;

    }

    return 0;

}

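This method is efficient for counting because sorting groups duplicates together, and the counting loop then visits each stored word only once. Note that the output is in alphabetical order rather than in order of first appearance.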

FAQs

Answering common questions about counting word frequencies in C programs.

1. Can this program handle multiple spaces between words?
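Yes, with one caveat. strtok() treats a run of consecutive delimiter characters as a single separator, so extra spaces never produce empty words. Tabs are skipped only if they are delimiters too: pass " \t" to strtok(), or use the isspace() check from Program 2. With the " " delimiter used in the strtok()-based programs, a tab stays attached to the neighbouring word.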

2. How do I make the word count case-insensitive?
Convert all characters to lowercase using tolower() (or strlwr() if available) before comparison. This treats "Cat" and "cat" as the same word.

3. Can this program handle punctuation?
By default, punctuation is treated as part of the word. To ignore punctuation, you can check each character with ispunct() and remove or skip it before counting.

4. Is this efficient for large texts?
For small to medium strings, these methods are fine. For very large texts, using a hash table or dynamic data structure is more efficient than nested loops.

5. Can I reuse these methods in other programs?
Yes. Function-based and modular versions (like Program 3) are easy to integrate into larger projects or multiple programs.
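
The hash table mentioned in FAQ 4 can be sketched as follows. This is an illustrative outline under stated assumptions, not a tuned implementation: the bucket count TABLE_SIZE, the Node type, and the function names tableAdd(), tableGet(), and tableFree() are all hypothetical. Collisions are handled with linked lists, and the hash function is the widely used djb2-style multiply-by-33 scheme.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 1024   /* assumed bucket count; size it to the expected input */

/* One node in a bucket's linked list */
typedef struct Node {
    char *word;
    int count;
    struct Node *next;
} Node;

/* djb2-style string hash (one common choice, not the only one) */
static unsigned long hashWord(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Adds one occurrence of word to the table. Returns the new count,
 * or -1 if memory could not be allocated. */
int tableAdd(Node *table[], const char *word) {
    unsigned long idx = hashWord(word) % TABLE_SIZE;

    for (Node *n = table[idx]; n != NULL; n = n->next) {
        if (strcmp(n->word, word) == 0) return ++n->count;
    }

    Node *n = malloc(sizeof *n);
    if (n == NULL) return -1;

    n->word = malloc(strlen(word) + 1);
    if (n->word == NULL) {
        free(n);
        return -1;
    }

    strcpy(n->word, word);
    n->count = 1;
    n->next = table[idx];   /* insert at the head of the bucket */
    table[idx] = n;
    return 1;
}

/* Returns how many times word was added, or 0 if it is absent */
int tableGet(Node *table[], const char *word) {
    unsigned long idx = hashWord(word) % TABLE_SIZE;

    for (Node *n = table[idx]; n != NULL; n = n->next) {
        if (strcmp(n->word, word) == 0) return n->count;
    }
    return 0;
}

/* Releases every node and resets the buckets to NULL */
void freeTable(Node *table[]) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        Node *n = table[i];
        while (n != NULL) {
            Node *next = n->next;
            free(n->word);
            free(n);
            n = next;
        }
        table[i] = NULL;
    }
}
```

The table must start with every bucket set to NULL, for example by declaring it as Node *table[TABLE_SIZE] = {0}. Each lookup then inspects only the words that share a bucket instead of every word seen so far, and words are no longer limited to MAX_LEN characters because each one is allocated at its own length.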

Conclusion

Counting the frequency of words in a string is a fundamental operation in text processing.

We explored six methods: nested loops, pointer-based parsing, modular functions, case-insensitive counting, recursion, and sorting with qsort(). Each method balances readability, efficiency, and flexibility differently.

Mastering these techniques equips you to handle text analysis, preprocessing, word counting, and more complex string manipulations in C programming.
