Fixed-Size Sliding Window Technique for Identifying String Anagrams

Problem Definition

Given two strings s and p, identify every starting index within s where a substring contains the exact same characters as p with identical frequencies. Character order is irrelevant for matching purposes.

Example: With s = "cbaebabacd" and p = "abc", the qualifying substrings appear at indices 0 ("cba") and 6 ("bac"). The expected output array is [0, 6].

Algorithmic Strategy

The optimal solution applies a fixed-length sliding window paired with character frequency mapping. Because the problem domain restricts inputs to lowercase Latin letters, we can efficiently track distributions using static integer buffers of length 26.

  1. Frequency Initialization: Allocate two count arrays. The first records the character requirements derived from p, while the second monitors the dynamic state of the traversed segment in s.
  2. Rightward Expansion: Iterate through s using a right boundary pointer. Increment the frequency slot corresponding to each newly encountered character.
  3. Leftward Compression: If the current window span surpasses the length of p, decrement the frequency count for the character being pushed out from the left edge.
  4. Equality Validation: Once the window dimensions align with p, perform a direct comparison between the two count arrays. An exact match confirms an anagram. Store the left boundary index in the results container.

Code Implementation

#include <iostream>
#include <vector>
#include <string>
#include <array>

std::vector<int> locateAnagrams(const std::string& text, const std::string& pattern) {
    std::vector<int> validIndices;
    int textLen = text.length();
    int patLen = pattern.length();

    if (patLen > textLen) return validIndices;

    std::array<int, 26> targetFreq = {};
    std::array<int, 26> currentFreq = {};

    // Establish baseline character distribution
    for (char charVal : pattern) {
        targetFreq[charVal - 'a']++;
    }

    // Scan through the source string
    for (int rightPos = 0; rightPos < textLen; ++rightPos) {
        // Incorporate new character into the active window
        currentFreq[text[rightPos] - 'a']++;

        // Expel leftmost character once capacity is exceeded
        if (rightPos >= patLen) {
            currentFreq[text[rightPos - patLen] - 'a']--;
        }

        // Verify configuration when window reaches target dimensions
        if (rightPos >= patLen - 1) {
            if (currentFreq == targetFreq) {
                validIndices.push_back(rightPos - patLen + 1);
            }
        }
    }
    return validIndices;
}

int main() {
    std::string s_input = "cbaebabacd";
    std::string p_input = "abc";
    std::vector<int> matches = locateAnagrams(s_input, p_input);

    for (int idx : matches) {
        std::cout << idx << " ";
    }
    return 0;
}

Execution Mechanics

  • ASCII Offset Mapping: Subtracting the base character 'a' converts alphabetic input into contiguous zero-indexed integers, enabling direct array access without hash overhead.
  • Boundary Arithmetic: The expression rightPos - patLen precisely targets the element exiting the window frame. Unlike variable-window approaches, this subtraction does not require additional offset adjustments because the window size constraint is mathematically hard-coded.
  • Starting Index Derivation: When equality is detected at rightPos, the original substring begins at rightPos - patLen + 1. This calculation shifts the cursor backward from the right edge to capture the exact anchor position of the valid segment.

Resource Consumption

Tags: C++ algorithm-design string-matching sliding-window frequency-counting

Posted on Thu, 14 May 2026 07:21:00 +0000 by onyx