When Subsets != Substrings: A Costly Confusion

The Bug That Bit Me

Today I hit a classic case of conceptual confusion that cost me time and frustration. I was working on a problem requiring all substrings of a string, and I confidently wrote:

num_substrings = (1 << n) - 1  # 2^n - 1

This is wrong. This formula gives the number of subsets, not substrings.

The Correct Approach

For substrings, the formula should be:

num_substrings = (n * (n + 1)) // 2

The Impact

The difference is staggering. For a string of length n = 10:

Subsets: (1 << 10) - 1 = 1023
Substrings: (10 * 11) // 2 = 55

That’s a 20x difference. My code was creating way more iterations than needed, leading to timeouts and confusing behavior that had me chasing ghosts in my logic.

Understanding the Mental Model Error

Subsets (2^n): Choose any combination of characters from any positions

Example with “abc”: "", "a", "b", "c", "ab", "ac", "bc", "abc"
Order doesn’t matter
Positions can be non-contiguous
Every character has two choices: include it or don’t

Substrings (n(n+1)/2): Contiguous sequences only

Example with “abc”: "a", "b", "c", "ab", "bc", "abc"
Must be continuous in the original string
Defined by a start and end position
Number of ways to choose two endpoints from n+1 positions

The Key Lesson

Always pause and ask: “Am I dealing with contiguous elements or arbitrary combinations?”

This simple question would have saved me 30 minutes of debugging why my “optimized” solution was actually slower than brute force. The irony of making code worse while trying to make it better is not lost on me.

Going Forward

When approaching string/array problems, I need to be explicit about:

Can elements be rearranged? → Likely subsets/combinations
Must elements be contiguous? → Substrings/subarrays
Does order matter? → Permutations vs combinations

This isn’t just about formulas—it’s about precision in problem understanding before writing a single line of code.

Takeaway: Slow down on the mental model. Fast code starts with correct thinking.