You don’t have a data problem, you have a causation problem.

Big data shows you what happened, not why. Learn why causation comes from going narrow and deep first, then using big data to confirm and size.

You’ve got dashboards. You’ve got survey results, sample sizes, a market sizing deck that someone in finance has signed off. 

But you’re still sitting there wondering what to build next, or why the last thing you shipped didn’t land the way the numbers promised it would.

If that’s you, let me share something it took me years to be comfortable saying out loud: more data isn’t going to fix this. 

You don’t need more data, you need better data. Trust me, those are not the same thing.

I see teams trying to push big data further and further toward the front of their process, convinced that if they just gather enough of it, the answer will fall out. It won’t. 

Not because big data is bad – I’m a big fan of it in the right place – but because of what big data can and can’t tell you.

Big data shows you the effect, not the cause.

Here’s the trap. You collect a mountain of data, you find a trend, and you assume the trend is the reason. You’ve seen the effect, so you tell yourself you understand the cause.

But knowing what happened is not the same as knowing why it happened.

There’s rarely one cause and one effect for anything. There are sets of causes that together produce an effect. 

When you flatten all of that into a single common thread, you end up chasing the wrong thing. 

There’s a timing problem too. 

Most data gets aggregated first and analyzed second, which buries the sequence. The cause might happen on a Tuesday and the effect might not show up until Wednesday. If you’re only looking at what happened on Tuesday, you’ll never see how the two connect. You’ll measure the right things at the wrong moment and call it insight.
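
To make the timing problem concrete, here’s a minimal sketch with made-up numbers. It assumes a hypothetical cause that is flagged on some days and an effect that always shows up exactly one day later; the data, the one-day lag and the `pearson` helper are all invented for illustration, not taken from a real dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical daily flags: 1 = it happened that day, 0 = it didn't.
cause = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0]

# Assumption for illustration: the effect lands one day after the cause.
effect = [0] + cause[:-1]

# Compare each day's cause with the same day's effect...
same_day = pearson(cause, effect)

# ...and with the following day's effect.
next_day = pearson(cause[:-1], effect[1:])

print(f"same-day correlation: {same_day:.2f}")
print(f"next-day correlation: {next_day:.2f}")
```

With these invented numbers, the same-day comparison comes out negative while the next-day comparison lines up perfectly. Nothing about the behavior changed; only the moment you measured it did.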

The answer doesn’t live in the average

A lot of teams reach for big data because it feels safe. 

Big sample, statistically significant, nobody can argue with it. So it becomes the excuse: “we don’t have enough data to know yet.”

But I’ve never been able to pull causation out of a large dataset by staring at the average. The average smooths over the exact thing I’m hunting for. Todd Rose wrote a whole book on this called The End of Average, about how a genuinely useful tool has quietly distorted the way we see the world.

The average tells you what the middle did. It almost never tells you why anyone did anything.
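
A small sketch of how that smoothing happens, using invented ratings. Assume ten hypothetical people scored a concept from 1 to 10, and they split into two camps; the numbers and the cut-offs for "low", "high" and "near the average" are assumptions chosen only to illustrate the point.

```python
from statistics import mean

# Hypothetical 1-10 concept ratings from ten people (invented data).
ratings = [1, 1, 2, 1, 2, 9, 10, 9, 9, 10]

average = mean(ratings)

# How many people actually sit within 2 points of the average?
near_average = [r for r in ratings if abs(r - average) <= 2]

# Assumed cut-offs for the two camps.
low = [r for r in ratings if r <= 3]
high = [r for r in ratings if r >= 8]

print(f"average rating: {average}")
print(f"people near the average: {len(near_average)}")
print(f"people who rated it low: {len(low)}, high: {len(high)}")
```

In this made-up sample the average lands in the middle of the scale, yet not one person rated the concept anywhere near it. The number describes nobody, and it hides the fact that there are two very different groups worth talking to.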

What I actually want, before anything else, are the anomalies. I want to see the weird ones. The people doing something they’re not supposed to do.

The answer in data usually lives in the anomalies

Years ago, Clayton Christensen and I were looking at a milkshake business, staring at sales data. There was this odd spike: people buying milkshakes early in the morning, on their own, for a use nobody had designed the product for. 

Most analysts would scrub a data point like that. 

Thomas Kuhn wrote about this in The Structure of Scientific Revolutions – scientists see an anomaly and say “that point’s no good,” and they throw it out. 

Innovators do the opposite.

They treat the anomaly as the source of knowledge they don’t have yet. So instead of deleting it, we went and asked: why are these people doing this? Read more about the milkshake story.

A similar thing happened with Basecamp.

We could see in the data that a doctor’s practice was using it, architects were using it, a not-for-profit was using it. The team knew exactly why agencies and software companies signed up. They had no idea why these other people did.

But that’s where the growth was hiding. Not in getting better for the customers they already understood, but in the people on the edges. Get too good for your best customers and you can quietly push everyone else out the door.

Do you know what? I didn’t find any of that by living in the big data. I found it by going narrow and deep first. Ten or twelve conversations, one person at a time, understanding what genuinely caused each of them to act. Once I could see that causal pattern in a handful of people, then I went into the big data and could spot the same shape playing out at scale. 

Small data to understand. 

Big data to confirm and size. 

Not the other way around.
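
Here’s a rough sketch of what "confirm and size" can look like once the conversations have given you a pattern to look for. It borrows the milkshake example: suppose the interviews pointed to people buying early in the morning, on their own. The `Purchase` record, its fields, the 9am cut-off and the sample rows are all hypothetical, made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Purchase:
    hour: int        # hour of day the purchase was made (0-23)
    party_size: int  # number of people in the group


def matches_interview_pattern(purchase: Purchase) -> bool:
    """Pattern learned from conversations (assumed): early morning, alone."""
    return purchase.hour < 9 and purchase.party_size == 1


# Invented stand-in for the big dataset.
purchases = [
    Purchase(hour=7, party_size=1),
    Purchase(hour=8, party_size=1),
    Purchase(hour=8, party_size=1),
    Purchase(hour=8, party_size=2),
    Purchase(hour=12, party_size=3),
    Purchase(hour=13, party_size=1),
    Purchase(hour=15, party_size=2),
    Purchase(hour=18, party_size=4),
]

matching = [p for p in purchases if matches_interview_pattern(p)]
share = len(matching) / len(purchases)

print(f"{len(matching)} of {len(purchases)} purchases fit the pattern ({share:.0%})")
```

The order matters. The filter only exists because a handful of conversations told you what to look for; the big dataset then tells you how often it shows up.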

Sizing the market in a vacuum will lie to you

This is where it bites hardest. Someone builds a model, and it says you need 40% of people to rate a concept a seven or eight before you’re allowed to launch.

It feels like an insurance policy. It mostly isn’t. Look at the track record of those tools and it’s worse than you’d guess. You launch a “$20 million business” and it does $100 million, and you’re still in trouble because you built no infrastructure for it. Or you bet on a $100 million business and it does $10 million.

Sizing only means something in context. “There are a billion people who eat food, so this is a $15 billion business” tells you nothing.

When we moved into homebuilding, the assumption was that nobody buying a new home would ever consider a used one.

We disproved that in the first month. So we pulled the used-home listings, looked at what sold, how long it sat, who bought it, what surrounded it, and we used that to decide what and where to build. 

One small thing we noticed: every used home in the area had eight-foot ceilings. We made ours ten-foot. It cost almost nothing and the rooms felt enormous. We went from a couple hundred homes to a thousand in a little over three years. The data pointed us at the question. The conversations told us what to do about it.

Know which tool you’re holding, and why

None of this is “big data is bad.” It’s one perspective among several, and the mistake is over-relying on any single one.

So the next time you reach for a tool to help you understand what to build or to dig into the sales slump, ask yourself four plain things: what is it, what does it actually tell me, why am I using it right now, and how will I know if it worked.

If you’ve been told the answer is more data, and it still isn’t showing up, the problem probably isn’t the size of your dataset. It’s that you started at the wrong end. 

Start narrow. Find the anomaly. Understand the cause. Then go let the big data prove you right.

If you want a second set of eyes on where you’re stuck, book a call and we’ll talk it through.