Your data exhaust is not a moat until it changes the next decision
A real data advantage is not how much you capture. It is how many decisions you capture with outcomes attached.
The data moat that was 4TB of nothing
A founder building an AI tool for support teams told investors his moat was data. He had been live for a year, and every ticket, every keystroke, every model output and every click was logged. Terabytes of it. The deck said "proprietary dataset that compounds with every customer."
In diligence, a partner asked the only question that matters: "What can you do with that data that a competitor starting today can't?" The honest answer was nothing. The logs recorded what happened, but not why, and not whether it worked. He had millions of model outputs and no record of which ones the agent kept, edited, or threw away. He had the raw material of a product and none of the material of an advantage. The 4TB was an expense, not an asset. It sat in storage he paid for every month and improved nothing.
What he learned in that room is the thing this piece is about. Volume of data exhaust is not a moat. Most of what a product emits is noise that no one will ever train on, query, or sell. The data that becomes an advantage is structured around a decision, and it only becomes valuable when you also capture what that decision led to. He had the first half by accident and the second half not at all.
The cruel part is that fixing it is not a storage problem or a model problem. It is a product instrumentation problem he could have solved in week one for almost nothing, and instead he spent a year filling a hard drive with data that argued against him.
Why "we log everything" is the wrong instinct
The reflex of most technical founders is to capture as much as possible and figure out the value later. Disk is cheap, the thinking goes, so log it all and the dataset will be there when you need it. This feels responsible. It is the most common way founders end up with a large pile of data and no advantage.
The problem is that raw exhaust has three defects that no amount of volume fixes.
It is rarely proprietary in any way that matters. Keystrokes, page views, and generic model outputs look roughly the same in every product in your category. A competitor does not need your logs of these. They generate their own the moment they have users. Data is only a moat if a competitor cannot cheaply reproduce it, and most exhaust is trivially reproducible.
It is not structured around anything useful. A log that says a user clicked at a timestamp is a fact with no spine. To improve a product or train a model, you need data organized around the unit of value: the decision the product helps the user make, the prediction the product made, the action it recommended. Undifferentiated event streams are not organized around decisions, so turning them into something useful later means a reconstruction project that usually never happens.
It does not record outcomes. This is the one that kills founders. Even when you capture a decision the product made, you usually do not capture what happened next. Did the user accept the suggestion? Edit it? Ignore it and do the opposite? Without the outcome, the decision data cannot teach the next decision. It is a question with no answer key, and you cannot grade a model against questions you never scored.
A product gets more useful over time for one reason: it captures the right traces of how users make decisions and what works, and it feeds that back into the next decision. Not because the underlying model got smarter on its own. The companies that win the "data advantage" argument are not the ones with the most data. They are the ones whose data is proprietary, decision-shaped, and outcome-labeled, so each use of the product makes the next use better in a way a fresh competitor cannot match.
The framework: three tests for whether exhaust is an asset
Treat every stream of data your product emits as a candidate. Most candidates fail. A stream is an actual advantage only if it passes all three tests at once. Miss any one and you have a storage cost, not a moat.
Test 1 — Proprietary. Could a competitor with the same number of users cheaply reproduce this data? If yes, it is not a moat. Generic clickstreams, standard model outputs, and public-ish metadata all fail here. The data that passes is data that exists only because users do something inside your specific workflow that they do nowhere else.
Test 2 — Decision-structured. Is this data organized around a decision the product helps make, or is it an undifferentiated event log? Data that passes is shaped like: here was the situation, here is what the product predicted or recommended, here is the choice the user faced. Data that fails is shaped like: event, timestamp, event, timestamp.
Test 3 — Outcome-labeled. Do you capture what happened after the decision? Accepted, edited, rejected, or reversed later. The result the user got. Data that passes closes the loop from decision to consequence. Data that fails records the decision and loses the consequence, which makes it untrainable.
The reframe most founders miss: you are not trying to maximize data captured. You are trying to maximize decisions captured with their outcomes. A product that logs a thousand events per session but cleanly captures three decisions and what came of them has a far stronger data position than a product drowning in raw telemetry. The first can improve itself. The second can only fill a disk.
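To make the difference in shape concrete, here is a minimal Python sketch of a raw event next to a decision record. It is an illustration under assumptions, not a prescribed schema: the class names, field names, and the sample support-ticket values are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """What the user did with the recommendation (Test 3)."""
    ACCEPTED = "accepted"
    EDITED = "edited"
    REJECTED = "rejected"
    REVERSED = "reversed"


@dataclass
class RawEvent:
    """Fails Test 2: an event and a timestamp, with nothing to learn from."""
    name: str
    timestamp: float


@dataclass
class DecisionRecord:
    """Hypothetical record shaped as situation -> recommendation -> choice -> outcome."""
    situation: str                      # what the user was facing
    recommendation: str                 # what the product predicted or suggested
    choice: str                         # the choice the user had to make
    outcome: Optional[Outcome] = None   # what the user did; None means never captured
    result: Optional[str] = None        # what the user got afterwards, if known

    def is_outcome_labeled(self) -> bool:
        return self.outcome is not None


def count_labeled(records: list[DecisionRecord]) -> int:
    """Count decisions captured with an outcome attached."""
    return sum(1 for r in records if r.is_outcome_labeled())


# Invented sample data for illustration only.
records = [
    DecisionRecord("refund request", "apology macro", "send or rewrite", Outcome.EDITED),
    DecisionRecord("password reset", "reset-link macro", "send or rewrite", Outcome.ACCEPTED),
    DecisionRecord("billing dispute", "escalate", "escalate or answer"),  # outcome lost
]
print(count_labeled(records))  # prints 2
```

The third record is the founder's 4TB in miniature: a decision was logged, but with no outcome it cannot teach the next one.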
Two products, same logs, opposite data position
Take two AI writing assistants for sales teams. Both log "everything." Same volume of exhaust. Completely different data positions.
Product A logs every prompt, every generated draft, and every keystroke. What it does not record is which draft the rep sent, what they changed before sending, and whether the email got a reply. So it has millions of generated drafts and no idea which ones were good. The data is proprietary-ish and high-volume, but it is not outcome-labeled. It cannot tell a strong suggestion from a weak one, so it cannot get better at suggesting. The dataset grows and the product stays the same.
Product B logs less raw text but captures, for each draft: the context it was generated in, whether the rep sent it as-is, the specific edits they made, and whether the prospect replied. Far less data by volume. But every row is a decision with an outcome attached. After ten thousand sends, Product B knows which kinds of openers get edited out, which closings get replies, and which contexts produce drafts reps trash. It feeds that back, and the suggestions improve in a way a competitor starting today cannot replicate without running the same loop for months.
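What Product B does with those rows can be sketched in a few lines of Python. This is a rough illustration under assumptions, not how any real product works: the `DraftRow` fields, the `opener` label, and the sample rows are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DraftRow:
    """One draft captured as a decision with its outcome (hypothetical fields)."""
    context: str                 # situation the draft was generated in
    opener: str                  # assumed label for the kind of opener used
    sent_as_is: bool             # True if the rep sent the draft unedited
    replied: bool                # True if the prospect replied
    edits: list[str] = field(default_factory=list)  # specific edits the rep made


def summarize_by_opener(rows: list[DraftRow]) -> dict[str, dict[str, float]]:
    """Return edit rate and reply rate for each opener type."""
    totals = defaultdict(lambda: {"sent": 0, "edited": 0, "replied": 0})
    for row in rows:
        t = totals[row.opener]
        t["sent"] += 1
        t["edited"] += int(not row.sent_as_is)
        t["replied"] += int(row.replied)
    return {
        opener: {
            "edit_rate": t["edited"] / t["sent"],
            "reply_rate": t["replied"] / t["sent"],
        }
        for opener, t in totals.items()
    }


# Invented sample rows for illustration only.
rows = [
    DraftRow("cold outreach", "question", True, True),
    DraftRow("cold outreach", "question", False, True, ["shortened intro"]),
    DraftRow("cold outreach", "compliment", False, False, ["removed opener"]),
    DraftRow("renewal", "compliment", True, True),
]
summary = summarize_by_opener(rows)
best_opener = max(summary, key=lambda o: summary[o]["reply_rate"])
print(best_opener)  # prints question
```

Product A cannot run this function at all. It has the drafts but not the `sent_as_is` or `replied` columns, so there is nothing to divide by.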
Product B's data is the moat. Not because there is more of it. Because every row can change the next decision. This is the move: stop measuring your data advantage by volume and start measuring it by how many of your product's decisions you capture with their outcomes attached. Then instrument for the gap.
The artifact: data advantage assessment worksheet
Run your product's data through this before your next investor conversation, or before you write "data moat" on a slide. The point is not to count gigabytes. It is to find out whether the data you are accumulating can make your product better, and to expose the gap between what you log and what you can use.
Part 1 — List your decision points
Forget the event logs. Write down the three to five decisions your product helps the user make or makes for them. These are the only places a data advantage can live.
| Decision the product makes or supports | What the user is trying to accomplish |
|---|---|
| (e.g. which draft to suggest) | |
Part 2 — Score each decision on the three tests
For each decision above, mark whether the data around it passes each test today.
| Decision | Proprietary? (competitor can't cheaply reproduce) | Decision-structured? (captured as situation → prediction → choice) | Outcome-labeled? (we record what happened after) |
|---|---|---|---|
| | Y / N | Y / N | Y / N |
| | Y / N | Y / N | Y / N |
| | Y / N | Y / N | Y / N |
A decision is a real data asset only if it earns three Yes marks. Count the decisions that do. That number, not your storage bill, is the size of your data advantage.
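If you would rather tally the worksheet in code than on paper, the scoring rule fits in a short Python sketch. The class, the function names, and the three sample decisions are hypothetical and exist only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class DecisionScore:
    """One row of the Part 2 table."""
    decision: str
    proprietary: bool
    decision_structured: bool
    outcome_labeled: bool

    def is_asset(self) -> bool:
        # A decision counts only when all three tests pass.
        return self.proprietary and self.decision_structured and self.outcome_labeled


def data_advantage(scores: list[DecisionScore]) -> int:
    """Number of three-Yes decisions."""
    return sum(1 for s in scores if s.is_asset())


def missing_outcomes(scores: list[DecisionScore]) -> list[str]:
    """Decisions that fail the outcome test, in worksheet order."""
    return [s.decision for s in scores if not s.outcome_labeled]


# Invented sample scores for illustration only.
scores = [
    DecisionScore("which draft to suggest", True, True, True),
    DecisionScore("when to follow up", True, True, False),
    DecisionScore("which template to open", False, False, False),
]
print(data_advantage(scores))    # prints 1
print(missing_outcomes(scores))  # prints ['when to follow up', 'which template to open']
```

The list that `missing_outcomes` returns is the starting input for Part 3.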
Part 3 — Find the missing outcome
For every decision that failed the outcome test, write down the outcome you are not capturing and where it lives. This is almost always the highest-leverage fix.
| Decision | Outcome we currently lose | Where that outcome could be captured |
|---|---|---|
| | (e.g. did the user send the draft, and did it get a reply) | (e.g. send event + reply webhook) |
Part 4 — The instrumentation move
For each gap, write the smallest change that would close the loop from decision to outcome. The fix is usually one of: log the user's accept/edit/reject on a suggestion, attach the eventual result to the decision record, or stop logging a raw stream nobody will use and start logging the choice it was hiding. Name the move and the week you will ship it.
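As a rough idea of how small that change can be, here is a Python sketch of the first two moves: logging the user's accept/edit/reject, then attaching the eventual result to the same record. Everything here is an assumption for illustration: the `DecisionLog` class, its method names, and the in-memory dictionary, which stands in for whatever store you already use.

```python
import uuid
from typing import Optional

VALID_RESPONSES = {"accept", "edit", "reject"}


class DecisionLog:
    """Hypothetical in-memory log that ties each suggestion to its outcome."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    def record_suggestion(self, situation: str, suggestion: str) -> str:
        """Log the decision when the suggestion is made; return an id to attach outcomes to."""
        decision_id = str(uuid.uuid4())
        self._records[decision_id] = {
            "situation": situation,
            "suggestion": suggestion,
            "response": None,      # accept / edit / reject, filled in later
            "edited_text": None,   # what the user changed it to, if edited
            "result": None,        # what eventually happened, filled in later
        }
        return decision_id

    def record_response(self, decision_id: str, response: str,
                        edited_text: Optional[str] = None) -> None:
        """Log what the user did with the suggestion."""
        if response not in VALID_RESPONSES:
            raise ValueError(f"unknown response: {response}")
        record = self._records[decision_id]
        record["response"] = response
        record["edited_text"] = edited_text

    def attach_result(self, decision_id: str, result: str) -> None:
        """Attach the eventual result to the original decision record."""
        self._records[decision_id]["result"] = result

    def closed_loops(self) -> list[dict]:
        """Records that have both a user response and a result."""
        return [r for r in self._records.values()
                if r["response"] is not None and r["result"] is not None]


log = DecisionLog()
first = log.record_suggestion("cold outreach", "draft A")
log.record_response(first, "edit", edited_text="draft A, shorter")
log.attach_result(first, "prospect replied")
second = log.record_suggestion("renewal", "draft B")  # loop never closed
print(len(log.closed_loops()))  # prints 1
```

The design choice that matters is the id returned by `record_suggestion`. The response and the result usually arrive later and from different places, and the id is what lets them land on the record of the decision that caused them.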
Part 5 — The investor answer
Write the one-sentence answer to the diligence question your data has to survive: "What can you do with your data that a competitor starting today cannot?" If you cannot answer it with a specific decision your data improves, you do not have a data moat yet. You have logs. Say that honestly to yourself before an investor says it to you.
Where RoundOS fits
A fundraising round is itself a stream of decisions with outcomes, and almost none of it gets captured in a usable shape. A founder decides which investor to chase, which follow-up to send, when to push and when to wait. Then something happens: the investor replies, goes quiet, passes, or commits. Most founders keep the decisions in their head and the outcomes scattered across email, calendar, and notes, so none of it ever improves the next decision. It is the same defect as Product A, applied to your own raise.
RoundOS captures the round as decision-and-outcome data, not as raw exhaust. It reads the sources where the round already lives (your emails, meeting notes, and investor lists) and structures them around the moves you make and what followed: which follow-up got a reply, which thread went stale after which message, which intro path actually converted. Because the data is decision-shaped and outcome-labeled, the next-move suggestions get sharper as the round goes on, in the same way Product B's suggestions improve. The product gets more useful because it is capturing the right traces of how your raise goes, not because it is hoarding everything you type.
It stores your fundraising history as the answer key for your next move, not as a pile of notes you never reread.
Count the decisions with answer keys
Open the worksheet and do Part 1 and Part 2 for your product right now. Count your three-Yes decisions. If the number is zero or one, your data moat is a storage bill, and the fix is instrumentation, not more users. Running a raise? Drop your last few investor emails and meeting notes into RoundOS and see your round rendered as decisions with outcomes attached, so the next follow-up is informed by what worked on the last one.