Bad Data Is The Real AI Problem
From a duck blind to the boardroom, you have to fix the signal.
by Brian Mulderrig
It was 5 a.m. on a freezing Sunday in January when I found myself discussing AI hallucinations in a duck blind with a lawyer. Huddled in our camouflage shelter, surrounded by icy water and floating decoys, my friend told me about a recent experience using ChatGPT at work.
Our morning hunt for duck l’orange quickly became a lesson in signal quality for our respective professions. In simple terms, a signal is decision-grade data—the fuel behind large language models like those powering ChatGPT and Claude.
Over the course of the morning, we surmised that in both legal services and advertising, the AI advantage isn’t about better models anymore. It’s about a better signal.
ChatGPT, Esq.
While drafting a legal motion, my buddy decided to test whether the latest version of OpenAI’s LLM could help him source case law. Within seconds, it produced a motion that was extremely well written and neatly cited, exactly as his prompt requested. “Well, my job is absolutely done for…” he thought.
But when he double-checked the citations in LexisNexis, he found that every case was fabricated. Not one was real.
“Can you imagine if I submitted that?” he said. “I’d be disbarred.”
The broader implications are more alarming than one lawyer nearly submitting fabricated sources. As AI adoption spreads, fabricated outputs like these create real risk across every industry that depends on trusted information.
The hidden battle over signal
Legacy content providers have long been sources of trusted information, yet today they find themselves in a tough spot. For decades, these publishers have monetized consumer attention through advertising.
Now, LLMs monetize consumer understanding using the same publishers’ signals, often without passing fair economic value back to the source.
What happens if publishers stop allowing models to train on their information? Unverified signals begin fueling answers. Publisher-LLM economics is a debate for another article.
But the relationship highlights one thing clearly: AI is only as strong as the signal powering it. As traffic shifts and ad dollars follow, it’s critical to understand which players will control the signal, because they’ll also control the outcomes.
AI will never fix bad data
Every advertising technology company on the planet claims its fancy AI can now effectively target consumers at scale, yet I’m still being served daily ads for menopause.
Bad targeting persists despite AI’s rise because a large share of advertising AI still trains on probabilistic proxies (educated guesses based on indirect data) rather than on verified data about what real people actually want.
Third-party data means buying demographic assumptions from data brokers and hoping they’re accurate. When our data science team at ViralGains audited several major third-party data providers last year, we found a 99% overlap between the male and female audiences in one source widely used across the advertising industry. In other words, the targeting set drew almost no distinction between men and women.
In another trusted provider’s data, 81% of users in the 65+ segment also appeared in the 18–24 segment. Billions of dollars change hands every year on assumptions like these. Hence the menopause ads.
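The audit figures above depend on how “overlap” is measured, which the audit description does not spell out. A minimal sketch, assuming each audience segment is a set of user IDs and that overlap means the share of one segment’s users who also appear in another (the way the 81% figure reads). A symmetric measure (Jaccard similarity) is shown for comparison. The segment names and IDs are hypothetical, not data from the audit.

```python
# Hypothetical audit sketch: each audience segment is modeled as a set of user IDs.

def containment(segment: set[str], other: set[str]) -> float:
    """Fraction of `segment`'s users who also appear in `other` (0.0 if empty)."""
    if not segment:
        return 0.0
    return len(segment & other) / len(segment)


def jaccard(a: set[str], b: set[str]) -> float:
    """Symmetric overlap: shared users divided by all distinct users in either set."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy segments with made-up IDs, for illustration only.
age_65_plus = {"u1", "u2", "u3", "u4", "u5"}
age_18_24 = {"u1", "u2", "u3", "u4", "u9"}

print(f"{containment(age_65_plus, age_18_24):.0%}")  # 80% of the 65+ segment is also tagged 18-24
print(f"{jaccard(age_65_plus, age_18_24):.0%}")      # 67% symmetric overlap
```

Segments that should be mutually exclusive, such as 65+ and 18–24, ought to score near 0% on either measure. The closer a source gets to 100%, the less its labels tell you about anyone.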
The need for clean, high-fidelity signal has never been stronger.
The signal advantage
Industry veterans are acknowledging this need. Sarah Friar, chief financial officer of OpenAI, recently advised leaders to focus on securing unique proprietary datasets for a durable competitive advantage in the AI era. The CFO of one of the world’s top AI companies is saying that signal is the difference-maker, and she’s not alone. Tech titan Reid Hoffman agrees, describing AI as amplification technology for data.
So, which data sources should companies rely on to achieve positive outcomes? Start with what you own and what your customers tell you directly. First-party data comes from your own interactions with customers and includes purchases, clicks, and sign-ups. Zero-party data can be even cleaner. It’s what customers volunteer about themselves, their preferences, and their needs.
These are the data sources that represent valuable signals today. They offer direct relationships with consumers and built-in feedback loops that continuously refine understanding. The companies that can control these sources will win in the AI era.
Check your signal
The common thread across all of these examples is signal quality. AI does not magically fix bad inputs. It amplifies them. No matter your field, check your signal.
If you draft a motion based on fabricated cases, you might as well be J.K. Rowling playing a lawyer. If weak data powers your advertising AI, you may find yourself prescribing estrogen replacement to healthy men in their 30s. AI is the greatest technological advancement in human history. But it must be powered properly.
Back to the duck blind. It’s 9 a.m., and time to pack up. Two mallards fly in, taking a look at our decoy spread. I blow a few quacks through the duck call hanging from the lanyard around my neck. Off they fly, not fooled by the attempt. I need to work on my signal.
This article originally appeared in Fast Company.