The most valuable thing you'll do with AI this week is tell it it's wrong
AI gets things wrong. You've heard it. You've said it. Every practitioner who's spent 20 minutes with a language model has a story — the confident-sounding analysis that wouldn't survive a client meeting, the tax position built on a provision that doesn't apply.
Here's the part nobody's connecting. That moment — where you look at AI output and know it's wrong — is the most valuable moment in your entire AI workflow. Not because it means AI isn't ready. Because it means you are.
Your complaints are your competitive advantage
OpenAI's GDPval benchmark — the most rigorous measurement of AI against actual knowledge work — showed frontier models matching or beating professionals with an average of 14 years of experience in 70 to 83 percent of head-to-head comparisons. Eleven times faster. Less than one percent of the cost.
Most people read that as an AI capability story. The more interesting reading: if AI matches your best people's output 70 to 83 percent of the time, then the 17 to 30 percent where it gets it wrong is where organizations win or lose.
Sequoia Capital's framework argues — and I think they're right — that most of what accountants call professional judgment is actually intelligence work. Internalized rules. The genuine judgment residual is real but thin. Maybe 10 to 15 percent of your actual week.
Connect those two data points. The territory where AI gets it wrong overlaps almost perfectly with the territory where your genuine judgment operates. Every time you catch an error in AI output, you're not doing quality control. You're exercising the exact skill that defines the defensible layer of professional services.
The hallucination complaint isn't a problem with AI. It's a map of where your judgment lives.
Rejection is judgment made visible
Earlier this week I wrote about a cross-border tax research project where the AI and I caught a material legal error — not because I asked better questions, but because the adversarial back-and-forth forced both sides to test every assumption. I had built the analysis on one legal basis. The AI flagged that the statutory text didn't support my characterization. I pushed back. It held its ground. I re-examined a position I'd have defended instinctively — and realized the entire argument needed a different foundation.
Neither of us would have caught it alone. That's rejection in action. A collision between domain expertise and computational analysis where the inconsistency becomes undeniable.
But that moment — the articulated constraint, the newly explicit knowledge — evaporated when the conversation ended. Next time a similar question comes up, someone rebuilds the reasoning from scratch.
Every correction you make contains professional knowledge you've never articulated as a formal rule. The client whose Q4 spike is seasonal, not a red flag. The materiality threshold that shifts based on industry context the model doesn't have. Until now, that knowledge lived inside your head and came out through your work. AI is giving you a reason to make it explicit.
The skill nobody's developing
The profession trains you for knowledge retention and rule application — CPA exams, tax code, GAAP, firm SOPs. Nobody trains you for rejection.
Rejection has three dimensions.

Recognition — spotting that something's wrong. This depends on domain experience and can't be shortcut.

Articulation — explaining why it's wrong precisely enough that a system could act on it. "This isn't right" is a rejection. "This isn't right because DSCR monitoring has completely different triggers than minimum net worth requirements" is a constraint.

Encoding — making that constraint persist so your firm's AI output improves without requiring your attention next time.
Encoding is where everything breaks down. The constraint lives in a chat window nobody will search. Next quarter, a different team member makes the same mistake because the knowledge was never captured. Your corrections are context engineering in raw form — but only if you capture them.
The pipeline problem this creates
AI is eliminating the production work that used to train junior accountants. Bank recs, transaction coding, workpaper prep — those weren't just tasks. They were the apprenticeship that built recognition. You learned what "wrong" looks like by producing wrong output for years and getting corrected by someone with the judgment to explain why.
Entry-level tech hiring has collapsed by roughly 67 percent since ChatGPT's release, and the same pattern is showing up in accounting. If juniors aren't doing the repetitive work that builds recognition, they won't develop the rejection skill that makes AI output trustworthy. The people verifying AI output in 2028 won't have the judgment to do it — because they weren't trained on the work that builds it.
What to do this week
Don't stop fixing AI output. Start noticing what you're fixing and why. Every correction contains a constraint — professional knowledge that compounds if you capture it and evaporates if you don't.
When you push back on AI output, write down the reason. Not for the AI — for your firm. The practitioners who do this deliberately will compound their expertise in ways the profession has never been able to do before. The ones who silently fix and move on will rebuild the same knowledge from scratch every session.
Your biggest complaint about AI is actually where your professional value lives. The question is whether you're going to develop the skill that captures it — or keep letting it disappear.
If you want to understand where your practice stands right now — the gaps in your rejection discipline, where your team's judgment is strongest, and where the risk sits — take the free AI Readiness Scorecard at theaiaccountant.ai/scorecard. Twenty-five questions, five minutes, and you'll have a clear picture of what you need to develop. It's designed for firms like yours, and the diagnostic is honest.
Tomorrow I'm going to challenge something the profession holds sacred — the idea that "professional judgment" protects you from AI. It does. But the moat is much narrower than anyone on LinkedIn wants to admit.

