Anthropic just released the largest qualitative AI user study ever: 80,508 participants across 159 countries. Across accounting and every other industry, the number one fear about AI isn't job loss, at 22.2%. It's unreliability, at 26.7%. More than a quarter of AI users are afraid the technology will be wrong and they won't catch it.
That fear is valid. But in accounting specifically, it's aimed at the wrong target. You've been managing these same failure modes in your human staff for decades.
You've been managing these failures your entire career
A bookkeeper codes every Home Depot transaction to repairs and maintenance because that's where they always go. No invoice? No flag. Except this time it was a $4,200 table saw the client capitalized on their own records. The pattern match worked 95% of the time. The 5% that should've triggered a question got swept along with the routine.
A senior reviews a workpaper, sees a number that looks right, and initials it without tracing it back. You find out three weeks later during the partner review.
A client emails "just put it in miscellaneous" and the bookkeeper does — even though the transaction clearly belongs somewhere specific. We tell our staff "don't let the client do your job." They nod. Then the next email arrives and the same thing happens.
An exception report flags a $47 reimbursement because the description says "penalty." Meanwhile, a $15,000 payment to a vendor with the same address as the client's spouse goes through clean.
Pattern matching that stops looking. Reasoning that doesn't follow through. External input that overrides professional judgment. Exception flags that catch the wrong things. None of these are AI stories. They're Tuesday. You've trained on these failures, built review layers around them, and accepted that managing them is part of running a practice.
The 80,000 people in that survey aren't afraid of something new. They're afraid of something every practice owner has managed for decades — in human staff. The question isn't whether AI makes these mistakes. It's whether you can fix them. And whether the fixes stick.
With AI, the corrections are permanent
The most common pushback I hear about AI in CAS work is "it doesn't get the coding right." True. AI miscodes transactions, misses edge cases, and takes shortcuts when the rules say to dig deeper. But that objection describes the humans you already employ. You don't refuse to hire bookkeepers because bookkeepers make mistakes. You hire them, train them, and build systems so the same mistake doesn't happen twice.
With a human, corrections are fragile. You explain why the coding was wrong. They understand. Two weeks later, the same error shows up on a different client. Training is slow, inconsistent, and walks out the door when the person leaves.
With AI, a correction can be permanent. "This client capitalizes anything over $2,500 — always check for an invoice on Home Depot transactions above that threshold." That rule applies to every transaction, for every relevant client, forever. One correction. Permanent improvement.
That's context engineering — encoding the edge-case rules experienced practitioners carry in their heads so that every correction compounds into institutional knowledge. If the system lets you see what went wrong.
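To make that concrete, a correction like the Home Depot rule can live as data rather than tribal knowledge. Here is a minimal sketch of the idea, assuming a hypothetical `Rule`/`Transaction` schema; every name and field below is illustrative, not any real platform's API:

```python
from dataclasses import dataclass

# Hypothetical schemas for illustration only; a real platform
# would have its own transaction and rule representations.

@dataclass
class Transaction:
    vendor: str
    amount: float
    has_invoice: bool

@dataclass
class Rule:
    vendor: str
    threshold: float
    note: str

    def flags(self, txn: Transaction) -> bool:
        # Flag for human review instead of auto-coding to the usual account.
        return (txn.vendor == self.vendor
                and txn.amount > self.threshold
                and not txn.has_invoice)

# The correction from the article's example, encoded once:
rule = Rule(vendor="Home Depot", threshold=2500.0,
            note="Client capitalizes anything over $2,500; require an invoice.")

routine = Transaction("Home Depot", 180.00, has_invoice=False)
table_saw = Transaction("Home Depot", 4200.00, has_invoice=False)

print(rule.flags(routine))    # → False: small purchase, coded as usual
print(rule.flags(table_saw))  # → True: over threshold with no invoice, flagged
```

The point of the sketch is the shape, not the code: once the correction is a stored rule rather than a conversation, it applies to every future transaction automatically instead of depending on whoever happened to be in the room.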
Show the work — a call to every vendor in the industry
That "if" is doing a lot of work.
Right now, most AI embedded in accounting platforms doesn't show its reasoning. A transaction gets categorized. You see the result. If it's wrong, you correct it — but you don't know why it was wrong, which means you can't prevent the same error tomorrow. It's a black box. No different from the bookkeeper who says "I don't know, it just seemed right."
This is a call to every vendor building AI into accounting tools: show the work. The ability to see, correct, and encode decisions is what separates AI that gets smarter from AI that just gets faster. If we can't see how the model made a decision, we can't train it. If we can't train it, we're stuck managing the same failure modes — just at higher volume.
A medical AI recently identified respiratory failure in its own reasoning, then told the patient to schedule a routine appointment. It directed patients away from the ER 52% of the time in cases physicians unanimously classified as emergencies. Aggregate accuracy numbers hide the same pattern you see in your practice — errors concentrate exactly where they matter most.
The fix isn't refusing to use AI because it makes mistakes. It's demanding tools that let your corrections compound instead of evaporate.
Stop comparing AI to perfection
Nobody holds your staff to a standard of perfection. Compare AI to the actual humans doing the work: the same pattern-matching shortcuts, the same edge-case blindness, the same tendency to stop looking when the answer looks close enough.
Your best bookkeeper makes the same mistake twice because the correction didn't transfer. AI makes the same mistake once — because the correction becomes a permanent rule. Your senior can't explain why they signed off without checking. AI can show you exactly where the reasoning stopped. Your staff member takes a new job and their institutional knowledge walks out with them. AI's institutional knowledge lives in the system.
Those 80,000 people are right to care about reliability. But the comparison that matters isn't AI versus perfection. It's AI versus the humans you're already trusting with the work. The failure modes aren't new. The ability to fix them is.
Want deeper guidance? There's additional content for newsletter subscribers on our blog. We've put together five pointed questions you can ask any AI vendor or use to evaluate any AI workflow you're building internally. Each question maps to a specific failure mode from this article and includes a one-paragraph explanation of why it matters. Sign up at theaiaccountant.ai to get access.

