AI Is Not The Problem
When a licensed attorney files a legal brief in court that cites six court cases that do not exist, you know you are in for an exciting ride.
In June 2023, a federal court in New York faced exactly that situation. The brief listed detailed case names, airlines, courts, and docket numbers. Everything looked legitimate and carefully sourced. It was not.
ChatGPT had generated every citation, and Steven Schwartz of Levidow, Levidow & Oberman, P.C. submitted the brief to a federal judge without checking any of them.
When Avianca’s attorneys flagged the problem, Schwartz went back to ChatGPT and asked it directly whether the cases were real. The chatbot assured him they were. He accepted that too.
Judge P. Kevin Castel of the Southern District of New York was not sympathetic. He described the submissions as containing bogus judicial decisions with bogus quotes and bogus internal citations. Schwartz, his colleague Peter LoDuca, and their firm were jointly fined $5,000. In a separate ruling issued the same day, the underlying personal injury lawsuit was dismissed on statute of limitations grounds.
At the sanctions hearing, Schwartz told the court he had been operating under a false assumption: that ChatGPT could not possibly fabricate cases on its own. He had never thought to check.
That is not an AI failure. It is a human failure. And it is happening every day, in offices across every continent, at every level of seniority, in every industry that has decided AI is a shortcut to thinking rather than a tool for it.
The AI did exactly what it was asked. No one checked the answer.
THE ADOPTION SURGE NOBODY IS MANAGING PROPERLY
According to McKinsey’s 2024 Global Survey on AI, 65% of organisations worldwide are now regularly using generative AI, nearly double the figure from just ten months earlier. Three-quarters of those surveyed believe AI will cause significant or disruptive change in their industries within a few years.
However, McKinsey’s 2025 follow-up survey found that only 27% of organisations review all AI-generated content before it is used. That means roughly 73% of organisations are allowing AI output to move through their operations, reach their clients, and inform their decisions without anyone checking whether it is accurate.
The same 2025 survey found that 47% of organisations had already experienced at least one negative consequence from AI use. The most commonly cited problem? Inaccuracy.
Not cybersecurity. Not intellectual property. Inaccuracy.
The thing that happens when you feed a prompt into a capable tool and treat whatever comes back as finished work.
Seventy-three percent of organisations are sending AI output into the world without reviewing all of it first. That is not a technology failure. It is a governance failure.
The pace of adoption and the absence of oversight are not a coincidence. Organisations are moving fast because the competitive pressure is real. But speed without judgment is a liability. The evidence for that is already documented in courtrooms, regulatory decisions, and boardroom post-mortems around the world.
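What "review before use" means in practice does not have to be complicated. The sketch below is a minimal illustration, not any vendor's product: a hypothetical internal pipeline that simply refuses to release AI-generated content until a named person has signed off on it. Every name in it is invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Draft:
    """An AI-generated draft awaiting human sign-off (illustrative only)."""
    content: str
    source_model: str                      # which tool produced it
    prompt: str                            # the exact prompt used, kept for audit
    reviewed_by: Optional[str] = None      # no reviewer yet
    reviewed_at: Optional[datetime] = None

    def approve(self, reviewer: str) -> None:
        """Record that a named human has verified this draft."""
        self.reviewed_by = reviewer
        self.reviewed_at = datetime.now()

def release(draft: Draft) -> str:
    """Refuse to release anything no human has signed off on."""
    if draft.reviewed_by is None:
        raise PermissionError("Blocked: no human reviewer has signed off on this AI output.")
    return draft.content
```

The point is not the code; it is the gate. A reviewer's name travels with the output, which is precisely the accountability the survey numbers say is missing.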
A PATTERN, NOT AN ANOMALY
The Schwartz case became famous because it ended in federal court. But the behaviour that produced it is not unusual. It is the default.
Ask yourself honestly: when did you last watch a colleague submit an AI-generated report, proposal, or client email without any visible verification? When did you last do it yourself?
AI language models do not know what they do not know. They generate fluent, confident, well-formatted text based on patterns in their training data. They cannot distinguish between what is accurate and what merely sounds accurate. They will produce a believable statistic from a study that was never published. They will name a competitor that was acquired 18 months ago as an active market player. They will cite a legal precedent, a regulation, or a policy position with complete confidence and complete incorrectness. And they will do all of it in language so clean and authoritative that most readers will not stop to question it.
THE AIR CANADA CASE
In February 2024, a British Columbia tribunal heard a case that captured this problem in precise, human terms. Jake Moffatt needed to fly from Vancouver to Toronto following the death of his grandmother. Before booking, he consulted Air Canada’s chatbot about bereavement fares. The chatbot said he could apply for a discounted fare within 90 days of the ticket being issued, including after travel. Moffatt booked at full price and submitted the application afterwards, exactly as the chatbot had instructed.
Air Canada refused the refund. Their actual policy required the application to be made before travel. The chatbot had it wrong.
What happened next is where this case earns its place in the record books. Air Canada’s legal response was to argue that their chatbot was a separate legal entity, and therefore the airline bore no responsibility for what it said. Tribunal member Christopher Rivers rejected this without hesitation. He found that Air Canada had not taken reasonable care to ensure its chatbot was accurate, and that there was no reason why a customer should have to double-check information found in one section of the airline’s own website against another. Air Canada was ordered to pay Moffatt a total of $812.02 in damages, interest, and tribunal fees.
The moment AI-generated content leaves your organisation, whether through a customer service bot, a client proposal, a strategic briefing, or a board presentation, it carries your name and your liability. Not the AI model’s.
THE PROMPT YOU WROTE MAY BE THE PROBLEM
Garbage in, garbage out. This principle predates artificial intelligence by decades. It has always been about human behaviour: people who want quick answers give rushed inputs, then treat the output as authoritative regardless of the quality of the question they asked. AI has not changed this. It has amplified it, because AI output looks more finished than the output of almost any previous technology.
A half-built spreadsheet model looks unfinished. An AI-generated executive summary looks polished and ready even when it is built on a four-word prompt and zero verification. That surface credibility is the trap.
Here is what a lazy prompt looks like: “Write a competitor analysis for our board.”
Here is what a serious prompt looks like: “You are a senior strategy analyst with expertise in South African financial services. Using only the context I am about to provide, write a 250-word competitive analysis for a board audience. Focus on pricing strategy, market share movements, and the three most significant competitive threats identified in the past 12 months. Use formal language. Flag any area where you lack sufficient context to draw a reliable conclusion.”
Those are not two versions of the same instruction. They are two entirely different conversations with two entirely different outcomes. The person who submits the first and then complains that AI produced generic, inaccurate work is not identifying a flaw in the technology. They are describing the result of asking a vague question and expecting a precise answer.
A sloppy prompt is not a prompt. It is a coin toss. And the people flipping that coin every day are consistently surprised by the result.
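To make the difference concrete, here is a minimal sketch of the same idea in code. The helper function and its parameters are illustrative, not any platform's API: the point is that every element of the serious prompt above becomes an explicit, named input rather than something left to the model's imagination.

```python
def build_analysis_prompt(
    role: str,
    context: str,
    audience: str,
    word_limit: int,
    focus_areas: list[str],
) -> str:
    """Assemble a structured prompt from explicit components.

    Each element of the 'serious prompt' described above is a named
    parameter here, so nothing important is left unstated.
    """
    focus = ", ".join(focus_areas)
    return (
        f"You are {role}. Using only the context provided below, "
        f"write a {word_limit}-word analysis for {audience}. "
        f"Focus on: {focus}. Use formal language. "
        "Flag any area where you lack sufficient context to draw "
        "a reliable conclusion.\n\n"
        f"Context:\n{context}"
    )

# The lazy prompt has none of these components:
lazy = "Write a competitor analysis for our board."

# The serious prompt makes each one explicit:
serious = build_analysis_prompt(
    role="a senior strategy analyst with expertise in South African financial services",
    context="<your verified source material goes here>",
    audience="a board audience",
    word_limit=250,
    focus_areas=[
        "pricing strategy",
        "market share movements",
        "the three most significant competitive threats in the past 12 months",
    ],
)
```

Structuring prompts this way also makes them reviewable: a colleague can see at a glance what context the model was given and what it was asked to flag.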
THE AMAZON AI BIAS CASE
Amazon learned this in a way that cost them years of work and significant reputational damage. The company began building an AI recruitment tool in 2014, training it on a decade of historical resumes. The goal was to automate the identification of top candidates. By 2015, Amazon’s own engineers had already discovered a serious problem: the tool was systematically downgrading applications from women.
The AI was not making a decision to discriminate. It was learning from what it was given. A decade of resumes from a male-dominated technology industry had taught the model to replicate the bias already embedded in that data. It penalised resumes containing the word “women’s”, as in “women’s chess club captain”, and downgraded graduates from all-women’s colleges. Amazon tried to correct it. Eventually, the company lost confidence that the tool could ever be reliably neutral and disbanded the team by early 2017.
The AI did not add bias. It reflected the quality of the data behind it and the absence of adequate oversight.
YOUR NAME IS ON THE DOCUMENT. NOT THE AI’S.
Every case described in this article ends the same way.
A real person or organisation faces real consequences. A federal judge sanctions two attorneys and their law firm for submitting fabricated case citations they never thought to verify.
A tribunal orders an airline to pay damages after its chatbot gave a grieving customer incorrect information and the company tried to blame the bot.
A technology giant disbands a recruitment project after its AI spent years filtering out qualified candidates based on gender, trained to do so by the data its own engineers fed it.
In every case, the AI tool performed exactly as it was designed to perform, within the bounds of the disclaimer that appears in the terms of service of every major AI platform.
Here is the substance of that disclaimer as it appears in OpenAI's own terms for ChatGPT:
The output may contain mistakes or inaccurate information.
The output should not be treated as professional advice.
Users should evaluate the accuracy of the content before relying on it.
In every case, humans made the choices that produced the damaging result: what to put in, what to leave out, what to check, and what to ignore.
No competent executive would sign off on a legal submission without confirming the cited cases exist.
No serious leader would accept a strategic analysis built on unverified data from an unidentified source.
No responsible board would approve a recruitment process that had not been audited for bias.
Yet all of this is happening, in organisations that have replaced those standards with a prompt and an assumption.
When the report is wrong and it carries your company’s name, the conversation will not be about which AI model was at fault.
McKinsey’s 2024 survey data makes the structural problem explicit. Only 18% of organisations have an enterprise-wide council or board with actual authority over responsible AI governance decisions. Fewer than one in five. That means more than four out of every five organisations have deployed AI without anyone in authority specifically accountable for the quality and accuracy of what it produces.
That is not a technology gap. It is a leadership gap.
The culture of AI use inside your organisation is not set by the tool you choose. It is set by what you personally accept. If you wave through AI output without checking it, your team reads that as permission. If you submit unverified AI content to clients, your team concludes that verification is optional. If you have never asked a direct question about how AI-generated work is reviewed before it leaves the building, your team is already answering that question for you.
THE INVESTIGATION LANDS HERE
The AI is not broken. The standard is.
Sixty-five percent of global organisations are now regularly using generative AI. The tools are fast, capable, and improving. Used well, with clear inputs, iterative refinement, and rigorous verification, they represent a genuine competitive asset. They can process, synthesise, and present information at a speed no human team can match.
But the evidence, from federal courtrooms, from tribunal decisions, from McKinsey’s own survey data, points to a consistent and urgent problem. The majority of organisations are not using these tools well. They are using them fast. They are treating output as finished work. They are skipping verification. They are deploying AI in customer-facing and decision-critical roles without adequate oversight. And when the inevitable problems surface, they play the blame game.
Steven Schwartz told the court he was unaware that ChatGPT could fabricate information.
Air Canada argued its chatbot was a separate legal entity.
Amazon’s engineers spent a year trying to correct a bias problem in a tool they had already deployed, then gave up and shut the whole project down.
In each case, the problem was not what the AI did. It was what the humans around it failed to do.
Your organisation is not immune to this. The question is not whether you are using AI. The question is whether you are using it with the same professional accountability you apply to every other tool your name gets attached to.
Verified Sources
Mata v. Avianca, Inc., 678 F.Supp.3d 443 (S.D.N.Y. 2023). Judge P. Kevin Castel.
Sanctions ruling: June 22, 2023. Dismissal ruling: June 22, 2023 (separate order, statute of limitations, Montreal Convention).
Moffatt v. Air Canada, 2024 BCCRT 149. Tribunal member Christopher C. Rivers.
British Columbia Civil Resolution Tribunal. February 14, 2024.
Total damages awarded: $812.02 CAD (damages $650.88 + pre-judgment interest $36.14 + CRT fees $125.00).
Amazon AI recruitment tool. Reuters investigation by Jeffrey Dastin, October 10, 2018.
Tool built from 2014. Bias identified by 2015. Project disbanded by early 2017.
Source: ‘Amazon scraps secret AI recruiting tool that showed bias against women,’ Reuters, 2018.
McKinsey Global Survey on AI, early 2024 (surveyed February 22 to March 5, 2024, n=1,363).
65% of organisations regularly using gen AI. 18% have enterprise-wide AI governance authority.
Source: ‘The state of AI in early 2024: Gen AI adoption spikes and starts to generate value,’ McKinsey.
McKinsey Global Survey on AI, 2025.
27% of organisations review all gen AI content before use. 47% have experienced negative consequences from AI use.
Source: ‘The state of AI: How organizations are rewiring to capture value,’ McKinsey, 2025.