Whisper’s Hallucinations Messing Up AI Medical Transcription Records


Whisper, an AI speech-to-text tool developed by OpenAI and used in hospitals for medical transcription, is being criticized for occasionally “hallucinating”, producing incorrect information in the medical transcripts it generates. 

The tool has been widely adopted by Nabla, a French health tech company that has reportedly used it to transcribe more than seven million medical conversations, according to ABC News. 

Researchers caution that Whisper’s tendency to fabricate inaccurate or even violent content makes it far from reliable, particularly for critical tasks such as producing an AI-generated doctor’s note. 

Whisper’s Offensive Words 

In the study “Careless Whisper: Speech-to-Text Hallucination Harms”, a team of researchers from several universities, including the University of Washington and Cornell University, found that medical transcription AI software such as Whisper hallucinated in roughly 1% of the recordings examined.  

Among the errors found in AI medical transcription were invented phrases and entire sentences that could be misleading in a medical context.  

In some cases, the AI transcription tool generated text during moments of silence, introducing offensive or irrelevant words that had nothing to do with the actual conversation.  

Such incidents occurred particularly in recordings of individuals with aphasia, a language impairment frequently marked by prolonged pauses, which increases the likelihood of these mistakes. 

To further illustrate the issue with AI medical transcription, Dr. Allison Koenecke, a member of the research team from Cornell University, posted examples on her Threads account showing how Whisper sometimes inserts out-of-context sentences such as “Thank you for watching!”.  

Solutions on the Way 

Researchers suggest that Whisper’s mistakes may stem from its training on vast amounts of transcription data, including YouTube videos, which can lead it to generate irrelevant or erratic content. 

In response to the criticism, Nabla, whose AI medical scribe relies on the tool, acknowledges that it has hallucination issues and says it is working to fix the problem. 

As for OpenAI, company spokesperson Taya Christianson told The Verge in a statement that it is aware of these concerns and is working to address them. 

“We take this issue seriously and are continually working to improve, including reducing hallucinations. For Whisper use on our API platform, our usage policies prohibit use in certain high-stakes decision-making contexts, and our model card for open-source use includes recommendations against use in high-risk domains. We thank researchers for sharing their findings,” Christianson said. 

The findings have ignited discussion about the risks of over-relying on AI in healthcare, where accuracy is a pillar. Even though Whisper has been widely praised for fast speech transcription, its tendency to generate “hallucinations” casts doubt on its reliability in medical applications. 

The researchers presented their findings on AI medical transcription in June at the Association for Computing Machinery’s FAccT conference in Brazil, though it is unclear whether the study has been peer-reviewed. 
