Study finds that even the best speech recognition systems exhibit bias

April 1, 2021

Join Transform 2021 for the most important themes in enterprise AI & Data. Learn more.

Even state-of-the-art automatic speech recognition (ASR) algorithms struggle to recognize the accents of people from certain regions of the world. That’s the top-line finding of a new study published by researchers at the University of Amsterdam, the Netherlands Cancer Institute, and the Delft University of Technology, which found that an ASR system for the Dutch language recognized speakers of specific age groups, genders, and countries of origin better than others.

Speech recognition has come a long way since IBM’s Shoebox machine and Worlds of Wonder’s Julie doll. But despite progress made possible by AI, voice recognition systems today are at best imperfect — and at worst discriminatory. In a study commissioned by the Washington Post, popular smart speakers made by Google and Amazon were 30% less likely to understand non-American accents than those of native-born users. More recently, the Algorithmic Justice League’s Voice Erasure project found that that speech recognition systems from Apple, Amazon, Google, IBM, and Microsoft collectively achieve word error rates of 35% for African American voices versus 19% for white voices.

The coauthors of this latest research set out to investigate how well an ASR system for Dutch recognizes speech from different groups of speakers. In a series of experiments, they observed whether the ASR system could contend with diversity in speech along the dimensions of gender, age, and accent.

The researchers began by having an ASR system ingest sample data from CGN, an annotated corpus used to train AI language models to recognize the Dutch language. CGN contains recordings spoken by people ranging in age from 18 to 65 years old from Netherlands and the Flanders region of Belgium, covering speaking styles including broadcast news and telephone conversations.

CGN has a whopping 483 hours of speech spoken by 1,185 women and 1,678 men. But to make the system even more robust, the coauthors applied data augmentation techniques to increase the total hours of training data “ninefold.”

When the researchers ran the trained ASR system through a test set derived from the CGN, they found that it recognized female speech more reliably than male speech regardless of speaking style. Moreover, the system struggled to recognize speech from older people compared with younger, potentially because the former group wasn’t well-articulated. And it had an easier time detecting speech from native speakers versus non-native speakers. Indeed, the worst-recognized native speech — that of Dutch children — had a word error rate around 20% better than that of the best non-native age group.

In general, the results suggest that teenagers’ speech was most accurately interpreted by the system, followed by seniors’ (over the age of 65) and children’s. This held even for non-native speakers who were highly proficient in Dutch vocabulary and grammar.

As the researchers point out, while it’s to an extent impossible to remove the bias that creeps into datasets, one solution is mitigating this bias at the algorithmic level.

“[We recommend] framing the problem, developing the team composition and the implementation process from a point of anticipating, proactively spotting, and developing mitigation strategies for affective prejudice [to address bias in ASR systems],” the researchers wrote in a paper detailing their work. “A direct bias mitigation strategy concerns diversifying and aiming for a balanced representation in the dataset. An indirect bias mitigation strategy deals with diverse team composition: the variety in age, regions, gender, and more provides additional lenses of spotting potential bias in design. Together, they can help ensure a more inclusive developmental environment for ASR.”

Become a member

By VentureBeat Source Link

LEAVE A REPLY Cancel reply

Top Stories

AI Is Transforming Jobs Faster Than Organizations Can Transform Work

Global Economic Outlook Hangs in Balance between Geopolitical Headwinds and AI boost, Chief Economists...

Low Job Confidence, Uneven AI Adoption and Rising Anxiety is Reshaping Employee Sentiment in...

By 2027, 40% of Enterprises Will Demote or Decommission Autonomous AI Agents Due to...

Sixty-One Percent of CEOs Say Their Boards Are Rushing AI Transformation

Cyber Security

Threat Actors Spoofing FIFA Websites in Advance of the 2026 World Cup

Kaspersky reports 17% of major Mexico cities open Wi-fi spots unsecure

Despite robust security measures, credential abuse techniques remain the most effective attack method

Phishing-as-a-Service (PhaaS): Understanding the Growing Cyber Threat and How to Stay Safe

Kaspersky: Nearly 5 in 10 victims of tech-enabled abuse know their perpetrator