Google says its Parallel Tacotron model generates synthetic voices 13 times faster than its predecessor

In December 2017, Google detailed Tacotron 2, a machine learning text-to-speech (TTS) system that generates natural-sounding speech from raw transcripts. It’s used in user-facing services like Google Assistant to create voices that sound humanlike, but it’s relatively compute-intensive. In a new paper, researchers at the search giant claim to have addressed this limitation with what they call Parallel Tacotron, a model that’s highly parallelized during training and inference to enable efficient voice generation on less-powerful hardware.
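To illustrate why that parallelization matters for speed, the sketch below contrasts an autoregressive decoder, which (like Tacotron 2) must emit spectrogram frames one at a time because each frame conditions on the previous one, with a non-autoregressive decoder that emits every frame in a single pass. The shapes, weights, and functions here are hypothetical toy stand-ins, not code from either model.

```python
# Toy contrast: autoregressive decoding (Tacotron 2-style, one spectrogram
# frame per step) vs. non-autoregressive decoding (Parallel Tacotron-style,
# all frames at once). Dimensions and layers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, D_TEXT, D_MEL = 400, 256, 80                     # frames, text dim, mel bins
text_encoding = rng.standard_normal((T, D_TEXT))    # upsampled text features
W_ar = rng.standard_normal((D_TEXT + D_MEL, D_MEL)) * 0.01
W_par = rng.standard_normal((D_TEXT, D_MEL)) * 0.01

def decode_autoregressive(enc):
    """Each frame conditions on the previous one, so the loop is sequential."""
    frames, prev = [], np.zeros(D_MEL)
    for t in range(len(enc)):
        prev = np.tanh(np.concatenate([enc[t], prev]) @ W_ar)
        frames.append(prev)
    return np.stack(frames)

def decode_parallel(enc):
    """No frame-to-frame dependency: one matrix multiply emits every frame,
    so the whole utterance can be computed in parallel on a TPU or GPU."""
    return np.tanh(enc @ W_par)

mel_ar = decode_autoregressive(text_encoding)   # T sequential steps
mel_par = decode_parallel(text_encoding)        # one parallel step
print(mel_ar.shape, mel_par.shape)              # (400, 80) (400, 80)
```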

Text-to-speech synthesis is what’s known as a one-to-many mapping problem: given any snippet of text, multiple plausible voices with different prosodies (intonation, tone, stress, and rhythm) could be generated. As a result, even sophisticated models like Tacotron 2 are prone to errors such as babble, cut-off speech, and repeated or skipped words. One way to address this is to augment models with representations that capture latent speech factors. These representations can be extracted by an encoder that takes ground-truth spectrograms (visual representations of speech frequencies over time) as input; this is the approach Parallel Tacotron takes.
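To make the idea concrete, here is a minimal sketch of such an encoder, assuming a VAE-style design in PyTorch. The layer sizes, the two-convolution architecture, and the class name SpectrogramLatentEncoder are illustrative assumptions, not the exact network from the Parallel Tacotron paper.

```python
# Minimal sketch: map a ground-truth mel spectrogram to a latent vector
# capturing prosodic factors. Architecture and dimensions are hypothetical.
import torch
import torch.nn as nn

class SpectrogramLatentEncoder(nn.Module):
    def __init__(self, n_mels: int = 80, hidden: int = 256, latent: int = 32):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv1d(n_mels, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.to_mu = nn.Linear(hidden, latent)      # posterior mean
        self.to_logvar = nn.Linear(hidden, latent)  # posterior log-variance

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, n_mels, frames) -> pooled utterance-level summary
        h = self.convs(mel).mean(dim=-1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample a latent the decoder can condition
        # on, resolving the one-to-many ambiguity of text-to-speech.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

encoder = SpectrogramLatentEncoder()
latent = encoder(torch.randn(2, 80, 400))  # two utterances, 400 frames each
print(latent.shape)                        # torch.Size([2, 32])
```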

To train Parallel Tacotron, the researchers say they used a dataset containing 405 hours of speech, comprising 347,872 utterances from 45 speakers across three English accents (32 U.S. English, eight British English, and five Australian English speakers). Training took a day on Google Cloud TPUs, application-specific integrated circuits developed specifically to accelerate AI.

To evaluate Parallel Tacotron’s performance, the researchers had human reviewers listen to 1,000 sentences synthesized using 10 U.S. English speakers (five male and five female) in a round-robin style, with 100 sentences per speaker. While there’s room for improvement, the results suggest that Parallel Tacotron “did well” compared with human speech. Moreover, Parallel Tacotron was about 13 times faster than Tacotron 2.
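For readers curious what that round-robin split looks like in practice, the snippet below is a hypothetical reconstruction: 1,000 sentences dealt out across 10 voices, 100 apiece. The speaker and sentence names are purely illustrative.

```python
# Hypothetical reconstruction of the round-robin evaluation split described
# above; identifiers are illustrative, not from the paper.
speakers = [f"en-US-{i}" for i in range(10)]           # 5 male + 5 female
sentences = [f"sentence_{i:04d}" for i in range(1000)]

assignments = {spk: [] for spk in speakers}
for i, sent in enumerate(sentences):
    assignments[speakers[i % len(speakers)]].append(sent)

assert all(len(batch) == 100 for batch in assignments.values())
```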

“A number of models have been proposed to synthesize various aspects of speech (e.g., speaking styles) in a natural sounding way,” the researchers wrote. “Parallel Tacotron matched the baseline Tacotron 2 in naturalness and offered significantly faster inference than Tacotron 2.”

The release of Parallel Tacotron, which is available on GitHub, comes after Microsoft and Facebook detailed speedy text-to-speech techniques of their own. Microsoft’s FastSpeech features a unique architecture that not only improves performance in a number of areas but eliminates errors like word skipping and affords fine-grained adjustment of speed and word breaks. As for Facebook’s system, it leverages a language model for curation and creates voices 160 times faster than a baseline.


By VentureBeat