Google’s updated Voice Access leverages AI to detect in-app icons

Google today launched an updated version of Voice Access, its service that enables users to control Android devices using voice commands. The update leverages a machine learning model to automatically detect icons on the screen from UI screenshots, letting Voice Access act on elements like images and icons even when they lack accessibility labels, the labels apps provide to Android’s accessibility services.

Accessibility labels allow Android’s accessibility services to refer to exactly one on-screen element at a time, so users know where they are as they cycle through the UI. Unfortunately, some elements lack labels, a challenge the new version of Voice Access aims to address.
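
For developers, supplying such a label on Android typically means setting a content description on the view, which services like Voice Access and TalkBack then use as its name. The minimal Kotlin sketch below illustrates the idea; the activity, layout, and resource IDs are hypothetical, not taken from Google’s post.

    // Hypothetical Activity: giving an icon-only button an accessibility label
    // so services like Voice Access can announce and target it by name.
    import android.os.Bundle
    import android.widget.ImageButton
    import androidx.appcompat.app.AppCompatActivity

    class ComposeMailActivity : AppCompatActivity() {
        override fun onCreate(savedInstanceState: Bundle?) {
            super.onCreate(savedInstanceState)
            setContentView(R.layout.activity_compose_mail)  // hypothetical layout

            // An image-only button has no visible text, so without a label
            // accessibility services cannot refer to it by name.
            val sendButton = findViewById<ImageButton>(R.id.send_button)  // hypothetical ID
            sendButton.contentDescription = "Send"
        }
    }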

The new Voice Access (version 5.0) includes a vision-based object detection model called IconNet that can detect 31 different icon types, soon to be extended to more than 70. As Google explains in a blog post, IconNet is based on the novel CenterNet architecture, which extracts features from input images and then predicts the locations and sizes of the app icons they contain. Using Voice Access, users can refer to icons detected by IconNet by their names, e.g., “Tap ‘menu.’”
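
Google hasn’t released IconNet’s code, but the center-heatmap idea behind CenterNet-style detectors can be sketched roughly: the model outputs a per-class heatmap whose local maxima mark likely icon centers, along with a size estimate at each position. The Kotlin decoder below is only an illustrative approximation; the array shapes, confidence threshold, and label names are assumptions rather than IconNet’s actual implementation.

    // Illustrative decoding of CenterNet-style outputs into icon detections.
    // heatmap[c][y][x] : confidence that an icon of class c is centered at (x, y)
    // sizes[y][x]      : predicted (width, height) of a box centered at (x, y)
    data class Detection(val label: String, val x: Int, val y: Int,
                         val width: Float, val height: Float, val score: Float)

    fun decodeDetections(
        heatmap: Array<Array<FloatArray>>,        // [numClasses][H][W]
        sizes: Array<Array<Pair<Float, Float>>>,  // [H][W] -> (width, height)
        labels: List<String>,                     // e.g. "menu", "search", "back"
        threshold: Float = 0.5f                   // assumed confidence cutoff
    ): List<Detection> {
        val detections = mutableListOf<Detection>()
        for ((c, classMap) in heatmap.withIndex()) {
            for (y in classMap.indices) {
                for (x in classMap[y].indices) {
                    val score = classMap[y][x]
                    if (score < threshold) continue
                    // Keep only local maxima so each icon yields a single center.
                    if (neighbors(classMap, y, x).all { it <= score }) {
                        val (w, h) = sizes[y][x]
                        detections += Detection(labels[c], x, y, w, h, score)
                    }
                }
            }
        }
        return detections
    }

    private fun neighbors(map: Array<FloatArray>, y: Int, x: Int): List<Float> =
        listOf(-1 to 0, 1 to 0, 0 to -1, 0 to 1).mapNotNull { (dy, dx) ->
            map.getOrNull(y + dy)?.getOrNull(x + dx)
        }

A spoken command such as “Tap ‘menu’” can then be matched against the resulting detections to find the icon to activate.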

To train IconNet, Google engineers collected and labeled more than 700,000 app screenshots, streamlining the process by using heuristics, auxiliary models, and data augmentation techniques to identify rarer icons and enrich existing screenshots with infrequent icons. “IconNet is optimized to run on-device for mobile environments, with a compact size and fast inference time to enable a seamless user experience,” Google Research software engineers Gilles Baechler and Srinivas Sunkara wrote in a blog post.
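
One of those augmentation ideas, enriching existing screenshots with infrequent icons, could look something like the Kotlin sketch below, which pastes a crop of a rare icon onto a screenshot and records the resulting bounding box as an extra training label. The file handling and random placement are assumptions for illustration, not Google’s actual pipeline.

    // Sketch: overlay a rare icon crop on an existing screenshot and record
    // its bounding box, yielding one additional labeled training example.
    import java.awt.image.BufferedImage
    import java.io.File
    import javax.imageio.ImageIO
    import kotlin.random.Random

    data class LabeledBox(val label: String, val x: Int, val y: Int, val w: Int, val h: Int)

    fun augmentWithIcon(screenshotFile: File, iconFile: File, iconLabel: String): Pair<BufferedImage, LabeledBox> {
        val screenshot = ImageIO.read(screenshotFile)
        val icon = ImageIO.read(iconFile)
        require(icon.width < screenshot.width && icon.height < screenshot.height)

        // Pick a random position where the icon fits fully inside the screenshot.
        // (A real pipeline would also avoid covering existing UI elements.)
        val x = Random.nextInt(screenshot.width - icon.width)
        val y = Random.nextInt(screenshot.height - icon.height)

        val augmented = BufferedImage(screenshot.width, screenshot.height, BufferedImage.TYPE_INT_ARGB)
        val g = augmented.createGraphics()
        g.drawImage(screenshot, 0, 0, null)
        g.drawImage(icon, x, y, null)
        g.dispose()

        return augmented to LabeledBox(iconLabel, x, y, icon.width, icon.height)
    }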

Google says that in the future, it plans to expand the range of elements supported by IconNet to generic images, text, and buttons. It also plans to extend IconNet to differentiate between similar-looking icons by identifying their functionality. Meanwhile, on the developer side, Google hopes to increase the number of apps with valid content descriptions by improving tools that suggest content descriptions for different elements when building applications.

Above: IconNet analyzes the pixels of the screen and identifies the centers of icons by generating heatmaps, which provide precise information about the position and type of the icons present on the screen.

“A significant challenge in the development of an on-device UI element detector for Voice Access is that it must be able to run on a wide variety of phones with a range of performance capabilities, while preserving the user’s privacy,” Baechler and Sunkara added. “We are constantly working on improving IconNet.”

Voice Access, which launched in beta in 2016, dovetails with Google’s other mobile accessibility efforts. The company is continuing to develop Lookout, an accessibility-focused app that can identify packaged foods using computer vision, scan documents to make it easier to review letters and mail, and more. There’s also Project Euphonia, which aims to help people with speech impairments communicate more easily; Live Relay, which uses on-device speech recognition and text-to-speech to let phones listen and speak on a person’s behalf; and Project Diva, which helps people give the Google Assistant commands without using their voice.
