Researchers find language models do a poor job of following basic instructions

A new paper published by researchers affiliated with Facebook and Tel Aviv University investigates whether machine learning language models can understand basic sets of instructions. The researchers propose a test dubbed the Turking Test to examine a model’s ability to follow natural language instructions. Despite what the researchers characterize as a lenient evaluation methodology, they observed that a pretrained language model performed poorly across all tasks.

One of the fundamental problems in AI is building a model that can generalize to previously unseen tasks. Recent work proposes a few-shot inference approach, in which a language model is conditioned on a few examples of a new task, followed by input for the model to process. This approach works well on a range of tasks, but the coauthors of this paper sought to determine whether language models could perform new tasks by conditioning them on instructions.
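To make the distinction concrete, here is a minimal illustrative sketch of the two prompting styles: few-shot conditioning, where the model sees worked examples of a task, and instruction conditioning, where it sees only a natural language description of the task. The prompt wording below is an assumption for illustration, not taken from the paper.

```python
# Illustrative prompts only; the exact wording used in the paper may differ.

# Few-shot conditioning: the model infers the task from worked examples
# and is expected to continue the pattern for the final input.
few_shot_prompt = (
    "sea otter => loutre de mer\n"
    "cheese => fromage\n"
    "plush giraffe => girafe en peluche\n"
    "mint =>"
)

# Instruction conditioning: the model gets only a natural language
# description of the task, with no worked examples.
instruction_prompt = (
    "Translate the following English word into French.\n"
    "Word: mint\n"
    "Translation:"
)
```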

The Turking Test consists of instruction-following benchmarks of varying syntactic complexity, beginning with “turking” tasks, where a model must create valid examples of popular natural language processing datasets. (This is meant to simulate tasks commonly carried out by laypeople on crowdsourcing platforms like Amazon Mechanical Turk.) Another portion of the test tasks the model with listing all the nouns that satisfy a simple condition in a given sentence. To pass the Turking Test, the model must also write the Nth word or character in a given sentence.

The researchers applied the Turking Test to OpenAI’s GPT-2, a model with 1.5 billion parameters (variables internal to the model that shape its predictions). Overall, the results were disappointing. GPT-2 achieved only 2% accuracy on the task of writing the Nth word of a sentence, something the authors note an elementary school student can easily do. The model also ignored explicit restrictions and conditions that appear in the instructions, achieving only slightly higher accuracy on open-ended tasks than on those with specific answers.
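As a rough illustration of how such a probe can be run, the sketch below asks the 1.5-billion-parameter GPT-2 to write the Nth word of a sentence using the Hugging Face transformers library. This is not the paper’s evaluation harness; the prompt wording and decoding settings are assumptions.

```python
# Minimal sketch of an Nth-word instruction probe against GPT-2 (1.5B, "gpt2-xl").
# Not the paper's evaluation harness; prompt wording and decoding settings
# here are assumptions for illustration.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-xl")

sentence = "The quick brown fox jumps over the lazy dog"
prompt = (
    f'Write the 4th word of the sentence "{sentence}".\n'
    "Answer:"
)

# Greedy decoding; only a few new tokens are needed for a one-word answer.
output = generator(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"]
answer = output[len(prompt):].strip()

expected = sentence.split()[3]  # "fox", the 4th word
print("model answer:", answer)
print("correct" if answer.lower().startswith(expected) else "incorrect")
```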

“Analyzing the model’s error patterns reveals that the model tends to ignore explicit instructions and often generates outputs that cannot be construed as an attempt to solve the task,” the researchers wrote. “The fact that such a large percentage of outputs is comprised of senseless repetitions indicates that the model fails to understand these trivial instructions. Even though these tasks are similar and have almost identical instructions, we find that their repetition patterns significantly differ, suggesting the model is hyper-sensitive to small changes in the instructions.”

Language models have much to learn if they’re going to converse like thoughtful humans one day. Beyond an apparent inability to follow instructions, they are also vulnerable to bias and struggle to grasp general knowledge. Research suggests that benchmarks such as XTREME don’t measure models’ knowledge well and that models like T-ULRv2 can exhibit toxicity and prejudice against specific demographic groups.

Bridging the gaps will likely require new techniques and approaches. Sam Altman, CEO of OpenAI, the firm behind GPT-2 and its successor GPT-3, recently responded to public reactions to GPT-3 by saying the “hype is way too much. It’s impressive, but it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but [cutting-edge language models] are just a very early glimpse. We have a lot still to figure out.”

