For language models, analogies are a tough nut to crack, study shows

May 14, 2021


Analogies play a crucial role in commonsense reasoning. The ability to recognize analogies like “eye is to seeing what ear is to hearing,” sometimes referred to as analogical proportions, shapes how humans structure knowledge and understand language. In a new study that looks at whether AI models can understand analogies, researchers at Cardiff University used benchmarks from education as well as more common datasets. They found that while off-the-shelf models can identify some analogies, they sometimes struggle with complex relationships, raising questions about the extent to which models capture knowledge.

Large language models learn to write humanlike text by internalizing billions of examples from the public web. Drawing on sources like ebooks, Wikipedia, and social media platforms like Reddit, they make inferences to complete sentences and even whole paragraphs. But studies demonstrate the pitfalls of this training approach. Even sophisticated language models such as OpenAI’s GPT-3 struggle with nuanced topics like morality, history, and law, and they often memorize answers found in the data on which they’re trained.

Memorization isn’t the only challenge large language models face. Recent research shows that even state-of-the-art models fail to answer the bulk of math problems correctly. For example, a paper published by researchers at the University of California, Berkeley finds that large language models including GPT-3 can only complete 2.9% to 6.9% of problems from a dataset of over 12,500.

Analogy dataset

The Cardiff University researchers used a test dataset from an educational resource that included analogy problems from assessments of linguistic and cognitive abilities. One subset of problems was designed to be equivalent to analogy problems on the Scholastic Aptitude Test (SAT), the U.S. college admission test, while the other set was similar in difficulty to problems on the Graduate Record Examinations (GRE). In the interest of thoroughness, the coauthors combined the dataset with an analogy corpus from Google and with BATS, which includes a larger number of concepts and relations split into four categories: lexicographic, encyclopedic, derivational morphology, and inflectional morphology.

The word analogy problems are designed to be challenging. Solving them requires identifying nuanced differences between word pairs that belong to the same relation.
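To make the task format concrete, here is a minimal Python sketch of a multiple-choice analogy problem: a query word pair plus several candidate pairs, where the solver picks the candidate whose relation best matches the query. This is not the method used in the study. The scoring is the familiar vector-offset comparison from word-embedding work, swapped in for illustration, and the three-dimensional vectors and the names TOY_VECTORS, offset, cosine, and solve are all invented for this example.

```python
import math

# Toy "embeddings" invented purely for illustration; a real system would
# take its representations from a trained model.
TOY_VECTORS = {
    "eye": (1.0, 0.0, 0.0),
    "seeing": (1.0, 1.0, 0.0),
    "ear": (0.0, 0.0, 1.0),
    "hearing": (0.0, 1.0, 1.0),
}


def offset(head, tail):
    """Return the vector difference tail - head, a crude stand-in for a relation."""
    return tuple(t - h for h, t in zip(TOY_VECTORS[head], TOY_VECTORS[tail]))


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve(query, candidates):
    """Return the index of the candidate pair whose offset best matches the query's."""
    query_relation = offset(*query)
    scores = [cosine(query_relation, offset(*pair)) for pair in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


# "eye is to seeing what ? is to ?"
query = ("eye", "seeing")
candidates = [("ear", "eye"), ("ear", "hearing"), ("hearing", "ear")]
best = solve(query, candidates)
print(candidates[best])
```

With these hand-built vectors the sketch selects the pair ("ear", "hearing"). The hard problems described in the study are ones where several candidates share nearly the same relation, so a simple score like this one would separate them far less cleanly.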

In experiments, the researchers tested three language models based on the transformer architecture: Google’s BERT, Facebook’s RoBERTa, and GPT-2, the predecessor of GPT-3. The results show that difficult analogy problems, which are generally more abstract or contain obscure words (e.g., grouch, cantankerous, palace, ornate), present a major barrier. While the models could identify some analogies, not all of the models achieved “meaningful improvement.”

However, the researchers leave open the possibility that language models can learn to solve analogy tasks when given the appropriate training data. “[Our] findings suggest that while transformer-based language models learn relational knowledge to a meaningful extent, more work is needed to understand how such knowledge is encoded, and how it can be exploited,” the coauthors wrote. “[W]hen carefully tuned, some language models are able to achieve state-of-the-art results.”
