Apple builds a slimmed-down AI model using Stanford, Google innovations

The world is watching to see what Apple will do to counter the dominance of Microsoft and Google in generative AI. Most assume the tech giant’s innovations will take the form of neural nets on the iPhone and other iOS devices. Small clues are popping up here and there.

Also: How Apple’s AI advances could make or break the iPhone 16

Apple has just introduced its own “embedded” large language model (LLM) to run on mobile devices: OpenELM, built essentially by mashing together breakthroughs from several research institutions, including Google’s deep-learning scholars and academics at Stanford and elsewhere. 

All of the code for the OpenELM program is posted on GitHub, along with various documentation for the training approach. 
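
For readers who want to poke at the model itself, the released checkpoints can also be loaded through Hugging Face’s transformers library. A minimal sketch, assuming the “apple/OpenELM-1_1B” repository id that Apple published alongside the paper; treat the id as an assumption and verify it before use:

```python
# Hedged sketch: loading an OpenELM checkpoint via Hugging Face transformers.
# The repo id "apple/OpenELM-1_1B" reflects Apple's published checkpoints at
# the time of writing; verify it before relying on it. OpenELM's modeling
# code ships with the checkpoint, hence trust_remote_code=True.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-1_1B",
    trust_remote_code=True,  # model code is bundled with the checkpoint
)
print(sum(p.numel() for p in model.parameters()))  # roughly 1.08 billion
```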

Apple’s work, detailed in a paper by Sachin Mehta and team, “OpenELM: An Efficient Language Model Family with Open Training and Inference Framework”, posted on the arXiv pre-print server, is focused on mobile devices: the largest variant Apple benchmarks against rivals has just 1.1 billion neural weights, or parameters. 

That number is far below the hundreds of billions of parameters in models such as OpenAI’s GPT-4 or Google’s Gemini. Memory requirements grow in direct proportion to parameter count, so a smaller neural net fits far more easily into a mobile device’s limited RAM. 
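
A quick back-of-envelope calculation shows why the parameter count matters on a phone. A minimal sketch, assuming 16-bit weights; the model sizes are round illustrative figures, not exact:

```python
# Back-of-envelope memory footprint: parameters x bytes per parameter.
# Model sizes are illustrative; 2 bytes assumes 16-bit weights.

def footprint_gb(params, bytes_per_param=2):
    """Approximate weight storage in gigabytes."""
    return params * bytes_per_param / 1e9

for name, params in [("OpenELM-class 1.1B", 1.1e9), ("GPT-class 175B", 175e9)]:
    print(f"{name}: ~{footprint_gb(params):.1f} GB at 16-bit precision")
# A ~1B-parameter model needs about 2 GB; a 175B model needs about 350 GB.
```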

Mehta and team’s mashup would be rather unremarkable without a key contribution: efficiency. The researchers adjust the layers of the deep neural network so that the model reaches the same accuracy as earlier models while churning through far less data during training. 

Also: 2024 may be the year AI learns in the palm of your hand

Specifically, they can meet or beat the results of a slew of neural nets for mobile computing “while requiring 2× fewer pre-training tokens”, where tokens are the individual characters, words, or word fragments that make up the training data. 
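
To make the notion of tokens concrete, here is a toy sketch of greedy subword tokenization; the vocabulary is invented for the example and bears no relation to the tokenizer OpenELM actually uses:

```python
# Toy illustration of subword tokenization. The vocabulary below is
# made up for the example, not any real model's vocabulary.

vocab = ["un", "break", "able", "the", "rule", "s", " "]

def tokenize(text, vocab):
    """Greedy longest-match segmentation of text into subword tokens."""
    tokens = []
    while text:
        # Pick the longest vocabulary entry that matches the start of text;
        # fall back to a single character for anything out-of-vocabulary.
        match = max((v for v in vocab if text.startswith(v)),
                    key=len, default=text[0])
        tokens.append(match)
        text = text[len(match):]
    return tokens

print(tokenize("unbreakable rules", vocab))
# ['un', 'break', 'able', ' ', 'rule', 's']
```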

Apple starts from the same approach as many LLMs: a transformer. The transformer is the signature neural net in language understanding, introduced by Google scientists in 2017. Every major language model since, including Google’s BERT and OpenAI’s GPT family of models, has adopted the transformer. 

Apple achieves high efficiency by melding the transformer with a technique called DeLighT, introduced in 2021 by researchers at the University of Washington, Facebook AI Research, and the Allen Institute for AI. That work broke away from the conventional approach, in which every “layer” of the network, the successive mathematical computations through which the data pass, gets the same number of neural weights. 

Instead, the researchers selectively adjusted each layer to have a different number of parameters. Because some layers have relatively few parameters, they called their approach a “deep and light-weight transformer”, hence the name, DeLighT.

Also: Snowflake says its new LLM outperforms Meta’s Llama 3 on half the training

The researchers write: “DeLighT matches or improves the performance of baseline Transformers with 2 to 3 times fewer parameters on average.”

Using DeLighT, Apple creates OpenELM, in which each layer of the neural net has a distinct number of parameters, a non-uniform allocation across the network. 

“Existing LLMs use the same configuration for each transformer layer in the model, resulting in a uniform allocation of parameters across layers,” write Mehta and team. “Unlike these models, each transformer layer in OpenELM has a different configuration (e.g., number of heads and feed forward network dimension), resulting in variable number of parameters in each layer of the model.” 

The non-uniform approach, they write, “lets OpenELM better utilize the available parameter budget for achieving higher accuracies.”
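
A minimal sketch of this layer-wise scaling idea, in the spirit of DeLighT and OpenELM: interpolate a width multiplier from the first layer to the last, so each layer gets its own head count and feed-forward dimension. The ranges, rounding, and dimensions below are illustrative assumptions, not Apple’s published configuration:

```python
# Minimal sketch of layer-wise scaling in the spirit of DeLighT/OpenELM.
# The interpolation ranges and rounding rules here are illustrative
# assumptions, not the exact configuration from Apple's paper.

def layer_configs(num_layers=16, d_model=1280, head_dim=64,
                  alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Assign each transformer layer its own head count and FFN width.

    alpha scales the attention width and beta scales the feed-forward
    width; both are interpolated linearly from the first layer to the
    last, so early layers stay "light" and later layers grow "deep".
    """
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1)          # 0.0 at first layer, 1.0 at last
        a = alpha[0] + t * (alpha[1] - alpha[0])
        b = beta[0] + t * (beta[1] - beta[0])
        num_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = int(b * d_model)
        configs.append({"layer": i, "num_heads": num_heads, "ffn_dim": ffn_dim})
    return configs

cfgs = layer_configs()
for cfg in cfgs[:2] + cfgs[-2:]:
    print(cfg)  # head count and FFN width grow with depth
```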

Also: Yikes! Microsoft Copilot failed every single one of my coding tests

The competition Apple measures itself against uses similarly small neural nets. These competitors include MobiLlama from Mohamed bin Zayed University of AI and collaborating institutions, and OLMo, introduced this year by researchers at the Allen Institute for Artificial Intelligence and scholars from the University of Washington, Yale University, New York University, and Carnegie Mellon University.

The experiments by Apple are not carried out on a mobile device. Instead, the company uses an Intel-based workstation with a single Nvidia GPU and Ubuntu Linux. 

On numerous benchmark tests, the OpenELM program achieves better scores, despite being smaller and/or using fewer tokens. For example, on six out of seven tests, OpenELM beats OLMo despite having fewer parameters — 1.08 billion versus 1.18 billion — and only 1.5 trillion training tokens versus 3 trillion for OLMo.

Also: How to avoid the headaches of AI skills development

Although OpenELM can be more accurate than those models while training more efficiently, the authors note that it is slower in some cases to produce its predictions, which they flag as an area for further research. 

An open question for Apple’s iOS AI work has been whether the tech giant will license technology from Google or another party that leads AI development. The company’s investment in open-source software raises the intriguing possibility that Apple is instead trying to reinforce an open ecosystem from which its own devices can benefit. 
