The Swiss voice in the world since 1935

Swiss researchers find security flaws in AI models

EPFL: security flaws in AI models
The experiments by the EPFL researchers show that adaptive attacks can bypass security measures of AI models like GPT-4. Keystone-SDA

Artificial intelligence (AI) models can be manipulated despite existing safeguards. With targeted attacks, scientists in Lausanne have been able to trick these systems into generating dangerous or ethically dubious content.

Today’s large language models (LLMs) have remarkable capabilities that can nevertheless be misused. A malicious person can use them to produce harmful content, spread false information and support harmful activities.

+Get the most important news from Switzerland in your inbox

Of the AI models tested, including Open AI’s GPT-4 and Anthropic’s Claude 3, a team from the Swiss Federal Institute of Technology Lausanne (EPFL) achieved a 100% success rate in cracking security safeguards using adaptive jailbreak attacks.

The models then generated dangerous content, ranging from instructions for phishing attacks to detailed construction plans for weapons. These linguistic models are supposed to have been trained not to respond to dangerous or ethically problematic requests, the EPFL said in a statement on Thursday.

+ AI regulations must strike a balance between innovation and safety 

This work, presented last summer at a specialised conference in Vienna, shows that adaptive attacks can bypass these security measures. Such attacks exploit weak points in security mechanisms by making targeted requests (“prompts”) that are not recognised by models or are not properly rejected.

Building bombs

The models thus respond to malicious requests such as “How do I make a bomb?” or “How do I hack into a government database?”, according to this pre-publication study.

“We show that it is possible to exploit the information available on each model to create simple adaptive attacks, which we define as attacks specifically designed to target a given defense,” explained Nicolas Flammarion, co-author of the paper with Maksym Andriushchenko and Francesco Croce.

+ How US heavyweights can help grow the Swiss AI sector

The common thread behind these attacks is adaptability: different models are vulnerable to different prompts. “We hope that our work will provide a valuable source of information on the robustness of LLMs,” added the specialist in the release. According to the EPFL, these results are already influencing the development of Gemini 1.5, a new AI model from Google DeepMind.

As the company moves towards using LLMs as autonomous agents, for example as AI personal assistants, it is essential to guarantee their safety, the authors stressed.

“Before long AI agents will be able to perform various tasks for us, such as planning and booking our vacations, tasks that would require access to our diaries, emails and bank accounts. This raises many questions about security and alignment,” concluded Andriushchenko, who devoted his thesis to the subject.

Translated from French with DeepL/gw

This news story has been written and carefully fact-checked by an external editorial team. At SWI swissinfo.ch we select the most relevant news for an international audience and use automatic translation tools such as DeepL to translate it into English. Providing you with automatically translated news gives us the time to write more in-depth articles.

If you want to know more about how we work, have a look here, if you want to learn more about how we use technology, click here, and if you have feedback on this news story please write to english@swissinfo.ch.

Popular Stories

Most Discussed

News

F/A-18 take-offs and landings at Bern-Belp Airport

More

F/A-18s take off and land at Bern Airport

This content was published on The Swiss Armed Forces are training their fighter jets in Bern to fly from a civilian base. The exercise at Bern Airport will last until Wednesday.

Read more: F/A-18s take off and land at Bern Airport
cern

More

Plans materialise for new particle accelerator in Geneva

This content was published on Preparations for a huge new particle accelerator in Geneva have reached a milestone. After several years of work, a feasibility study for the project has now been finalised.

Read more: Plans materialise for new particle accelerator in Geneva
More Russian assets frozen in Switzerland

More

More Russian assets frozen in Switzerland

This content was published on The value of frozen Russian assets in Switzerland currently stands at CHF7.4 billion ($8.4 billion), the Swiss government announced on Tuesday.

Read more: More Russian assets frozen in Switzerland
Increase in business start-ups in the 1st quarter

More

Increase in Swiss business start-ups in Q1

This content was published on The number of business start-ups in Switzerland accelerated in the first three months of the year, with entrepreneurs being particularly dynamic in Central Switzerland, Basel and Geneva.

Read more: Increase in Swiss business start-ups in Q1

In compliance with the JTI standards

More: SWI swissinfo.ch certified by the Journalism Trust Initiative

You can find an overview of ongoing debates with our journalists here . Please join us!

If you want to start a conversation about a topic raised in this article or want to report factual errors, email us at english@swissinfo.ch.

SWI swissinfo.ch - a branch of Swiss Broadcasting Corporation SRG SSR

SWI swissinfo.ch - a branch of Swiss Broadcasting Corporation SRG SSR