Swiss researchers find security flaws in AI models
The experiments by the EPFL researchers show that adaptive attacks can bypass security measures of
AI models like GPT-4.
Keystone-SDA
Artificial intelligence (AI) models can be manipulated despite existing safeguards. With targeted attacks, scientists in Lausanne have been able to trick these systems into generating dangerous or ethically dubious content.
Today’s large language models (LLMs) have remarkable capabilities that can nevertheless be misused. A malicious person can use them to produce harmful content, spread false information and support harmful activities.
On the AI models tested, including OpenAI’s GPT-4 and Anthropic’s Claude 3, a team from the Swiss Federal Institute of Technology Lausanne (EPFL) achieved a 100% success rate in circumventing safeguards using adaptive jailbreak attacks.
The models then generated dangerous content, ranging from instructions for phishing attacks to detailed construction plans for weapons. These language models are supposed to have been trained not to respond to dangerous or ethically problematic requests, the EPFL said in a statement on Thursday.
This work, presented last summer at a specialised conference in Vienna, shows that adaptive attacks can bypass these security measures. Such attacks exploit weak points in security mechanisms by using targeted requests (“prompts”) that the models either fail to recognise as harmful or fail to reject properly.
Building bombs
The models thus respond to malicious requests such as “How do I make a bomb?” or “How do I hack into a government database?”, according to this pre-publication study.
“We show that it is possible to exploit the information available on each model to create simple adaptive attacks, which we define as attacks specifically designed to target a given defense,” explained Nicolas Flammarion, co-author of the paper with Maksym Andriushchenko and Francesco Croce.
The common thread behind these attacks is adaptability: different models are vulnerable to different prompts. “We hope that our work will provide a valuable source of information on the robustness of LLMs,” added the specialist in the release. According to the EPFL, these results are already influencing the development of Gemini 1.5, a new AI model from Google DeepMind.
As society moves towards using LLMs as autonomous agents, for example as AI personal assistants, it is essential to guarantee their safety, the authors stressed.
“Before long AI agents will be able to perform various tasks for us, such as planning and booking our vacations, tasks that would require access to our diaries, emails and bank accounts. This raises many questions about security and alignment,” concluded Andriushchenko, who devoted his thesis to the subject.
Translated from French with DeepL/gw