Backes

My group works on various aspects of information security, with a current focus on the intersection of information security and AI / machine learning. Further topics of interest include trustworthy processing of medical information; design, analysis, and verification of security-critical systems and services; and universal solutions in system and software security.

Most Recent Publications

Year 2026

2026-07-02

Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race?

Conference / Medium

Annual Meeting of the Association for Computational Linguistics (ACL)

Tags

Trustworthy Information Processing

Authors

Yuan Xin
Dingfan Chen
Linyi Yang
Michael Backes
Xiao Zhang

2026-04-24

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

Conference / Medium

International Conference on Learning Representations (ICLR)

Year 2025

2025-12-05

Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

Conference / Medium

Conference on Neural Information Processing Systems (NeurIPS)

Tags

Trustworthy Information Processing

2025-12-03

Finding and Reactivating Post-Trained LLMs’ Hidden Safety Mechanisms

Conference / Medium

Conference on Neural Information Processing Systems (NeurIPS)

Tags

Trustworthy Information Processing

2025-11-04

Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification

Conference / Medium

Conference on Empirical Methods in Natural Language Processing (EMNLP)

Tags

Trustworthy Information Processing

Authors

Boyang Zhang
Yicong Tan
Yun Shen
Ahmed Salem
Michael Backes
Savvas Zannettou
Yang Zhang

Backes

Head of Group

Members

Yihan Ma

Xinyue Shen

Yiting Qu

Wai Man Si

Yuan Xin

Junjie Chu

Hai Huang

Yugeng Liu

Yiyong Liu

Minxing Zhang

Yixin Wu

Ziqing Yang

Wenhao Wang

Vincent Hanke

Mingjie Li

Yukun Jiang

Xun Wang

Yage Zhang
