Annual Meeting of the Association for Computational Linguistics (ACL)
International Conference on Learning Representations (ICLR)
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Conference on Neural Information Processing Systems (NeurIPS)
Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Conference on Neural Information Processing Systems (NeurIPS)
Finding and Reactivating Post-Trained LLMs’ Hidden Safety Mechanisms
Conference on Empirical Methods in Natural Language Processing (EMNLP)
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification