European Association for Computational Linguistics (EACL)
Defeating Cerberus: Privacy-Leakage Mitigation in Vision Language Models
IEEE Transactions on Dependable and Secure Computing
Backdoor Complications: A Comprehensive Analysis and Mitigation of the Unforeseen Consequences of Backdoor Attacks
National Conference of the American Association for Artificial Intelligence (AAAI)
SL-CBM: Enhancing Concept Bottleneck Models with Semantic Locality for Better Interpretability
Conference on Neural Information Processing Systems (NeurIPS)
Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Conference on Neural Information Processing Systems (NeurIPS)
Finding and Reactivating Post-Trained LLMs’ Hidden Safety Mechanisms