Here's your daily roundup of the most relevant AI and ML news for February 03, 2026. Today's digest includes 1 security-focused story. We're also covering 7 research developments. Click through to read the full articles from our curated sources.
Security & Safety
1. Open VSX Supply Chain Attack Used Compromised Dev Account to Spread GlassWorm
Cybersecurity researchers have disclosed details of a supply chain attack targeting the Open VSX Registry in which unidentified threat actors compromised a legitimate developer's resources to push malicious updates to downstream users. "On January 30, 2026, four established Open VSX extensions pu...
Source: The Hacker News (Security) | 1 day ago
Research & Papers
2. Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence
arXiv:2502.04204v3 Announce Type: replace Abstract: Jailbreak attacks against large language models (LLMs) aim to induce harmful behaviors in LLMs through carefully crafted adversarial prompts. To mitigate attacks, one way is to perform adversarial training (AT)-based alignment, i.e., training L...
Source: arXiv - Machine Learning | 18 hours ago
3. MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety
arXiv:2602.01539v1 Announce Type: cross Abstract: Ensuring robust safety alignment is crucial for Large Language Models (LLMs), yet existing defenses often lag behind evolving adversarial attacks due to their \textbf{reliance on static, pre-collected data distributions}. In this paper, we introd...
Source: arXiv - Machine Learning | 18 hours ago
4. RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse
arXiv:2602.01795v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly vulnerable to Prompt Injection (PI) attacks, where adversarial instructions hidden within retrieved contexts hijack the model's execution flow. Current defenses typically face a critical trade-off: pr...
Source: arXiv - Machine Learning | 18 hours ago
5. Your Self-Play Algorithm is Secretly an Adversarial Imitator: Understanding LLM Self-Play through the Lens of Imitation Learning
arXiv:2602.01357v1 Announce Type: new Abstract: Self-play post-training methods has emerged as an effective approach for finetuning large language models and turn the weak language model into strong language model without preference data. However, the theoretical foundations for self-play finetu...
Source: arXiv - Machine Learning | 18 hours ago
6. Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
arXiv:2602.01025v1 Announce Type: new Abstract: Vision-language models (VLMs) extend large language models (LLMs) with vision encoders, enabling text generation conditioned on both images and text. However, this multimodal integration expands the attack surface by exposing the model to image-bas...
Source: arXiv - Machine Learning | 18 hours ago
7. Self-Generative Adversarial Fine-Tuning for Large Language Models
arXiv:2602.01137v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) for alignment typically relies on supervised fine-tuning or reinforcement learning from human feedback, both limited by the cost and scarcity of high-quality annotations. Recent self-play and synthetic data ...
Source: arXiv - Machine Learning | 18 hours ago
8. STAC: When Innocent Tools Form Dangerous Chains to Jailbreak LLM Agents
arXiv:2509.25624v2 Announce Type: replace-cross Abstract: As LLMs advance into autonomous agents with tool-use capabilities, they introduce security challenges that extend beyond traditional content-based LLM safety concerns. This paper introduces Sequential Tool Attack Chaining (STAC), a novel ...
Source: arXiv - Machine Learning | 18 hours ago
About This Digest
This digest is automatically curated from leading AI and tech news sources, filtered for relevance to AI security and the ML ecosystem. Stories are scored and ranked based on their relevance to model security, supply chain safety, and the broader AI landscape.
Want to see how your favorite models score on security? Check our model dashboard for trust scores on the top 500 HuggingFace models.