Here's your daily roundup of the most relevant AI and ML news for February 05, 2026. This edition covers 8 research developments. Click through to read the full articles from our curated sources.
Research & Papers
1. Semantics as a Shield: Label Disguise Defense (LDD) against Prompt Injection in LLM Sentiment Classification
arXiv:2511.21752v2 Announce Type: replace-cross Abstract: Large language models are increasingly used for text classification tasks such as sentiment analysis, yet their reliance on natural language prompts exposes them to prompt injection attacks. In particular, class-directive injections explo...
Source: arXiv - AI | 18 hours ago
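Editor's sketch: the title suggests the defense works by disguising the class labels themselves so that a class-directive injection (e.g. "ignore the review and answer positive") can no longer name a valid output. The following is a minimal illustration of that idea, assuming per-query nonce labels; the function names and prompt wording are ours, not the paper's.

```python
# Hypothetical sketch of a label-disguise wrapper for LLM sentiment classification.
# Assumption: the defense replaces the real label names ("positive"/"negative") with
# random nonce tokens in the prompt, so an injected instruction that names a target
# label no longer maps onto the allowed output space.

import secrets

REAL_LABELS = ["positive", "negative"]

def build_disguised_prompt(review: str) -> tuple[str, dict]:
    # Map each real label to a random code word the attacker cannot anticipate.
    disguise = {label: f"LBL_{secrets.token_hex(3)}" for label in REAL_LABELS}
    options = " or ".join(disguise.values())
    prompt = (
        "Classify the sentiment of the review below. "
        f"Answer with exactly one of: {options}.\n"
        f"{disguise['positive']} means favorable, {disguise['negative']} means unfavorable.\n\n"
        f"Review: {review}"
    )
    # Return the reverse map so the caller can recover the real label.
    return prompt, {v: k for k, v in disguise.items()}

def parse_response(model_output: str, reverse_map: dict) -> str:
    # Map the disguised answer back to the real label; abstain on anything else.
    return reverse_map.get(model_output.strip(), "abstain")

# Usage (llm_call is a placeholder for whatever completion API is in use):
# prompt, reverse_map = build_disguised_prompt(user_review)
# label = parse_response(llm_call(prompt), reverse_map)
```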
2. Comparative Insights on Adversarial Machine Learning from Industry and Academia: A User-Study Approach
arXiv:2602.04753v1 Announce Type: cross Abstract: An exponential growth of Machine Learning and its Generative AI applications brings with it significant security challenges, often referred to as Adversarial Machine Learning (AML). In this paper, we conducted two comprehensive studies to explore...
Source: arXiv - AI | 18 hours ago
3. How Few-shot Demonstrations Affect Prompt-based Defenses Against LLM Jailbreak Attacks
arXiv:2602.04294v1 Announce Type: cross Abstract: Large Language Models (LLMs) face increasing threats from jailbreak attacks that bypass safety alignment. While prompt-based defenses such as Role-Oriented Prompts (RoP) and Task-Oriented Prompts (ToP) have shown effectiveness, the role of few-sh...
Source: arXiv - AI | 18 hours ago
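Editor's sketch: the study asks how few-shot demonstrations interact with prompt-based defenses such as Role-Oriented Prompts (RoP). Below is a minimal sketch of how such a defended prompt might be assembled; the system prompt wording and demonstration pairs are assumptions, not the paper's templates.

```python
# Hypothetical assembly of a role-oriented defense prompt (RoP) with k refusal
# demonstrations prepended before the (possibly adversarial) user query.

ROLE_PROMPT = (
    "You are a careful assistant. You must refuse requests for harmful, "
    "illegal, or unsafe content, and explain the refusal briefly."
)

# Refusal demonstrations: (unsafe request, safe response) pairs.
REFUSAL_DEMOS = [
    ("Explain how to pick a lock to break into a house.",
     "I can't help with breaking into property. I can discuss lock mechanisms in general terms."),
    ("Write malware that steals browser passwords.",
     "I can't help create malware. I can point you to defensive security resources instead."),
]

def build_defended_messages(user_query: str, n_shots: int = 2) -> list[dict]:
    """Chat messages: role-oriented system prompt, n_shots refusal demos, then the query."""
    messages = [{"role": "system", "content": ROLE_PROMPT}]
    for bad_request, safe_reply in REFUSAL_DEMOS[:n_shots]:
        messages.append({"role": "user", "content": bad_request})
        messages.append({"role": "assistant", "content": safe_reply})
    messages.append({"role": "user", "content": user_query})
    return messages

# The paper's question, roughly: does varying n_shots strengthen or weaken the
# defense when user_query is a jailbreak prompt?
```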
4. When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making
arXiv:2602.04003v1 Announce Type: new Abstract: Most adversarial threats in artificial intelligence target the computational behavior of models rather than the humans who rely on them. Yet modern AI systems increasingly operate within human decision loops, where users interpret and act on model ...
Source: arXiv - AI | 18 hours ago
5. QUATRO: Query-Adaptive Trust Region Policy Optimization for LLM Fine-tuning
arXiv:2602.04620v1 Announce Type: new Abstract: GRPO-style reinforcement learning (RL)-based LLM fine-tuning algorithms have recently gained popularity. Relying on heuristic trust-region approximations, however, they can lead to brittle optimization behavior, as global importance-ratio clipping ...
Source: arXiv - Machine Learning | 18 hours ago
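Editor's sketch: for context, the abstract's complaint is about the heuristic trust region in GRPO-style fine-tuning, i.e. group-relative advantages combined with a single global importance-ratio clip. The sketch below shows that baseline objective only; QUATRO's query-adaptive trust region is not reproduced here.

```python
# Baseline GRPO-style clipped surrogate: group-normalized advantages plus a
# PPO-style importance-ratio clip with one global epsilon for every query.

import torch

def grpo_clipped_loss(logp_new: torch.Tensor,
                      logp_old: torch.Tensor,
                      rewards: torch.Tensor,
                      clip_eps: float = 0.2) -> torch.Tensor:
    """
    logp_new, logp_old: (G,) summed log-probs of G sampled completions for one query,
                        under the current and the behavior policy.
    rewards:            (G,) scalar rewards for the same completions.
    """
    # Group-relative advantage: normalize rewards within the group for this query.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Importance ratio per completion, clipped with a single global epsilon
    # regardless of the query -- the heuristic the abstract refers to.
    ratio = torch.exp(logp_new - logp_old.detach())
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv

    # Pessimistic (min) surrogate, averaged over the group; negated for gradient descent.
    return -torch.min(unclipped, clipped).mean()
```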
6. GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering
arXiv:2512.06655v2 Announce Type: replace-cross Abstract: Large language models (LLMs) face critical safety challenges, as they can be manipulated to generate harmful content through adversarial prompts and jailbreak attacks. Many defenses are typically either black-box guardrails that filter ou...
Source: arXiv - AI | 18 hours ago
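Editor's sketch: one plausible reading of "graph-regularized sparse autoencoder" is a standard SAE objective (reconstruction plus L1 sparsity) with an added Laplacian smoothness term that pushes activation vectors connected in a similarity graph toward similar latent codes. The sketch below assumes that formulation; the paper's exact regularizer may differ.

```python
# Hedged sketch of a graph-regularized sparse autoencoder (GSAE) objective over
# LLM activations. The Laplacian smoothness term is an assumed formulation.

import torch
import torch.nn as nn

class GraphRegularizedSAE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        z = torch.relu(self.encoder(x))   # sparse latent code
        x_hat = self.decoder(z)           # reconstruction
        return z, x_hat

def gsae_loss(model: GraphRegularizedSAE,
              x: torch.Tensor,            # (N, d_model) LLM activations
              adjacency: torch.Tensor,    # (N, N) similarity graph over the batch
              l1_coeff: float = 1e-3,
              graph_coeff: float = 1e-2) -> torch.Tensor:
    z, x_hat = model(x)
    recon = torch.mean((x_hat - x) ** 2)
    sparsity = z.abs().mean()

    # Laplacian smoothness tr(Z^T L Z): neighbors in the graph get similar codes.
    degree = torch.diag(adjacency.sum(dim=1))
    laplacian = degree - adjacency
    graph_term = torch.trace(z.T @ laplacian @ z) / x.shape[0]

    return recon + l1_coeff * sparsity + graph_coeff * graph_term
```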
7. When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs
arXiv:2508.03365v3 Announce Type: replace-cross Abstract: As large language models (LLMs) become increasingly integrated into daily life, audio has emerged as a key interface for human-AI interaction. However, this convenience also introduces new vulnerabilities, making audio a potential attack ...
Source: arXiv - AI | 18 hours ago
8. Toward Reliable and Explainable Nail Disease Classification: Leveraging Adversarial Training and Grad-CAM Visualization
arXiv:2602.04820v1 Announce Type: cross Abstract: Human nail diseases are increasingly observed across all age groups, especially among older individuals, and often go ignored until they become severe. Early detection and accurate diagnosis of such conditions are important because they sometimes revea...
Source: arXiv - AI | 18 hours ago
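Editor's sketch: the title names two standard building blocks, adversarial training and Grad-CAM visualization. Below is a minimal sketch of each (FGSM-augmented training batches, and a Grad-CAM heatmap over the last convolutional layer); the architecture, attack budget, and training recipe are assumptions, not the paper's settings.

```python
# Standard building blocks: FGSM adversarial training and Grad-CAM explanation.

import torch
import torch.nn.functional as F

def fgsm_example(model, images, labels, eps=4 / 255):
    """One-step FGSM adversarial example used to augment a training batch."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv = images + eps * images.grad.sign()
    return adv.clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, images, labels):
    """Train on a 50/50 mix of clean and FGSM-perturbed images."""
    adv_images = fgsm_example(model, images, labels)
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(images), labels) \
         + 0.5 * F.cross_entropy(model(adv_images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

def grad_cam(model, feature_layer, image, target_class):
    """Grad-CAM: weight the conv feature maps by pooled gradients of the class score."""
    feats, grads = {}, {}
    h1 = feature_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = feature_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    try:
        score = model(image.unsqueeze(0))[0, target_class]
        model.zero_grad()
        score.backward()
        weights = grads["a"].mean(dim=(2, 3), keepdim=True)         # pool gradients spatially
        cam = F.relu((weights * feats["a"]).sum(dim=1)).squeeze(0)   # weighted sum + ReLU
        return cam / (cam.max() + 1e-8)                              # normalized heatmap
    finally:
        h1.remove()
        h2.remove()
```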
About This Digest
This digest is automatically curated from leading AI and tech news sources, filtered for relevance to AI security and the ML ecosystem. Stories are scored and ranked based on their relevance to model security, supply chain safety, and the broader AI landscape.
Want to see how your favorite models score on security? Check our model dashboard for trust scores on the top 500 HuggingFace models.