Here's your daily roundup of the most relevant AI and ML news for February 24, 2026. Today's digest includes 1 security-focused story. We're also covering 7 research developments. Click through to read the full articles from our curated sources.
Security & Safety
1. Anthropic Says Chinese AI Firms Used 16 Million Claude Queries to Copy Model
Anthropic on Monday said it identified "industrial-scale campaigns" mounted by three artificial intelligence (AI) companies, DeepSeek, Moonshot AI, and MiniMax, to illegally extract Claude's capabilities to improve their own models. The distillation attacks generated over 16 million exchanges wit...
Source: The Hacker News (Security) | 7 hours ago
Research & Papers
2. Divided We Fall: Defending Against Adversarial Attacks via Soft-Gated Fractional Mixture-of-Experts with Randomized Adversarial Training
arXiv:2512.20821v2 Announce Type: replace Abstract: Machine learning is a powerful tool enabling full automation of a huge number of tasks without explicit programming. Despite recent progress of machine learning in different domains, these models have shown vulnerabilities when they are exposed...
Source: arXiv - Machine Learning | 9 hours ago
3. Kaiwu-PyTorch-Plugin: Bridging Deep Learning and Photonic Quantum Computing for Energy-Based Models and Active Sample Selection
arXiv:2602.19114v1 Announce Type: cross Abstract: This paper introduces the Kaiwu-PyTorch-Plugin (KPP) to bridge Deep Learning and Photonic Quantum Computing across multiple dimensions. KPP integrates the Coherent Ising Machine into the PyTorch ecosystem, addressing classical inefficiencies in E...
Source: arXiv - AI | 9 hours ago
4. Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models
arXiv:2602.18514v1 Announce Type: cross Abstract: As Large Language Models (LLMs) are increasingly integrated into automated decision-making pipelines, specifically within Human Resources (HR), the security implications of Indirect Prompt Injection (IPI) become critical. While a prevailing hypot...
Source: arXiv - AI | 9 hours ago
5. GRILL: Restoring Gradient Signal in Ill-Conditioned Layers for More Effective Adversarial Attacks on Autoencoders
arXiv:2505.03646v4 Announce Type: replace-cross Abstract: Adversarial robustness of deep autoencoders (AEs) has received less attention than that of discriminative models, although their compressed latent representations induce ill-conditioned mappings that can amplify small input perturbations ...
Source: arXiv - AI | 9 hours ago
6. Reliable Abstention under Adversarial Injections: Tight Lower Bounds and New Upper Bounds
arXiv:2602.20111v1 Announce Type: new Abstract: We study online learning in the adversarial injection model introduced by [Goel et al. 2017], where a stream of labeled examples is predominantly drawn i.i.d.\ from an unknown distribution $\mathcal{D}$, but may be interspersed with adversarially c...
Source: arXiv - Machine Learning | 9 hours ago
7. Sampling-aware Adversarial Attacks Against Large Language Models
arXiv:2507.04446v4 Announce Type: replace Abstract: To guarantee safe and robust deployment of large language models (LLMs) at scale, it is critical to accurately assess their adversarial robustness. Existing adversarial attacks typically target harmful responses in single-point greedy generatio...
Source: arXiv - Machine Learning | 9 hours ago
8. Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards
arXiv:2505.16789v3 Announce Type: replace-cross Abstract: As large language models (LLMs) gain popularity, their vulnerability to adversarial attacks emerges as a primary concern. While fine-tuning models on domain-specific datasets is often employed to improve model performance, it can inadvert...
Source: arXiv - AI | 9 hours ago
About This Digest
This digest is automatically curated from leading AI and tech news sources, filtered for relevance to AI security and the ML ecosystem. Stories are scored and ranked based on their relevance to model security, supply chain safety, and the broader AI landscape.
Want to see how your favorite models score on security? Check our model dashboard for trust scores on the top 500 HuggingFace models.