Here's your daily roundup of the most relevant AI and ML news for March 16, 2026. Today's digest includes 1 security-focused story. We're also covering 7 research developments. Click through to read the full articles from our curated sources.
Security & Safety
1. OpenClaw AI Agent Flaws Could Enable Prompt Injection and Data Exfiltration
China's National Computer Network Emergency Response Technical Team (CNCERT) has issued a warning about the security stemming from the use of OpenClaw (formerly Clawdbot and Moltbot), an open-source and self-hosted autonomous artificial intelligence (AI) agent. In a post shared on WeChat, CNCERT ...
Source: The Hacker News (Security) | 1 day ago
Research & Papers
2. PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses
arXiv:2603.13026v1 Announce Type: new Abstract: Prompt injection poses serious security risks to real-world LLM applications, particularly autonomous agents. Although many defenses have been proposed, their robustness against adaptive attacks remains insufficiently evaluated, potentially creatin...
Source: arXiv - Machine Learning | 10 hours ago
3. Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Epsilon-Scheduling
arXiv:2509.23325v3 Announce Type: replace Abstract: Fine-tuning pretrained models is a standard and effective workflow in modern machine learning. However, robust fine-tuning (RFT), which aims to simultaneously achieve adaptation to a downstream task and robustness to adversarial examples, remai...
Source: arXiv - Machine Learning | 10 hours ago
4. STRAP-ViT: Segregated Tokens with Randomized -- Transformations for Defense against Adversarial Patches in ViTs
arXiv:2603.12688v1 Announce Type: cross Abstract: Adversarial patches are physically realizable localized noise, which are able to hijack Vision Transformers (ViT) self-attention, pulling focus toward a small, high-contrast region and corrupting the class token to force confident misclassificati...
Source: arXiv - Machine Learning | 10 hours ago
5. Prompt Injection as Role Confusion
arXiv:2603.12277v1 Announce Type: cross Abstract: Language models remain vulnerable to prompt injection attacks despite extensive safety training. We trace this failure to role confusion: models infer roles from how text is written, not where it comes from. We design novel role probes to capture...
Source: arXiv - AI | 10 hours ago
6. Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads
arXiv:2603.05772v2 Announce Type: replace-cross Abstract: Currently, open-sourced large language models (OSLLMs) have demonstrated remarkable generative performance. However, as their structure and weights are made public, they are exposed to jailbreak attacks even after alignment. Existing atta...
Source: arXiv - AI | 10 hours ago
7. LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation
arXiv:2603.12522v1 Announce Type: cross Abstract: As large language models (LLMs) are deployed widely, detecting and understanding bias in their outputs is critical. We present LLM BiasScope, a web application for side-by-side comparison of LLM outputs with real-time bias analysis. The system su...
Source: arXiv - AI | 10 hours ago
8. Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents
arXiv:2603.12634v1 Announce Type: new Abstract: Test-time scaling has become a dominant paradigm for improving LLM agent reliability, yet current approaches treat compute as an abundant resource, allowing agents to exhaust token and tool budgets on redundant steps or dead-end trajectories. Exist...
Source: arXiv - Machine Learning | 10 hours ago
About This Digest
This digest is automatically curated from leading AI and tech news sources, filtered for relevance to AI security and the ML ecosystem. Stories are scored and ranked based on their relevance to model security, supply chain safety, and the broader AI landscape.
Want to see how your favorite models score on security? Check our model dashboard for trust scores on the top 500 HuggingFace models.