Here's your daily roundup of the most relevant AI and ML news for March 02, 2026. Today's edition covers 8 research developments. Click through to read the full articles from our curated sources.
Research & Papers
1. Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control
arXiv:2510.13358v2 Announce Type: replace-cross Abstract: Offline reinforcement learning enables sample-efficient policy acquisition without risky online interaction, yet policies trained on static datasets remain brittle under action-space perturbations such as actuator faults. This study intro...
Source: arXiv - AI | 9 hours ago
2. Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
arXiv:2602.24009v1 Announce Type: cross Abstract: Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, and judging protocols. We introduce JAILBREAK FOUND...
Source: arXiv - Machine Learning | 9 hours ago
3. To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning
arXiv:2602.22227v2 Announce Type: replace Abstract: Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) exhibit perceptual fragility when confronted with visually complex scenes. This weakness stems from a reliance on finite training datasets, which are prohibitively ...
Source: arXiv - Machine Learning | 9 hours ago
4. Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
arXiv:2602.22983v2 Announce Type: replace Abstract: As Large Language Models (LLMs) are increasingly used, their security risks have drawn increasing attention. Existing research reveals that LLMs are highly susceptible to jailbreak attacks, with effectiveness varying across language contexts. T...
Source: arXiv - AI | 9 hours ago
5. Concept-based Adversarial Attack: a Probabilistic Perspective
arXiv:2507.02965v2 Announce Type: replace-cross Abstract: We propose a concept-based adversarial attack framework that extends beyond single-image perturbations by adopting a probabilistic perspective. Rather than modifying a single image, our method operates on an entire concept - represented b...
Source: arXiv - AI | 9 hours ago
6. Enhancing Continual Learning for Software Vulnerability Prediction: Addressing Catastrophic Forgetting via Hybrid-Confidence-Aware Selective Replay for Temporal LLM Fine-Tuning
arXiv:2602.23834v1 Announce Type: cross Abstract: Recent work applies Large Language Models (LLMs) to source-code vulnerability detection, but most evaluations still rely on random train-test splits that ignore time and overestimate real-world performance. In practice, detectors are deployed on ...
Source: arXiv - Machine Learning | 9 hours ago
7. CoMind: Towards Community-Driven Agents for Machine Learning Engineering
arXiv:2506.20640v3 Announce Type: replace-cross Abstract: Large language model (LLM) agents show promise in automating machine learning (ML) engineering. However, existing agents typically operate in isolation on a given research problem, without engaging with the broader research community, whe...
Source: arXiv - Machine Learning | 9 hours ago
8. From Flat Logs to Causal Graphs: Hierarchical Failure Attribution for LLM-based Multi-Agent Systems
arXiv:2602.23701v1 Announce Type: new Abstract: LLM-powered Multi-Agent Systems (MAS) have demonstrated remarkable capabilities in complex domains but suffer from inherent fragility and opaque failure mechanisms. Existing failure attribution methods, whether relying on direct prompting, costly r...
Source: arXiv - AI | 9 hours ago
About This Digest
This digest is automatically curated from leading AI and tech news sources, filtered for relevance to AI security and the ML ecosystem. Stories are scored and ranked based on their relevance to model security, supply chain safety, and the broader AI landscape.
Want to see how your favorite models score on security? Check our model dashboard for trust scores on the top 500 Hugging Face models.