News Digest March 02, 2026 8 min read

AI News Digest: March 02, 2026

Daily roundup of AI and ML news - 8 curated stories on security, research, and industry developments.

Here's your daily roundup of the most relevant AI and ML news for March 02, 2026. Today's edition covers 8 research developments. Click through to read the full articles from our curated sources.

Research & Papers

1. Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control

arXiv:2510.13358v2 Announce Type: replace-cross Abstract: Offline reinforcement learning enables sample-efficient policy acquisition without risky online interaction, yet policies trained on static datasets remain brittle under action-space perturbations such as actuator faults. This study intro...

Source: arXiv - AI | 9 hours ago

2. Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

arXiv:2602.24009v1 Announce Type: cross Abstract: Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, and judging protocols. We introduce JAILBREAK FOUND...

Source: arXiv - Machine Learning | 9 hours ago

3. To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning

arXiv:2602.22227v2 Announce Type: replace Abstract: Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) exhibit perceptual fragility when confronted with visually complex scenes. This weakness stems from a reliance on finite training datasets, which are prohibitively ...

Source: arXiv - Machine Learning | 9 hours ago

4. Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

arXiv:2602.22983v2 Announce Type: replace Abstract: As Large Language Models (LLMs) are increasingly used, their security risks have drawn increasing attention. Existing research reveals that LLMs are highly susceptible to jailbreak attacks, with effectiveness varying across language contexts. T...

Source: arXiv - AI | 9 hours ago

5. Concept-based Adversarial Attack: a Probabilistic Perspective

arXiv:2507.02965v2 Announce Type: replace-cross Abstract: We propose a concept-based adversarial attack framework that extends beyond single-image perturbations by adopting a probabilistic perspective. Rather than modifying a single image, our method operates on an entire concept - represented b...

Source: arXiv - AI | 9 hours ago

6. Enhancing Continual Learning for Software Vulnerability Prediction: Addressing Catastrophic Forgetting via Hybrid-Confidence-Aware Selective Replay for Temporal LLM Fine-Tuning

arXiv:2602.23834v1 Announce Type: cross Abstract: Recent work applies Large Language Models (LLMs) to source-code vulnerability detection, but most evaluations still rely on random train-test splits that ignore time and overestimate real-world performance. In practice, detectors are deployed on ...

Source: arXiv - Machine Learning | 9 hours ago

7. CoMind: Towards Community-Driven Agents for Machine Learning Engineering

arXiv:2506.20640v3 Announce Type: replace-cross Abstract: Large language model (LLM) agents show promise in automating machine learning (ML) engineering. However, existing agents typically operate in isolation on a given research problem, without engaging with the broader research community, whe...

Source: arXiv - Machine Learning | 9 hours ago

8. From Flat Logs to Causal Graphs: Hierarchical Failure Attribution for LLM-based Multi-Agent Systems

arXiv:2602.23701v1 Announce Type: new Abstract: LLM-powered Multi-Agent Systems (MAS) have demonstrated remarkable capabilities in complex domains but suffer from inherent fragility and opaque failure mechanisms. Existing failure attribution methods, whether relying on direct prompting, costly r...

Source: arXiv - AI | 9 hours ago

About This Digest

This digest is automatically curated from leading AI and tech news sources, filtered for relevance to AI security and the ML ecosystem. Stories are scored and ranked based on their relevance to model security, supply chain safety, and the broader AI landscape.
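As a rough illustration of what "scored and ranked" could mean in practice, here is a minimal Python sketch of keyword-based relevance scoring. The digest does not publish its actual method, so everything in it is an assumption made for illustration: the `Story` type, the `TOPIC_WEIGHTS` table, the weights themselves, and the `min_score` cutoff.

```python
from dataclasses import dataclass

# Hypothetical keyword weights, grouped by the three focus areas named above.
# These are illustrative assumptions, not the digest's real scoring rules.
TOPIC_WEIGHTS = {
    "model security": {"jailbreak": 3.0, "adversarial": 3.0, "robustness": 2.0},
    "supply chain safety": {"vulnerability": 3.0, "supply chain": 3.0},
    "broader AI landscape": {"llm": 1.0, "agent": 1.0, "reinforcement learning": 1.0},
}


@dataclass
class Story:
    title: str
    summary: str = ""


def relevance_score(story: Story) -> float:
    """Sum the weights of every keyword found in the title or summary."""
    text = f"{story.title} {story.summary}".lower()
    return sum(
        weight
        for keywords in TOPIC_WEIGHTS.values()
        for keyword, weight in keywords.items()
        if keyword in text
    )


def rank_stories(stories: list[Story], min_score: float = 1.0) -> list[Story]:
    """Drop stories scoring below min_score, then sort the rest by score, highest first."""
    scored = [(relevance_score(story), story) for story in stories]
    kept = [pair for pair in scored if pair[0] >= min_score]
    # sorted() is stable, so stories with equal scores keep their input order.
    return [story for _, story in sorted(kept, key=lambda pair: pair[0], reverse=True)]


if __name__ == "__main__":
    sample = [
        Story("CoMind: Towards Community-Driven Agents for Machine Learning Engineering"),
        Story("Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking"),
        Story("Weekly gardening tips"),
    ]
    for story in rank_stories(sample):
        print(relevance_score(story), story.title)
```

Plain substring matching is used here only to keep the sketch short; a production pipeline would more plausibly rely on a trained classifier or embedding similarity, and would tune the weights against editorial feedback.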

Want to see how your favorite models score on security? Check our model dashboard for trust scores on the top 500 HuggingFace models.


