Here's your daily roundup of the most relevant AI and ML news for February 18, 2026. Today's digest includes one security-focused story and seven research developments. Click through to read the full articles from our curated sources.
Security & Safety
1. Kernel-enforced sandbox App and SDK for AI agents, MCP and LLM workloads
Article URL: https://github.com/always-further/nono
Comments URL: https://news.ycombinator.com/item?id=47066574
Points: 1 | Comments: 1
Source: Hacker News - ML Security | 2 hours ago
Research & Papers
2. TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics
arXiv:2602.15084v1 Announce Type: cross Abstract: We present TokaMind, an open-source foundation model framework for fusion plasma modeling, based on a Multi-Modal Transformer (MMT) and trained on heterogeneous tokamak diagnostics from the publicly available MAST dataset. TokaMind supports multi...
Source: arXiv - Machine Learning | 18 hours ago
3. Closing the Distribution Gap in Adversarial Training for LLMs
arXiv:2602.15238v1 Announce Type: new Abstract: Adversarial training for LLMs is one of the most promising methods to reliably improve robustness against adversaries. However, despite significant progress, models remain vulnerable to simple in-distribution exploits, such as rewriting prompts in ...
Source: arXiv - Machine Learning | 18 hours ago
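As background for story 3: adversarial training, in its generic form, means training on perturbed inputs crafted to maximize the loss against the current model. The paper targets LLM prompts, but the idea is easiest to see on a toy differentiable classifier. The sketch below is a classic FGSM-style loop on a two-feature logistic model; it is an illustration of the general technique, not the paper's method, and all names in it are our own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One FGSM step: nudge each feature in the direction that raises the loss."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # For logistic loss, dLoss/dx_i = (p - y) * w_i; FGSM uses only its sign.
    return [xi + eps * (1.0 if (p - y) * wi > 0 else -1.0) for xi, wi in zip(x, w)]

def adversarial_train(data, dim, eps=0.1, lr=0.5, epochs=200):
    """Train on FGSM-perturbed inputs instead of clean ones."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = fgsm_perturb(x, y, w, b, eps)  # attack the current model
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
            g = p - y  # gradient of the logistic loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x_adv)]
            b -= lr * g
    return w, b

# Toy, linearly separable data: two points per class.
data = [([2.0, 1.0], 1), ([1.5, 2.0], 1), ([-2.0, -1.0], 0), ([-1.0, -2.0], 0)]
w, b = adversarial_train(data, dim=2)
correct = sum(
    (sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5) == (y == 1)
    for x, y in data
)
print(correct)  # all 4 training points classified correctly
```

The gap the paper highlights is that loops like this train against one narrow attack; exploits drawn from a different distribution (e.g. rewritten prompts) can still slip through.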
4. ER-MIA: Black-Box Adversarial Memory Injection Attacks on Long-Term Memory-Augmented Large Language Models
arXiv:2602.15344v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly augmented with long-term memory systems to overcome finite context windows and enable persistent reasoning across interactions. However, recent research finds that LLMs become more vulnerable because me...
Source: arXiv - Machine Learning | 18 hours ago
5. Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
arXiv:2406.03862v3 Announce Type: replace Abstract: This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures. Behavior-targeted attacks aim to manipulate the victim's behavior as desired by the adversary through adversarial interventions in state obse...
Source: arXiv - Machine Learning | 18 hours ago
6. RobustBlack: Challenging Black-Box Adversarial Attacks on State-of-the-Art Defenses
arXiv:2412.20987v2 Announce Type: replace Abstract: Although adversarial robustness has been extensively studied in white-box settings, recent advances in black-box attacks (including transfer- and query-based approaches) are primarily benchmarked against weak defenses, leaving a significant gap...
Source: arXiv - Machine Learning | 18 hours ago
7. Efficient Semi-Supervised Adversarial Training via Latent Clustering-Based Data Reduction
arXiv:2501.10466v3 Announce Type: replace Abstract: Learning robust models under adversarial settings is widely recognized as requiring a considerably large number of training samples. Recent work proposes semi-supervised adversarial training (SSAT), which utilizes external unlabeled or syntheti...
Source: arXiv - Machine Learning | 18 hours ago
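Story 7's core idea, shrinking a large unlabeled pool by keeping only cluster representatives, can be illustrated with a generic sketch: plain k-means on raw 2-D points, keeping the point nearest each centroid. The paper clusters in a latent space; this toy version works on raw coordinates, and every name in it is our own.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on lists of floats; returns centroids and assignments."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: mean of assigned points (keep old centroid if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def reduce_by_clusters(points, k):
    """Keep one representative per cluster: the member nearest its centroid."""
    centroids, assign = kmeans(points, k)
    reps = []
    for c in range(k):
        members = [p for p, a in zip(points, assign) if a == c]
        if members:
            reps.append(min(
                members,
                key=lambda p: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            ))
    return reps

# Two well-separated blobs reduce to two representatives, one per blob.
pts = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]]
reps = reduce_by_clusters(pts, k=2)
print(len(reps))  # 2
```

The payoff claimed in the abstract is efficiency: running the expensive adversarial-training step on the reduced set rather than the full external pool.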
8. The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety
arXiv:2602.15799v1 Announce Type: new Abstract: Fine-tuning aligned language models on benign tasks unpredictably degrades safety guardrails, even when training data contains no harmful content and developers have no adversarial intent. We show that the prevailing explanation, that fine-tuning u...
Source: arXiv - Machine Learning | 18 hours ago
About This Digest
This digest is automatically curated from leading AI and tech news sources, filtered for relevance to AI security and the ML ecosystem. Stories are scored and ranked by their bearing on model security, supply-chain safety, and the broader AI landscape.
Want to see how your favorite models score on security? Check our model dashboard for trust scores on the top 500 HuggingFace models.