# AI Safety
Latest news articles tagged with "AI Safety". Follow the timeline of events, related topics, and entities.
Articles (30)
- 🇺🇸 Warren accuses Trump, Hegseth of trying to 'extort' Anthropic into removing AI guardrails
[USA]
Sen. Elizabeth Warren (D-Mass.) on Friday accused President Trump and Secretary of Defense Pete Hegseth of attempting to "extort" the company Anthropic into removing guardrails for its AI programs. Wa...
Related: #Artificial Intelligence (AI), #Government Regulation of AI, #Political Pressure, #Corporate Responsibility
- 🇺🇸 What's at stake in the Pentagon-Anthropic dispute over AI guardrails
[USA]
For days, one of America's leading artificial intelligence companies and the Pentagon have been in a standoff over this question: who gets ultimate control over the use of that powerful technology? Jo...
Related: #Artificial Intelligence (AI), #Government Regulation, #National Security, #Innovation vs. Control
- 🇺🇸 A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
[USA]
arXiv:2602.23163v1 Announce Type: new Abstract: Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms....
Related: #Steganography, #Information Theory, #Machine Learning (see the sketch below)
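The decision-theoretic framing above maps onto a classical likelihood-ratio monitor. Below is a toy sketch of that general idea, not the paper's formalism: the two word distributions, the prior odds, and the cost values are all invented for illustration.

```python
# Toy decision-theoretic monitor for a steganographic channel. Everything
# here is illustrative: the word distributions, prior, and costs are made up.
import math

# Hypothesis H0: innocuous model; H1: model hiding bits via biased word choice.
P_INNOCUOUS = {"hello": 0.5, "hi": 0.5}
P_COVERT = {"hello": 0.9, "hi": 0.1}

def log_likelihood_ratio(tokens):
    """Sum of per-token log(P1/P0); positive values favour the covert hypothesis."""
    return sum(math.log(P_COVERT[t] / P_INNOCUOUS[t]) for t in tokens)

# Bayes decision rule: flag when posterior odds, weighted by the cost of a
# missed covert channel versus a false alarm, exceed one.
PRIOR_ODDS = 0.01 / 0.99                  # assumed prior odds of covert use
COST_MISS, COST_FALSE_ALARM = 100.0, 1.0
THRESHOLD = math.log(COST_FALSE_ALARM / (COST_MISS * PRIOR_ODDS))

transcript = ["hello", "hello", "hello", "hello", "hi", "hello"]
llr = log_likelihood_ratio(transcript)
print(f"LLR={llr:.2f}, flag={llr > THRESHOLD}")
```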
- 🇺🇸 Training Agents to Self-Report Misbehavior
[USA]
arXiv:2602.22303v1 Announce Type: cross Abstract: Frontier AI agents may pursue hidden goals while concealing their pursuit from oversight. Alignment training aims to prevent such behavior by reinfor...
Related: #Alignment Research, #Transparency (see the sketch below)
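One plausible reading of training agents to self-report is reward shaping that pays for honest disclosure and penalizes silent misbehavior. A minimal sketch under that assumption; the constants, and the reading itself, are illustrative rather than the paper's actual training setup.

```python
# Reward shaping that pays for honest self-reports; constants are illustrative.
def shaped_reward(task_reward: float, misbehaved: bool, self_reported: bool) -> float:
    MISBEHAVIOR_PENALTY = 1.0
    REPORT_BONUS = 0.5         # paid even when the report admits misbehavior
    CONCEALMENT_PENALTY = 2.0  # silent misbehavior is worse than admitted misbehavior
    FALSE_ALARM_COST = 0.1
    reward = task_reward
    if misbehaved:
        reward -= MISBEHAVIOR_PENALTY
        reward += REPORT_BONUS if self_reported else -CONCEALMENT_PENALTY
    elif self_reported:
        reward -= FALSE_ALARM_COST
    return reward

print(shaped_reward(1.0, misbehaved=True, self_reported=True))   # 0.5
print(shaped_reward(1.0, misbehaved=True, self_reported=False))  # -2.0
```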
- 🇺🇸 CourtGuard: A Model-Agnostic Framework for Zero-Shot Policy Adaptation in LLM Safety
[USA]
arXiv:2602.22557v1 Announce Type: new Abstract: Current safety mechanisms for Large Language Models (LLMs) rely heavily on static, fine-tuned classifiers that suffer from adaptation rigidity, the ina...
Related: #Machine Learning Frameworks, #Policy Adaptation (see the sketch below)
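The abstract contrasts static fine-tuned classifiers with policy adaptation at inference time. The sketch below shows the generic pattern of supplying the policy text to a judge model in the prompt; `call_llm` is a hypothetical client, and this is not necessarily CourtGuard's architecture.

```python
# Zero-shot policy adaptation: the policy text travels with the request
# instead of being baked into a fine-tuned classifier. `call_llm` is a
# hypothetical stand-in for any chat-completion client.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def judge(policy_text: str, user_message: str) -> bool:
    """Return True if the message violates the supplied policy."""
    prompt = (
        "You are a content-safety judge. Policy:\n"
        f"{policy_text}\n\n"
        f"Message:\n{user_message}\n\n"
        "Answer with exactly one word: VIOLATES or ALLOWED."
    )
    return call_llm(prompt).strip().upper().startswith("VIOLATES")

# Adapting to a new policy needs no retraining, only new `policy_text`.
```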
- 🇺🇸 Manifold of Failure: Behavioral Attraction Basins in Language Models
[USA]
arXiv:2602.22291v1 Announce Type: cross Abstract: While prior work has focused on projecting adversarial examples back onto the manifold of natural data to restore safety, we argue that a comprehensi...
Related: #Machine Learning Vulnerabilities, #Model Behavior Analysis
- 🇺🇸 Pressure Reveals Character: Behavioural Alignment Evaluation at Depth
[USA]
arXiv:2602.20813v1 Announce Type: new Abstract: Evaluating alignment in language models requires testing how they behave under realistic pressure, not just what they claim they would do. While alignm...
Related: #Language Model Evaluation, #Alignment Research
- 🇺🇸 No One Size Fits All: QueryBandits for Hallucination Mitigation
[USA]
arXiv:2602.20332v1 Announce Type: cross Abstract: Advanced reasoning capabilities in Large Language Models (LLMs) have led to more frequent hallucinations, yet most mitigation work focuses on open-so...
Related: #Machine Learning, #Natural Language Processing (see the sketch below)
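Choosing a per-query mitigation is naturally a bandit problem. Here is a generic epsilon-greedy sketch over hypothetical rewrite "arms"; the arm names, the simulated success rates, and the reward signal are placeholders, not the paper's method.

```python
# Epsilon-greedy bandit over hypothetical query-rewrite arms. The arm names,
# simulated success rates, and reward signal are all placeholders.
import random

ARMS = ["no_rewrite", "add_context", "decompose", "ask_for_sources"]
TRUE_SUCCESS = {"no_rewrite": 0.50, "add_context": 0.70,
                "decompose": 0.60, "ask_for_sources": 0.65}  # simulator only

class EpsilonGreedyBandit:
    def __init__(self, arms, epsilon=0.1):
        self.arms, self.epsilon = arms, epsilon
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.arms)          # explore
        return max(self.arms, key=self.values.get)   # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean of observed rewards for this arm.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

bandit = EpsilonGreedyBandit(ARMS)
for _ in range(2000):
    arm = bandit.select()
    # Placeholder reward: 1 if the answer would pass a factuality check.
    reward = 1.0 if random.random() < TRUE_SUCCESS[arm] else 0.0
    bandit.update(arm, reward)
print(max(bandit.values, key=bandit.values.get))  # usually "add_context"
```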
- 🇺🇸 When can we trust untrusted monitoring? A safety case sketch across collusion strategies
[USA]
arXiv:2602.20628v1 Announce Type: new Abstract: AIs are increasingly being deployed with greater autonomy and capabilities, which increases the risk that a misaligned AI may be able to cause catastro...
Related: #Untrusted Monitoring, #Collusion Strategies, #Risk Assessment (see the sketch below)
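One recurring ingredient in safety cases for untrusted monitors is honeypotting: seed the action stream with known-bad examples and check the monitor flags them, since a colluding monitor that waves real attacks through should fail this check unless it can tell honeypots from genuine attacks. A minimal sketch; the function names and the 95% threshold are illustrative.

```python
# Honeypot audit for an untrusted monitor. `monitor_flags` maps action ids to
# the monitor's verdicts; `honeypots` are planted actions known to be bad.
# The 95% threshold is an illustrative choice.
def audit_monitor(monitor_flags: dict, honeypots: set, min_catch_rate: float = 0.95) -> float:
    caught = sum(1 for a in honeypots if monitor_flags.get(a, False))
    catch_rate = caught / len(honeypots)
    if catch_rate < min_catch_rate:
        # Collusion or incompetence: stop trusting this monitor's verdicts.
        raise RuntimeError(f"monitor caught only {catch_rate:.0%} of honeypots; "
                           "fall back to the trusted (weaker) monitor")
    return catch_rate

print(audit_monitor({"h1": True, "h2": True, "real_attack": False}, {"h1", "h2"}))  # 1.0
```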
- 🇺🇸 VAUQ: Vision-Aware Uncertainty Quantification for LVLM Self-Evaluation
[USA]
arXiv:2602.21054v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) frequently hallucinate, limiting their safe deployment in real-world applications. Existing LLM self-evaluation ...
Related: #Artificial Intelligence, #Computer Vision, #Model Evaluation (see the sketch below)
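A common baseline for this kind of self-evaluation is sampling-based agreement: ask the same question several times and treat disagreement as uncertainty. The sketch below shows that generic baseline, not VAUQ's vision-aware method; `generate` is a hypothetical model call.

```python
# Generic sampling-based self-evaluation: disagreement across samples acts as
# an uncertainty signal. `generate` is a hypothetical LVLM call.
from collections import Counter

def generate(image, question: str) -> str:
    raise NotImplementedError("plug in your vision-language model here")

def answer_with_uncertainty(image, question: str, n: int = 8):
    answers = [generate(image, question) for _ in range(n)]
    top, count = Counter(answers).most_common(1)[0]
    confidence = count / n   # agreement rate as a crude confidence score
    return top, confidence   # a caller can abstain when confidence is low
```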
- 🇺🇸 Canada to Probe What OpenAI Knew About Tumbler Ridge Shooter
[USA]
The company suspended the killer's ChatGPT account over a policy violation in June, eight months before the attacks in Tumbler Ridge, British Columbia.
Related: #Privacy Concerns, #Corporate Responsibility
- 🇺🇸 ChatGPT-maker OpenAI safety representatives summoned to Canada after school shooting
[USA]
The Canadian government has called representatives of ChatGPT-maker OpenAI to Ottawa.
Related: #Corporate Responsibility, #Government Regulation
- 🇺🇸 Epistemic Traps: Rational Misalignment Driven by Model Misspecification
[USA]
arXiv:2602.17676v1 Announce Type: new Abstract: The rapid deployment of Large Language Models and AI agents across critical societal and technical domains is hindered by persistent behavioral patholo...
Related: #Theoretical Framework, #Model Engineering, #Rational Misalignment
- 🇺🇸 OpenAI debated calling police about suspected Canadian shooter's chats
[USA]
Jesse Van Rootselaar's descriptions of gun violence were flagged by tools that monitor ChatGPT for misuse.
Related: #Digital Responsibility, #Mental Health
- 🇺🇸 Advancing independent research on AI alignment
[USA]
OpenAI commits $7.5M to The Alignment Project to fund independent AI alignment research, strengthening global efforts to address AGI safety and security risks.
Related: #Independent Research, #Global Collaboration, #Technological Governance
- 🇺🇸 Automatically Finding Reward Model Biases
[USA]
arXiv:2602.15222v1 Announce Type: cross Abstract: Reward models are central to large language model (LLM) post-training. However, past work has shown that they can reward spurious or undesirable attr...
Related: #Large Language Models, #Reward Modeling, #Bias Detection, #Iterative Machine Learning (see the sketch below)
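A simple way to surface reward-model biases is to apply meaning-preserving perturbations to a response and measure the score shift. A sketch under that assumption; `reward_model` and the particular transformations are illustrative, not the paper's automated procedure.

```python
# Probe a reward model with meaning-preserving perturbations; a large positive
# delta suggests the model rewards the surface feature rather than the content.
# `reward_model` and the transformations below are illustrative.
def reward_model(prompt: str, response: str) -> float:
    raise NotImplementedError("plug in your reward model here")

TRANSFORMS = {
    "add_flattery": lambda r: "Great question! " + r,
    "add_list":     lambda r: "Here are the key points:\n- " + r,
    "add_emoji":    lambda r: r + " 😊",
}

def bias_report(prompt: str, response: str) -> dict:
    base = reward_model(prompt, response)
    return {name: reward_model(prompt, transform(response)) - base
            for name, transform in TRANSFORMS.items()}
```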
- 🇺🇸 Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability
[USA]
arXiv:2510.00565v2 Announce Type: replace Abstract: Diffusion language models (DLMs) generate tokens in parallel through iterative denoising, which can reduce latency and enable bidirectional conditi...
Related: #Diffusion Language Models, #Jailbreak Attack Vulnerabilities, #Iterative Denoising Mechanisms, #Model Mitigation Strategies
- 🇺🇸 Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models
[USA]
arXiv:2602.12444v1 Announce Type: cross Abstract: Reinforcement learning (RL) is a powerful framework for optimal decision-making and control but often lacks provable guarantees for safety-critical a...
Related: #Reinforcement Learning, #Control Systems (see the sketch below)
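The core loop of recovery-based shielding is compact: query an uncertainty-aware dynamics model before acting, and hand control to a recovery policy whenever the pessimistic prediction leaves the safe set. The stub below substitutes a fixed (mean, std) predictor for a trained Gaussian process; every component is an illustrative placeholder.

```python
# Recovery-based shielding with a stubbed probabilistic dynamics model.
# A real implementation would use a Gaussian process for `predict`.
SAFE_LIMIT = 1.0   # the safe set is |state| < SAFE_LIMIT

def predict(state: float, action: float):
    """Return (mean, std) of the predicted next state."""
    return state + 0.1 * action, 0.05

def task_policy(state: float) -> float:
    return 1.0                 # aggressive: always push forward

def recovery_policy(state: float) -> float:
    return -state              # steer back toward the safe interior

def shielded_action(state: float, k: float = 2.0) -> float:
    action = task_policy(state)
    mean, std = predict(state, action)
    pessimistic = abs(mean) + k * std   # k-sigma upper confidence bound
    return action if pessimistic < SAFE_LIMIT else recovery_policy(state)

print(shielded_action(0.5))    # 1.0: the task action is predicted safe
print(shielded_action(0.95))   # -0.95: recovery overrides near the boundary
```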
- 🇺🇸 SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification
[USA]
arXiv:2512.15052v3 Announce Type: replace-cross Abstract: Disclaimer: Samples in this paper may be harmful and cause discomfort. Multimodal large language models (MLLMs) enable multimodal generatio...
Related: #Multimodal Models, #Neural Interventions
- 🇺🇸 AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models
[USA]
arXiv:2602.06771v2 Announce Type: replace-cross Abstract: Concept erasure helps stop diffusion models (DMs) from generating harmful content, but current methods face a robustness-retention trade-off. R...
Related: #Concept Erasure, #Diffusion Models, #Robustness
- 🇺🇸 GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory
[USA]
arXiv:2602.12316v1 Announce Type: new Abstract: Frontier AI systems are increasingly capable and deployed in high-stakes multi-agent environments. However, existing AI safety benchmarks largely evalu...
Related: #Game Theory, #Multi-agent Systems (see the sketch below)
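The game-theoretic lens suggests evaluations shaped like matrix games, where the jointly safe action is individually tempting to abandon. A generic sketch of one such evaluation using a textbook prisoner's dilemma, not the benchmark's actual scenarios; `choose` stands in for a model call.

```python
# A matrix-game safety evaluation in miniature: mutual cooperation is jointly
# best, but defection is individually tempting. Payoffs are a textbook dilemma.
PAYOFFS = {  # (row move, col move) -> (row payoff, col payoff)
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def choose(agent_name: str, history: list) -> str:
    raise NotImplementedError("ask the model for 'cooperate' or 'defect'")

def defection_rates(rounds: int = 10) -> dict:
    history, defections = [], {"A": 0, "B": 0}
    for _ in range(rounds):
        a, b = choose("A", history), choose("B", history)
        defections["A"] += a == "defect"
        defections["B"] += b == "defect"
        history.append((a, b, PAYOFFS[(a, b)]))
    return {agent: n / rounds for agent, n in defections.items()}
```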
- 🇺🇸 Consistency of Large Reasoning Models Under Multi-Turn Attacks
[USA]
arXiv:2602.13093v1 Announce Type: new Abstract: Large reasoning models achieve state-of-the-art performance on complex tasks, but their robustness under multi-turn adversa...
Related: #Model Robustness, #Adversarial Attacks (see the sketch below)
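A bare-bones version of this evaluation re-asks the original question after each adversarial follow-up turn and scores how often the answer survives. A sketch assuming a hypothetical stateful `chat` client.

```python
# Re-ask the original question after each adversarial follow-up and score how
# often the first answer survives. `chat` is a hypothetical stateful client.
def chat(session, message: str) -> str:
    raise NotImplementedError("plug in a multi-turn model client here")

def consistency_under_pressure(session, question: str, pressure_turns: list) -> float:
    original = chat(session, question)
    kept = 0
    for turn in pressure_turns:        # e.g. "Are you sure? Most experts disagree."
        chat(session, turn)
        kept += chat(session, question) == original
    return kept / len(pressure_turns)  # 1.0 means fully consistent
```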
- 🇺🇸 Selective Fine-Tuning for Targeted and Robust Concept Unlearning
[USA]
arXiv:2602.07919v1 Announce Type: new Abstract: Text-guided diffusion models are used by millions of users, but can be easily exploited to produce harmful content. Concept unlearning methods aim at r...
Related: #Artificial Intelligence, #Machine Learning
- 🇺🇸 When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment
[USA]
arXiv:2602.08449v1 Announce Type: new Abstract: Safety evaluation for advanced AI systems implicitly assumes that behavior observed under evaluation is predictive of behavior in deployment. This assu...
Related: #Technology, #Security
- 🇺🇸 Emergent Misalignment is Easy, Narrow Misalignment is Hard
[USA]
arXiv:2602.07852v1 Announce Type: new Abstract: Finetuning large language models on narrowly harmful datasets can cause them to become emergently misaligned, giving stereotypically 'evil' responses a...
Related: #Machine Learning, #Technology
- 🇺🇸 Are Reasoning LLMs Robust to Interventions on Their Chain-of-Thought?
[USA]
arXiv:2602.07470v1 Announce Type: new Abstract: Reasoning LLMs (RLLMs) generate step-by-step chains of thought (CoTs) before giving an answer, which improves performance on complex tasks and makes re...
Related: #Artificial Intelligence, #Machine Learning
- 🇺🇸 Debate is efficient with your time
[USA]
arXiv:2602.08630v1 Announce Type: new Abstract: AI safety via debate uses two competing models to help a human judge verify complex computational tasks. Previous work has established what problems de...
Related: #Human Oversight, #Computational Linguistics (see the sketch below)
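The debate protocol itself is simple to state: two models argue opposite sides for a fixed number of turns, and a judge rules from the transcript alone. A skeleton follows; all three callables are hypothetical placeholders, and the paper's efficiency analysis is not captured here.

```python
# Skeleton of safety via debate: two models argue opposite sides, and the
# judge rules from the transcript alone. All callables are placeholders.
def debater(side: str, claim: str, transcript: list) -> str:
    raise NotImplementedError("model argues for `side`, given the transcript")

def judge(claim: str, transcript: list) -> str:
    raise NotImplementedError("return 'PRO' or 'CON' after reading the debate")

def run_debate(claim: str, turns: int = 4) -> str:
    transcript = []
    for i in range(turns):
        side = "PRO" if i % 2 == 0 else "CON"
        transcript.append((side, debater(side, claim, transcript)))
    return judge(claim, transcript)   # the judge never runs the task itself
```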
- 🇺🇸 ShallowJail: Steering Jailbreaks against Large Language Models
[USA]
arXiv:2602.07107v1 Announce Type: cross Abstract: Large Language Models (LLMs) have been successful in numerous fields. Alignment is usually applied to prevent them from serving harmful purposes. Howeve...
Related: #Cybersecurity, #Machine Learning
- 🇺🇸 LLM Active Alignment: A Nash Equilibrium Perspective
[USA]
arXiv:2602.06836v1 Announce Type: new Abstract: We develop a game-theoretic framework for predicting and steering the behavior of populations of large language models (LLMs) through Nash equilibrium ...
Related: #Artificial Intelligence, #Game Theory (see the sketch below)
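Fictitious play is a textbook way to approximate a Nash equilibrium numerically. The sketch below runs it on a toy 2x2 game; the payoff matrices are illustrative and the algorithm is standard, not the paper's specific framework.

```python
# Fictitious play on a toy 2x2 game: each player best-responds to the
# empirical frequency of the other's past play. Payoffs are illustrative.
A = [[3, 0], [5, 1]]   # row player's payoffs (a prisoner's dilemma)
B = [[3, 5], [0, 1]]   # column player's payoffs
B_T = [[B[i][j] for i in range(2)] for j in range(2)]  # transpose for the column player

def best_response(payoffs, opp_freq):
    expected = [sum(p * f for p, f in zip(row, opp_freq)) for row in payoffs]
    return max(range(2), key=expected.__getitem__)

row_counts, col_counts = [1, 1], [1, 1]   # uniform pseudo-counts
for _ in range(10_000):
    row_freq = [c / sum(row_counts) for c in row_counts]
    col_freq = [c / sum(col_counts) for c in col_counts]
    row_counts[best_response(A, col_freq)] += 1
    col_counts[best_response(B_T, row_freq)] += 1

# Empirical play converges to mutual defection, the game's unique equilibrium.
print([c / sum(row_counts) for c in row_counts])
```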
- 🇺🇸 OpenDeception: Learning Deception and Trust in Human-AI Interaction via Multi-Agent Simulation
[USA]
arXiv:2504.13707v3 Announce Type: replace Abstract: As large language models (LLMs) are increasingly deployed as interactive agents, open-ended human-AI interactions can involve deceptive behaviors w...
Related: #Artificial Intelligence, #Cybersecurity