# AI Safety
Latest news articles tagged with "AI Safety". Follow the timeline of events, related topics, and entities.
Articles (30)
- 🇺🇸 Why Anthropic is saying its new AI model, Mythos, is too dangerous to release
[USA]
Anthropic has announced that it is teaming up with industry competitors to "secure the world's most critical software" from its own AI model, Mythos. New York Times reporter Mike Isaac joins "The Take...
Related: #Cyber Security, #Corporate Ethics
- 🇺🇸 You Can't Use This A.I.
[USA]
Claude Mythos Preview is dangerous, Anthropic said. We explain the risks.
Related: #Technology Ethics, #Corporate Responsibility
- 🇺🇸 Anthropic's powerful new AI model raises concerns about high-tech risks
[USA]
Anthropic announced that it has started a very limited test of its newest AI model called Mythos. It's a model deemed so powerful that the company warned it could cause widespread disruption if it wer...
Related: #Technology Ethics, #Corporate Responsibility
- 🇺🇸 Anthropic says new AI model too dangerous for public release
[USA]
Anthropic announced this week that it will hold back the full release of its new artificial intelligence model, which it believes is too dangerous for the general public at this stage. The model, called Cl...
Related: #Technology Ethics, #Corporate Governance
- 🇺🇸 What we know about Anthropic's new, alarming AI model
[USA]
Anthropic announced its new AI model is too powerful for public release. Puck's Ian Krietzberg joins CBS News with more.
Related: #Technology Ethics, #Corporate Governance
- 🇺🇸 The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning
[USA]
arXiv:2604.06427v1 Announce Type: cross Abstract: The viability of chain-of-thought (CoT) monitoring hinges on models being unable to reason effectively in their latent representations. Yet little is...
Related: #AI Research, #Model Limitations
- 🇺🇸 When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't
[USA]
arXiv:2604.06422v1 Announce Type: cross Abstract: Understanding when Vision-Language Models (VLMs) will behave unexpectedly, whether models can reliably predict their own behavior, and if models adhe...
Related: #Machine Learning, #Cognitive Science
- 🇺🇸 ClawLess: A Security Model of AI Agents
[USA]
arXiv:2604.06284v1 Announce Type: cross Abstract: Autonomous AI agents powered by Large Language Models can reason, plan, and execute complex tasks, but their ability to autonomously retrieve informa...
Related: #Cybersecurity, #Formal Verification
- 🇺🇸 Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses
[USA]
arXiv:2604.06216v1 Announce Type: cross Abstract: As LLM-powered chatbots are increasingly deployed in mental health services, detecting hallucinations and omissions has become critical for user safe...
Related: #Mental Health Technology, #Human-AI Collaboration
- 🇺🇸 Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization
[USA]
arXiv:2604.06285v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) have become essential for tasks such as image synthesis, captioning, and retrieval by aligning textual and visual infor...
Related: #Machine Learning, #Cybersecurity
- 🇺🇸 Why Anthropic won't release its new Claude Mythos AI model to the public
[USA]
Claude Mythos Preview can identify and exploit software vulnerabilities with unprecedented accuracy, the company says.
Related: #Cybersecurity, #Corporate Ethics
- 🇺🇸 Anthropic claims newest AI model, Claude Mythos, is too powerful for public release
[USA]
Anthropic says its newest AI model, Claude Mythos, is too powerful and dangerous to be released to the public. Tech journalist Jacob Ward joins CBS News to discuss.
Related: #Technology Ethics, #Corporate Responsibility
- 🇺🇸 How dangerous is Mythos, Anthropic's new AI model?
[USA]
Dario Amodei's warnings should not be dismissed
Related: #Technological Risk, #Industry Ethics
- 🇺🇸 Anthropic's Claude Code gets "safer" auto mode
[USA]
Anthropic has launched an "auto mode" for Claude Code, a new tool that lets AI make permissions-level decisions on users' behalf. The company says the feature offers vibe coders a safer alternative be...
Related: #Automation
- 🇺🇸 PowerLens: Taming LLM Agents for Safe and Personalized Mobile Power Management
[USA]
arXiv:2603.19584v1 Announce Type: new Abstract: Battery life remains a critical challenge for mobile devices, yet existing power management mechanisms rely on static rules or coarse-grained heuristic...
Related: #Mobile Optimization
- 🇺🇸 LSR: Linguistic Safety Robustness Benchmark for Low-Resource West African Languages
[USA]
arXiv:2603.19273v1 Announce Type: cross Abstract: Safety alignment in large language models relies predominantly on English-language training data. When harmful intent is expressed in low-resource la...
Related: #Linguistic Diversity
- 🇺🇸 Creating with Sora Safely
[USA]
To address the novel safety challenges posed by a state-of-the-art video model as well as a new social creation platform, we've built Sora 2 and the Sora app with safety at the foundation. Our approac...
Related: #Content Creation
- 🇺🇸 VLM-AutoDrive: Post-Training Vision-Language Models for Safety-Critical Autonomous Driving Events
[USA]
arXiv:2603.18178v1 Announce Type: cross Abstract: The rapid growth of ego-centric dashcam footage presents a major challenge for detecting safety-critical events such as collisions and near-collision...
Related: #Autonomous Driving
- 🇺🇸 A Concept is More Than a Word: Diversified Unlearning in Text-to-Image Diffusion Models
[USA]
arXiv:2603.18767v1 Announce Type: new Abstract: Concept unlearning has emerged as a promising direction for reducing the risks of harmful content generation in text-to-image diffusion models by selec...
Related: #Diffusion Models
- 🇺🇸 Multi-Trait Subspace Steering to Reveal the Dark Side of Human-AI Interaction
[USA]
arXiv:2603.18085v1 Announce Type: new Abstract: Recent incidents have highlighted alarming cases where human-AI interactions led to negative psychological outcomes, including mental health crises and...
Related: #Human-AI Interaction
- 🇺🇸 Secure Linear Alignment of Large Language Models
[USA]
arXiv:2603.18908v1 Announce Type: new Abstract: Language models increasingly appear to learn similar representations, despite differences in training objectives, architectures, and data modalities. T...
Related: #Model Alignment
- 🇺🇸 DriveVLM-RL: Neuroscience-Inspired Reinforcement Learning with Vision-Language Models for Safe and Deployable Autonomous Driving
[USA]
arXiv:2603.18315v1 Announce Type: cross Abstract: Ensuring safe decision-making in autonomous vehicles remains a fundamental challenge despite rapid advances in end-to-end learning approaches. Tradit...
Related: #Autonomous Driving
- 🇺🇸 A.I. Agents: They're Fun. They're Useful. But Don't Give Them the Credit Card.
[USA]
New A.I. bots can do more than just chat. They can edit files, send emails, book trips and cause trouble.
Related: #Technology Risks
- 🇺🇸 Security Assessment and Mitigation Strategies for Large Language Models: A Comprehensive Defensive Framework
[USA]
arXiv:2603.17123v1 Announce Type: cross Abstract: Large Language Models increasingly power critical infrastructure from healthcare to finance, yet their vulnerability to adversarial manipulation thre...
Related: #Cybersecurity
- 🇺🇸 Towards Safer Large Reasoning Models by Promoting Safety Decision-Making before Chain-of-Thought Generation
[USA]
arXiv:2603.17368v1 Announce Type: new Abstract: Large reasoning models (LRMs) achieved remarkable performance via chain-of-thought (CoT), but recent studies showed that such enhanced reasoning capabi...
Related: #Reasoning Models
- 🇺🇸 UniSAFE: A Comprehensive Benchmark for Safety Evaluation of Unified Multimodal Models
[USA]
arXiv:2603.17476v1 Announce Type: cross Abstract: Unified Multimodal Models (UMMs) offer powerful cross-modality capabilities but introduce new safety risks not observed in single-task models. Despit...
Related: #Multimodal Models
- 🇺🇸 Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval
[USA]
arXiv:2603.17872v1 Announce Type: cross Abstract: Large Language Models (LLMs) have achieved unprecedented fluency but remain susceptible to "hallucinations" - the generation of factually incorrect o...
Related: #Information Retrieval
- 🇺🇸 IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia
[USA]
arXiv:2603.17915v1 Announce Type: cross Abstract: As large language models (LLMs) are deployed in multilingual settings, their safety behavior in culturally diverse, low-resource languages remains po...
Related: #Multilingual Evaluation
- 🇺🇸 Privacy and Safety Experiences and Concerns of U.S. Women Using Generative AI for Seeking Sexual and Reproductive Health Information
[USA]
arXiv:2603.16918v1 Announce Type: cross Abstract: The rapid adoption of generative AI (GenAI) chatbots has reshaped access to sexual and reproductive health (SRH) information, particularly following ...
Related: #Privacy, #Women's Health
- 🇺🇸 Safety-Preserving PTQ via Contrastive Alignment Loss
[USA]
arXiv:2511.07842v5 Announce Type: replace Abstract: Post-Training Quantization (PTQ) has become the de-facto standard for efficient LLM deployment, yet its optimization objective remains fundamentall...
Related: #Model Compression
Key Entities (15)
- AI safety (13 news)
- Anthropic (9 news)
- Claude (language model) (5 news)
- Large language model (4 news)
- Dario Amodei (2 news)
- South Asia (1 news)
- The Verge (1 news)
- Regulation of artificial intelligence (1 news)
- NigerโCongo languages (1 news)
- LSR (1 news)
- Ethics of artificial intelligence (1 news)
- You Can (1 news)
- Dark side (1 news)
- AI alignment (1 news)
- Credit card (1 news)
About the topic: AI Safety
The topic "AI Safety" aggregates 30 news articles, along with their related topics and key entities.