AI safety
Research area focused on making AI systems safe and beneficial
📊 Rating
8 news mentions · 👍 0 likes · 👎 0 dislikes
📌 Topics
- Artificial Intelligence (7)
- Machine Learning (3)
- Data Science (2)
- AI Safety (2)
- Cybersecurity (2)
- Research (1)
- Human Oversight (1)
- Computational Linguistics (1)
- Technology Safety (1)
- Innovation (1)
- Ethics (1)
- Model Interpretability (1)
🏷️ Keywords
AI safety (8) · arXiv (7) · LLM (3) · generative AI (2) · probabilistic reasoning (1) · uncertainty (1) · machine learning (1) · benchmarking (1) · diffusion models (1) · concept unlearning (1) · selective fine-tuning (1) · text-to-image (1) · Debate Query Complexity (1) · Machine Learning (1) · AI Alignment (1) · Human-in-the-loop (1) · Computational tasks (1) · ArcMark (1) · LLM watermarking (1) · multi-bit watermark (1)
📰 Related News (8)
- 🇺🇸 Implicit Probabilistic Reasoning Does Not Reflect Explicit Answers in Large Language Models
  arXiv:2406.14986v4 · Announce Type: replace · Abstract: The handling of probabilities in the form of uncertainty or partial information is an essential t...
- 🇺🇸 Selective Fine-Tuning for Targeted and Robust Concept Unlearning
  arXiv:2602.07919v1 · Announce Type: new · Abstract: Text-guided diffusion models are used by millions of users, but can be easily exploited to produce ha... (a generic freeze-and-tune sketch follows this list)
- 🇺🇸 Debate is efficient with your time
  arXiv:2602.08630v1 · Announce Type: new · Abstract: AI safety via debate uses two competing models to help a human judge verify complex computational tas... (a toy debate loop is sketched after this list)
- 🇺🇸 ArcMark: Multi-bit LLM Watermark via Optimal Transport
  arXiv:2602.07235v1 · Announce Type: cross · Abstract: Watermarking is an important tool for promoting the responsible use of language models (LMs). Exist... (a generic multi-bit watermarking sketch follows this list)
- 🇬🇧 This AI just passed the 'vending machine test' - and we may want to be worried about how it did
  When leading AI company Anthropic launched its latest AI model, Claude Opus 4.6, at the end of last week, it broke many measures of intelligence and e...
- 🇺🇸 Can One-sided Arguments Lead to Response Change in Large Language Models?
  arXiv:2602.06260v1 · Announce Type: cross · Abstract: Polemic questions need more than one viewpoint to express a balanced answer. Large Language Models ...
- 🇺🇸 On the Identifiability of Steering Vectors in Large Language Models
  arXiv:2602.06801v1 · Announce Type: cross · Abstract: Activation steering methods, such as persona vectors, are widely used to control large language mod... (a minimal activation-steering sketch follows this list)
- 🇺🇸 Efficient LLM Moderation with Multi-Layer Latent Prototypes
  arXiv:2502.16174v3 · Announce Type: replace-cross · Abstract: Although modern LLMs are aligned with human values during post-training, robust moderation ...
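The selective fine-tuning paper above targets concept unlearning in text-guided diffusion models. The sketch below shows only the generic freeze-and-tune mechanics its title points at, not the paper's actual selection criterion: the stand-in network, the choice of which layer to unfreeze, and the neutral-target loss are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in network; a real case would be a text-to-image diffusion model.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))

# Freeze everything, then unfreeze only the targeted sublayer.
for p in model.parameters():
    p.requires_grad_(False)
for p in model[2].parameters():  # hypothetical "concept-carrying" layer
    p.requires_grad_(True)

opt = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

x = torch.randn(4, 8)        # inputs that evoke the unwanted concept
neutral = torch.zeros(4, 8)  # illustrative "neutral" target output
loss = F.mse_loss(model(x), neutral)
loss.backward()              # gradients reach only the unfrozen layer
opt.step()
```

Restricting updates to a small parameter subset is what makes this kind of unlearning "targeted": the rest of the model's behavior is untouched by construction.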
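The debate paper's abstract describes the core protocol: two competing models argue so that a human judge can verify a task more cheaply than solving it alone. Below is a minimal, model-agnostic sketch of that loop; the `Debater` and judge callables are hypothetical stand-ins, not the paper's setup, and the paper's actual contribution (efficiency in terms of judge query complexity) is not modeled here.

```python
from typing import Callable, List, Tuple

# (question, transcript so far) -> next argument
Debater = Callable[[str, List[str]], str]

def run_debate(question: str,
               pro: Debater, con: Debater,
               judge: Callable[[str, List[str]], str],
               rounds: int = 3) -> Tuple[str, List[str]]:
    """Alternate arguments for `rounds` rounds, then let the judge decide."""
    transcript: List[str] = []
    for _ in range(rounds):
        transcript.append("PRO: " + pro(question, transcript))
        transcript.append("CON: " + con(question, transcript))
    verdict = judge(question, transcript)  # judge sees only the transcript
    return verdict, transcript

# Toy usage with rule-based debaters standing in for LLMs.
if __name__ == "__main__":
    q = "Is 91 prime?"
    pro_fn = lambda q, t: "91 has no small even divisors, so it looks prime."
    con_fn = lambda q, t: "91 = 7 * 13, so it is composite."
    judge_fn = lambda q, t: ("composite" if any("7 * 13" in m for m in t)
                             else "prime")
    verdict, _ = run_debate(q, pro_fn, con_fn, judge_fn, rounds=1)
    print(verdict)  # -> composite
```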
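ArcMark itself constructs its multi-bit watermark via optimal transport, which is not reproduced here. The sketch below shows only the generic idea behind multi-bit logit watermarking, assuming a green-list-style scheme in which each payload bit selects which half of a pseudorandom vocabulary split gets a logit boost; all names and constants are illustrative.

```python
import numpy as np

VOCAB, DELTA = 1000, 2.5  # toy vocabulary size and logit bias strength

def green_mask(seed: int, bit: int) -> np.ndarray:
    """Pseudorandomly split the vocabulary; `bit` picks the 'green' half."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(VOCAB, dtype=bool)
    mask[rng.permutation(VOCAB)[: VOCAB // 2]] = True
    return mask if bit == 0 else ~mask

def embed_bit(logits: np.ndarray, seed: int, bit: int) -> np.ndarray:
    # Boosting green-token logits makes sampling statistically favor them.
    return logits + DELTA * green_mask(seed, bit)

def extract_bit(tokens: list, seed: int) -> int:
    # Recover the bit from which half the observed tokens mostly land in.
    mask0 = green_mask(seed, 0)
    hits0 = int(sum(mask0[t] for t in tokens))
    return 0 if hits0 >= len(tokens) - hits0 else 1

# Toy usage: embed bit 1 at one step, then read it back from samples.
rng = np.random.default_rng(0)
biased = embed_bit(np.zeros(VOCAB), seed=42, bit=1)
probs = np.exp(biased) / np.exp(biased).sum()
sample = rng.choice(VOCAB, size=50, p=probs)
print(extract_bit(list(sample), seed=42))  # -> 1 with high probability
```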
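The identifiability paper studies activation steering, in which a fixed vector added to a layer's hidden states shifts model behavior. The sketch below shows that mechanism on a toy MLP standing in for a transformer block; the vector here is random, whereas in practice it would be estimated, for example from the difference of mean activations on contrastive prompt sets.

```python
import torch
import torch.nn as nn

hidden = 16
model = nn.Sequential(nn.Linear(8, hidden), nn.ReLU(), nn.Linear(hidden, 4))

# Hypothetical steering vector; real ones are learned, not random.
steer = torch.randn(hidden)

def steering_hook(module, inputs, output):
    # Shift the layer's activations along the steering direction.
    return output + 2.0 * steer

handle = model[0].register_forward_hook(steering_hook)
x = torch.randn(1, 8)
steered = model(x)
handle.remove()
unsteered = model(x)
print((steered - unsteered).abs().max())  # outputs differ once steered
```

Whether the vector producing a given behavioral shift is unique is exactly the identifiability question the paper asks; this sketch shows one possible parameterization, not a canonical one.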
🔗 Entity Intersection Graph
Concepts and organizations frequently mentioned alongside AI safety:
- 🌐 Large language model (3 shared articles)
- 🌐 Algorithmic bias (1 shared article)
- 🏢 Anthropic (1 shared article)
- 🌐 Machine learning (1 shared article)