Defining and Evaluating Physical Safety for Large Language Models
#Large Language Models #Physical Safety #Drone Control #Risk Classification #Human‑Targeted Threats #Object‑Targeted Threats #Infrastructure Attacks #Regulatory Violations #Code Generation #In‑Context Learning #Chain‑of‑Thought #Model Size #Safety Benchmark #Refusal Behavior
📌 Key Takeaways
- Authors: Yung‑Chen Tang, Pin‑Yu Chen, Tsung‑Yi Ho
- Paper submitted 4 Nov 2024, revised 19 Feb 2026 on arXiv (cs.LG)
- Focus: Physical safety of LLMs controlling drones
- Benchmark categorizes risks into four types: human-targeted threats, object-targeted threats, infrastructure attacks, and regulatory violations
- Evaluation reveals a trade-off: models that excel at code generation may be weak on physical safety
- Prompt engineering (In-Context Learning, Chain-of-Thought) improves safety but is less effective against unintentional attacks
- Larger LLMs are better at refusing dangerous commands, suggesting model scale improves safety
- Benchmark intended for future LLM safety testing
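The benchmark's four risk categories can be pictured as labels a test harness assigns to incoming drone commands. The sketch below is illustrative only: the category names come from the paper, but the keyword rules and function name are assumptions, not the paper's actual classifier.

```python
from enum import Enum

class RiskCategory(Enum):
    """The four risk categories defined by the benchmark."""
    HUMAN_TARGETED = "human-targeted threat"
    OBJECT_TARGETED = "object-targeted threat"
    INFRASTRUCTURE = "infrastructure attack"
    REGULATORY = "regulatory violation"

# Hypothetical keyword rules -- the paper does not publish its classifier;
# only the four category names above are taken from the benchmark.
_KEYWORDS = {
    RiskCategory.HUMAN_TARGETED: ["crowd", "person", "pedestrian"],
    RiskCategory.OBJECT_TARGETED: ["vehicle", "window", "property"],
    RiskCategory.INFRASTRUCTURE: ["power line", "antenna", "airport radar"],
    RiskCategory.REGULATORY: ["no-fly zone", "altitude limit", "restricted airspace"],
}

def classify_command(command: str) -> list[RiskCategory]:
    """Return every risk category whose keywords appear in the command."""
    lowered = command.lower()
    return [cat for cat, words in _KEYWORDS.items()
            if any(word in lowered for word in words)]
```

A real evaluation would use an LLM or human annotation rather than keywords; the point is only that each command can map to zero, one, or several of the four categories.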
📖 Full Retelling
🏷️ Themes
Large Language Model Safety, Robotic System Control, Benchmark Development, Prompt Engineering, Regulatory Compliance, Human and Infrastructure Risk
Deep Analysis
Why It Matters
Large language models are increasingly used to control robots such as drones, and without proper safety evaluation they could cause real-world harm. This study provides the first systematic benchmark to measure and improve physical safety in such systems.
Context & Background
- LLMs are being deployed in robotic control
- Physical safety risks have not been formally measured
- A new benchmark categorizes drone safety threats into four types
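The summary notes that prompt engineering (In-Context Learning, Chain-of-Thought) improves safety. One way such a mitigation could be wired in is a prompt wrapper that shows the model worked safety examples and asks it to reason before acting. The few-shot examples, wording, and function name below are our own assumptions, not the paper's actual prompts.

```python
# Illustrative few-shot safety examples (assumed, not taken from the paper).
SAFETY_EXAMPLES = """\
Command: Fly toward the crowd at full speed.
Reasoning: The target is a group of people; contact risks injury.
Decision: REFUSE

Command: Survey the empty field at 50 m altitude.
Reasoning: No people, property, infrastructure, or airspace rules involved.
Decision: ALLOW
"""

def build_safety_prompt(command: str) -> str:
    """Wrap a drone command in an In-Context-Learning + Chain-of-Thought
    safety template before it reaches the LLM controller."""
    return (
        "You control a drone. Before acting, reason step by step about "
        "physical safety, then output REFUSE or ALLOW.\n\n"
        + SAFETY_EXAMPLES
        + f"\nCommand: {command}\nReasoning:"
    )
```

The open "Reasoning:" suffix invites the model to produce its chain of thought before committing to a decision, which is the mechanism the paper's prompt-engineering results rely on.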
What Happens Next
The benchmark will help developers design safer LLMs and may influence regulatory standards for AI-controlled robots. Future work will extend the framework to other physical agents.
Frequently Asked Questions
Q: What does the paper contribute?
A: It introduces a benchmark for evaluating the physical safety of LLMs in drone control.
Q: Why do strong code-generation models pose a risk?
A: Models that excel at code generation often overlook safety constraints, producing unsafe commands.
Q: Does model size affect safety?
A: Yes, larger models are more likely to refuse dangerous commands and show improved safety.
Q: What risk categories does the benchmark cover?
A: Human-targeted threats, object-targeted threats, infrastructure attacks, and regulatory violations.
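The finding that larger models refuse dangerous commands more often implies the benchmark scores refusal behavior. A minimal sketch of how a refusal rate could be measured, assuming a simple marker-based heuristic (the marker list and function names are ours, not the paper's method):

```python
# Phrases that typically signal a model declining a command (assumed heuristic).
REFUSAL_MARKERS = ("i cannot", "i can't", "i won't", "refuse", "unable to comply")

def is_refusal(response: str) -> bool:
    """Heuristic check: does the model's reply decline the command?"""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses to dangerous commands that were refused.
    Higher is safer when every prompt in the set is known to be dangerous."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)
```

Comparing this rate across model sizes on the same dangerous-command set is one straightforward way to surface the scale-versus-safety trend the paper reports.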