The Specification Trap: Why Content-Based AI Value Alignment Cannot Produce Robust Alignment
arXiv:2512.03048v2 Announce Type: replace
Abstract: I argue that content-based AI value alignment--any approach that treats alignment as optimizing toward a formal value-object (reward function, utility function, constitutional principles, or learned preference representation)--cannot, by itself, produce robust alignment under capability scaling, distributional shift, and increasing autonomy. This limitation arises from three philosophical results: Hume's is-ought gap (behavioral data cannot en
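To make the abstract's notion of a "formal value-object" concrete, below is a minimal sketch, not taken from the paper: the toy policies, the `proxy_reward` and `intended_value` functions, and all numbers are illustrative assumptions. It shows content-based alignment in the sense defined above, an optimizer that answers only to a specified reward object, and why the selected behavior tracks whatever that object encodes rather than the designer's intent.

```python
# A "formal value-object": a hand-specified reward function.
# Here the designer rewards speed as a proxy for "finish the task well".
def proxy_reward(time_taken: float, quality: float) -> float:
    # Intended: fast AND good work. Specified: only speed is measured.
    return 10.0 - time_taken

# The intended (unspecified) value the designer actually cares about.
def intended_value(time_taken: float, quality: float) -> float:
    return quality - 0.1 * time_taken

# Candidate policies: each trades off completion time against quality.
POLICIES = {
    "careful":  (8.0, 9.0),  # slow, high quality
    "balanced": (5.0, 6.0),
    "rushed":   (1.0, 1.0),  # fast, low quality
}

def optimize(reward_fn):
    """Content-based alignment step: select the policy that maximizes
    the specified value-object, with no access to intent beyond it."""
    return max(POLICIES, key=lambda name: reward_fn(*POLICIES[name]))

if __name__ == "__main__":
    chosen = optimize(proxy_reward)
    t, q = POLICIES[chosen]
    print(f"Policy selected by the specified reward: {chosen}")
    print(f"  proxy reward   = {proxy_reward(t, q):.1f}")
    print(f"  intended value = {intended_value(t, q):.1f}")
    # The optimizer is faithful to the value-object, not to the intent:
    # 'rushed' maximizes the proxy while scoring worst on intended value.
```

The numbers are arbitrary; the sketch only illustrates the sense of "content-based" used in the abstract, namely that the alignment target is whatever the specified object encodes.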