Values in the wild: Discovering and analyzing values in real-world language model inter...
dr tech on 22 Apr 25: "AI models will inevitably have to make value judgments. If we want those judgments to be congruent with our own values (which is, after all, the central goal of AI alignment research) then we need to have ways of testing which values a model expresses in the real world. Our method provides a new, data-focused method of doing this, and of seeing where we might've succeeded, or indeed failed, at aligning our models' behavior."