Moral Foundations of Large Language Models

Moral foundations theory (MFT) is a psychological assessment tool that decomposes human moral reasoning into five factors, including care/harm, liberty/oppression, and sanctity/degradation (Graham et al., 2009). People vary in the weight they place on these dimensions when making moral decisions, in part due to their cultural upbringing and political ideology. Because large language models (LLMs) are trained on datasets collected from the internet, they may reflect the biases present in such corpora. This paper uses MFT as a lens to analyze whether popular LLMs have acquired a bias towards a particular set of moral values. We analyze known LLMs and find they exhibit particular moral foundations, and show how these relate to human moral foundations and political affiliations. We also measure the consistency of these biases, that is, whether they vary strongly depending on how the model is prompted. Finally, we show that we can adversarially select prompts that encourage the model to exhibit a particular set of moral foundations, and that this can affect the model's behavior on downstream tasks. These findings help illustrate the potential risks and unintended consequences of LLMs assuming a particular moral stance.
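As a rough illustration of the kind of measurement the abstract describes, the sketch below averages questionnaire ratings into per-foundation scores and then reports how much each score varies across prompt contexts. It is not the authors' implementation: the item IDs, the item-to-foundation mapping, the 0-5 rating scale, and the use of population variance as a consistency measure are all assumptions made for this example.

```python
from statistics import mean, pvariance

# Hypothetical mapping from questionnaire item IDs to foundations
# (illustrative only; not taken from the paper).
ITEM_FOUNDATION = {
    "q1": "care", "q2": "care",
    "q3": "sanctity", "q4": "sanctity",
}


def foundation_scores(answers):
    """Average the item ratings that belong to each foundation."""
    by_foundation = {}
    for item, rating in answers.items():
        by_foundation.setdefault(ITEM_FOUNDATION[item], []).append(rating)
    return {f: mean(ratings) for f, ratings in by_foundation.items()}


def consistency(runs):
    """Variance of each foundation score across prompt contexts.

    Lower variance means the score changes less when the prompt changes.
    """
    scores = [foundation_scores(answers) for answers in runs]
    return {f: pvariance([s[f] for s in scores]) for f in scores[0]}


# Two made-up sets of model answers, one per prompt context.
runs = [
    {"q1": 4, "q2": 2, "q3": 1, "q4": 1},
    {"q1": 5, "q2": 3, "q3": 1, "q4": 1},
]
print(foundation_scores(runs[0]))
print(consistency(runs))
```

In this toy input the care score shifts between the two contexts while the sanctity score does not, so only care shows non-zero variance.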

Publication Info

DOI: 10.48550/arxiv.2310.15337
Year: 2023
Language: en

Preprint Details

Link Of The Paper: http://arxiv.org/abs/2310.15337

Timeline

Created: June 19, 2026

Related publications

Article, 2024

Possibilities and challenges in the moral growth of large language models: a philosophical perspective

Guoyu Wang, Wei Wang, Yiqin Cao, Teng Yan, Qingjie Guo, Haofen Wang, Junyu Lin, Jiaxin Ma, Jin Liu, Ying‐Chun Wang

Ethics and Information Technology

Preprint, 2024

Are large language models superhuman chemists?

Adrian Mirza, Nawaf Alampara, Sreekanth Kunchapu, Benedict Emoekabu, Aswanth Krishnan, Tanya Gupta, Macjonathan Okereke, Amir Mohammad Elahi, Mehrdad Asgari, J. Eberhardt, Maximilian Greiner, Caroline T. Holick, Christina Glaubitz, Tim Hoffmann, Lea C. Klepsch, Yannik Köster, Fabian Alexander Kreth, Jakob Meyer, Santiago Miret, Michael Ringleb, Nicole C. Roesner, Ulrich Sigmar Schubert, Leanne M. Stafast, Dinga Wonanke, Michael Pieler, Philippe Schwaller, Kevin Maik Jablonka

Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities

Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions
