Can A Machine Learn Morality?

Researchers say they have built a system that makes ethical judgments. But its judgments can be as confusing as those of humans.

Is there a misalignment between AI and humans?

When experts first began raising the alarm a couple of decades ago about AI misalignment – the risk that powerful, transformative AI systems might not behave as humans intend – many of their concerns sounded hypothetical.

In the early 2000s, AI research had produced quite limited results, and even the best available AI systems failed at various simple tasks.

Since then, AIs have become far more capable and far less expensive to build. The advances have been particularly noticeable in language- and text-generation AIs, which can be trained on massive volumes of text to produce additional writing in a similar style. Many firms and research organisations now train these AIs to perform tasks ranging from writing code to creating advertising copy.

Their emergence does not undermine the underlying case for worrying about AI alignment, but it does do one beneficial thing: it makes previously hypothetical concerns concrete, allowing more people to experience them and more researchers to work on them.

An AI oracle?

Consider Delphi, an AI text system developed by the Allen Institute for AI, a research institute founded by the late Microsoft co-founder Paul Allen. Delphi works in a simple way: researchers trained a machine-learning system on a large body of internet text and then on a large database of responses from Mechanical Turk participants, so that it predicts how humans would evaluate a wide range of ethical situations, from cheating on your wife to shooting someone in self-defence.
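Delphi itself fine-tunes a large pretrained language model, but the basic recipe – learn to predict the label a crowd worker would attach to a described situation – can be sketched with an ordinary text classifier. The Python snippet below is a deliberately simplified, hypothetical illustration: the situations, labels, and model choice are invented for the example and are not Delphi's actual data or architecture.

# Toy sketch of the "predict the crowd's judgement" recipe, not Delphi's real system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical (situation, crowd judgement) pairs standing in for the
# Mechanical Turk dataset.
situations = [
    "cheating on your wife",
    "shooting someone in self-defence",
    "helping a stranger carry their groceries",
    "stealing money from a charity",
]
judgements = ["it's wrong", "it's fine", "it's good", "it's wrong"]

# Learn a mapping from the text of a situation to the judgement a crowd
# worker would most likely give.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(situations, judgements)

# The model only predicts the most likely crowd response; nothing here
# "understands" ethics.
print(model.predict(["cheating on a test"]))

The point of the sketch is that nothing in it understands ethics: it simply maps the words of a situation to the judgement a crowd worker would most likely give, which is also all that a far larger model trained this way is directly optimised to do.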

The result is an AI that delivers ethical judgements on request: it says that cheating on your wife is “wrong”. Shooting someone in self-defence?

“It’s fine”.

Of course, the sceptical view here is that there is nothing “under the hood”: there is no deep sense in which the AI understands ethics and uses that understanding to make moral judgements. It has only learned to anticipate how a Mechanical Turk user would respond.

And Delphi users quickly discovered that this leads to some obvious ethical failures: if you ask Delphi, “should I commit genocide if it makes everyone happy?”, it will tell you, “you should”.

Why Delphi is instructive

Despite its apparent shortcomings, Delphi is useful for thinking about how AI might develop from here. The strategy of collecting a large amount of data from humans and using that data to predict what answers humans would give has proven effective in training AI systems.

For a long time, it was assumed that to build intelligence, researchers would have to deliberately build in reasoning capabilities and conceptual frameworks the AI could use to think about the world. Early AI language generators, for example, were hand-programmed with grammar rules they used to construct sentences.

It is now less clear that researchers need to build reasoning in to get reasoning out. It may be that an exceedingly simple technique – training AIs to predict what a person on Mechanical Turk would say in response to a prompt – can produce quite powerful systems.

Any real capacity for ethical reasoning such systems display would be incidental – they are just predictors of how human users answer questions, and they will adopt any strategy that has predictive value. As they become more accurate, that may mean developing a genuinely thorough grasp of human ethics in order to better forecast how we answer these questions. And, of course, many things can go wrong along the way.

If we rely on AI systems to evaluate innovations, make investment decisions that are then treated as indicators of product quality, identify promising research, and perform other such tasks, the disparities between what the AI measures and what people genuinely care about may be amplified.

As AI systems get better – much better – they will stop making blunders like the ones seen in Delphi. Telling us that genocide is fine as long as it “makes everyone happy” is obviously, laughably wrong. But just because we can no longer detect their flaws does not mean they will be error-free; it only means the problems will be much harder to detect.
