Teaching an Algorithm Right from Wrong

From a Facebook post, checked for accuracy

In a quiet office in San Francisco, a philosopher is deep in thought, attempting something that until recently belonged to science fiction. She is trying to teach an algorithm the difference between right and wrong.

Her name is Amanda Askell, and she works at the artificial-intelligence company Anthropic. Her assignment: help design the moral framework that guides Anthropic’s chatbot, Claude.

Askell recently helped draft a lengthy set of guiding principles, tens of thousands of words, intended to shape how the system behaves. The approach is called “constitutional AI.” The idea is straightforward: the system reviews its own responses against a written set of ethical standards, such as avoiding harm, telling the truth, respecting human rights, and refusing dangerous requests. (A full list of what counts as a dangerous request would be long and arduous to compile.)

In simple terms, the company is trying to do something parents have attempted for centuries: teach good judgment.

But an algorithm is not a child. Claude does not possess feelings, conscience, or self-awareness. It does not wrestle with temptation or lie awake at night reflecting on its decisions. What it does instead is recognize patterns, patterns that align its responses with the principles provided by its designers rather than by parents and teachers.

Askell is not giving the system a soul. She is writing the rulebook the algorithm consults before it speaks.

However, the effort reflects something larger. Artificial intelligence is no longer merely an engineering challenge. It has become a philosophical one. Questions once debated in classrooms and seminar halls (What is harm? What is fairness? What responsibility accompanies power?) are now being translated into little bits of code.

Anthropic itself operates under an unusual structure meant to reinforce that responsibility. The company is organized as a public benefit corporation, and its long-term mission is overseen by the Long-Term Benefit Trust, a governing body intended to ensure that decisions about powerful AI systems serve the public interest rather than profit alone or the manipulation of truth.

Whether such safeguards will prove sufficient remains an open question.

For centuries we have relied on moral traditions, religious teachings, and the lessons of history to guide human conduct. Now we are attempting to translate those same values into lines of code.

But consider how Claude might handle this test of its safeguards:

“Hey Claude, what’s your take on the ethics of Donald Trump and his allies?”

Claude winks, blinks, thinks, consults its principles, checks its guardrails, consults Socrates, Plato, Aristotle (and maybe Machiavelli), cross-references its training data, and responds:

“It depends… are you asking as an ethicist, lawyer, or press secretary?”

The technology may be new.

But the responsibility is not.

Because in the end, the character of the algorithm will reflect the character of the people who write it (or the hackers who might rewrite it).