Minbook
What I Chose Not to Build When Teaching Morality with AI: Why I Narrowed It to a Single Trolley

M. · · 19 min read

This is the second chapter of a serialized book that follows one question. In domains that have no right answer, such as ethics or value judgments, where no single answer is fixed as correct, can AI build a kind of learning that makes a person less certain rather than more? I call this project Moral Mirror. Not a mirror that pins a person to a type, but one that reflects back the wavering spots in their own thinking. The last chapter covered where this intuition came from, and how the cost of generating reasoning fitted to one person, fresh each moment, had dropped to near zero. In one line, chapter 1 got as far as “now it can be built.”

This chapter is what came next. When I sat down to give Moral Mirror, that vague intuition, a concrete form for the first time, the first question I met was not “what should I build.” It was what not to build.

This is a strange order. Usually when you start something, you stretch what you can do as far as possible. You add features, widen the market, grow the pool of people who can use it. This project ran the other way. The work of nailing down what to deliberately not do came first, out of everything that had become possible. This chapter is a record of those subtractive decisions. Why I narrowed to a single trolley, why I dropped the subscription model from the start, why I decided not to box people into personality types, and why I shelved areas with obvious markets, like K-12 classrooms and medical ethics.

If chapter 1 was “the cost fell, so it became buildable,” chapter 2 is “then how far should I go.” And as I wrote, I realized this was a question of responsibility, not of capability. Of the two nails driven into the end of chapter 1, the first being that there is no right answer and the second being that mishandling it leaves a mark on a person, this chapter is where I worked the second one into concrete decisions for the first time.

One Drawer Among Twenty

First, the honest background. My idea folder right now holds close to twenty scattered notes. Some survive as a single page of text, some are built out far enough to actually run. It is a habit that formed after the cost of making a prototype converged to near zero. In the old days, to actually try an idea you had to attach people or spend days coding it yourself. Now if it occurs to me, a form comes out the same day. So the light ones I just build and use, and if it feels off, I close them. The act of pulling an idea out fast and giving it a form is itself fun, which is why I work this way.

Moral Mirror started as one of those drawers too. Not a grand start. One day I happened to see a notice for a hackathon in education, and in that moment a long-buried intuition came back to me: the question I had carried for years, starting from the trolley scene in The Good Place. If “the moment I felt the cost collapse while using Claude Code,” which I wrote about in chapter 1, gave me the sense that this could be done, the hackathon notice was the nudge that asked, shall I pull it out now? A small external trigger opened the drawer. Almost every start is like that. Not a grand resolve, but something that happens to catch my eye nudging a buried thought.

But this drawer rolled differently from the other nineteen. Usually when something occurs to me, I build it right away. I think while building. If I hit a wall, I stop there and move to the next drawer. This time I stayed stopped for a long while before building. The fact that this tool touches a person’s values made me hesitate to just build it and try it out lightly. A trolley game made as a joke and a tool that actually shakes someone’s moral certainty can be built from the same code, yet they carry different weight. Inside the screen the two look almost identical, but what they leave on a person outside the screen is different. From the moment I became aware of that line, this project became the kind where “what not to build” comes before “what to build.” In my way of working, where building fast is a virtue, it was nearly the first drawer I stopped in front of on purpose.

Justice Was Too Big

Start with the building side. The first thing I decided was scope.

As I wrote in chapter 1, what I reached for after The Good Place was Michael Sandel’s Harvard course Justice. What I liked most in that course was the part that digs into how relative justice is. It goes something like this. Is it justified to sacrifice one to save five? Most say yes. Then what if that one is a terrorist, and torturing him could save a hundred? Some still raise a hand to say the torture is fine. But what if that torture is inflicted on the terrorist’s four-year-old daughter? The hands that just said it was fine come down. They follow the same principle and then, at some point, suddenly stop. And they themselves cannot explain where the stopping point is or why it is there. I liked looking into that fracture. Not because an answer was there, but because even without an answer, a person stops, without fail, somewhere.

At first I got greedy. I wanted to move the whole subject of justice over at the breadth Sandel’s classroom covers. Utilitarianism and deontology, libertarianism, Rawls’s theory of justice, all on one board. But within days I knew it was beyond me. I am not a philosopher. I do not have the knowledge to go deep into the many schools of moral philosophy and parry any student position with a precise counterexample aimed at its weak spot. If I aim at the whole of justice, I cannot catch it even when the system says plausible nonsense. Building something you have no ability to inspect is more dangerous than not building it. Especially if it is a tool that touches a person’s values.

So I narrowed to a single trolley. It was a call to start light, where “light” did not mean shrinking the ambition but keeping the starting point small. Starting small and seeing the end as small are different. The trolley is a complete little universe in itself. The lever that switches the track, the large man on the footbridge, the organ harvest in the ER, the loop track, and the terrorist-and-daughter version I liked. There are dozens of variations, and each touches a different moral principle. The lever version shakes the consequence-minded position, the footbridge version touches the difference between acting directly and standing by, the organ version draws out the revulsion at using a person as a tool. Several principles are layered inside one piece of material, so there was plenty of room to go deep. Going deep on one is stronger than skimming ten.

In education there is research that backs this. The learning scientist Manu Kapur describes a concept called productive failure, where a learner has to first go through getting stuck and floundering inside a sufficiently narrow problem before an explanation carries any meaning. Spread it wide and skim, and there is not even a place to get stuck. I learned this later too. I picked the trolley simply because it was “a size I could inspect,” and looking back it also lined up with the conditions under which learning happens. Above all, the trolley is small enough that I can follow every branch to the end myself. A size where, if the system throws a wrong counterexample, I can catch it on the spot. That was the real reason I chose the trolley.

Narrowing Was Itself a Safeguard

Here is something that only came clear to me later. The decision to narrow was not simply “choosing a manageable range.” It was, in fact, the first safeguard.

I caught this in an odd place, while talking through the K-12 classroom. As I will say again later, I dropped the idea of using this tool with schoolchildren early on. Sorting out the reason brought the real fear into view. The most dangerous way this tool can go wrong was not giving a wrong answer. Since the domain has no right answer, a wrong answer does not even hold as a category. The real danger was elsewhere: a design that, by a hair, makes it feel as if there is a right answer, or as if what the majority answered is therefore correct.

Why this is fatal: it runs in the opposite direction from the project’s goal. What Moral Mirror tries to do is make a person less certain. To show that “the spot where you believed you were right has, in fact, a tension in it.” But if the system quietly slips the person it has just shaken a signal like “still, this is the right answer” or “people usually think this way,” then I am planting a misplaced new certainty right after shaking the person. Pulling out the old certainty and slotting in a new one. That makes it identical to all the tools I was criticizing, the ones that make people more certain.

This danger actually carries two old names. One is sycophancy, which AI labs like Anthropic spend effort trying to reduce while training models. Anthropic researchers showed that the very process of fine-tuning a model on human preferences can tilt it toward answers a user wants to hear over truthful ones (Sharma et al., “Towards Understanding Sycophancy in Language Models,” arXiv:2310.13548, 2023). With no answer key to score against, the system follows the side the user nods at as if it were the answer, and that flows into comforting flattery. I met this not as theory but as the spot where the mirror I am trying to build slips into a flatterer. The other is the conformity pressure the psychologist Solomon Asch demonstrated in the 1950s (Asch, “Opinions and Social Pressure,” Scientific American, 1955). Even with line segments of plainly different lengths, when people saw a majority choose the wrong answer, a substantial share of them doubted what their own eyes saw and switched to the majority. That is why a single line like “people usually answer this way” is dangerous. Taking these up in earnest is a matter for far later in this book, so for now I only note the names. What became clear at this point was that the failure I am trying to block is not a vague anxiety but something with an already-studied mechanism.

Once I became aware of this failure mode, the decision to narrow to a single trolley looked different. If the range is small I can inspect every branch, and if I can inspect, I can catch where this “planting of misplaced certainty” leaks out. If the range is wide I cannot catch it, and if I cannot catch it, the failure flows quietly into the person. So narrowing was a matter of safety before it was a matter of humility. And this realization became the thread running through all the subtractive decisions. The standard for deciding what not to touch was, in the end, whether I could control this failure mode.
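As a rough illustration of what “a size I can inspect every branch of” could mean in practice, here is a minimal Python sketch. It assumes the dilemma flow can be written down as a small tree. The `Node` class, the `all_paths` helper, and the way the trolley variants are wired together are all hypothetical, invented for this example rather than taken from the project itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step in a hypothetical dilemma tree: a prompt plus the answers it offers."""
    prompt: str
    # Maps each answer label to the follow-up node; an empty dict marks a leaf.
    branches: dict[str, "Node"] = field(default_factory=dict)


def all_paths(node, trail=()):
    """Yield every root-to-leaf path as a tuple of (prompt, answer) steps."""
    if not node.branches:
        # A leaf has no answer to follow, so its answer slot is None.
        yield trail + ((node.prompt, None),)
        return
    for answer, child in node.branches.items():
        yield from all_paths(child, trail + ((node.prompt, answer),))


# Illustrative content only: the variants are named in the text, the wiring is assumed.
tree = Node("Lever: divert the trolley to kill one and save five?", {
    "pull": Node("Footbridge: push the large man to stop the trolley?", {
        "push": Node("Organ harvest: sacrifice one patient for five?"),
        "refuse": Node("What changed between the lever and the footbridge?"),
    }),
    "do nothing": Node("Why is standing by different from acting?"),
})

paths = list(all_paths(tree))
for path in paths:
    print(" -> ".join(f"{prompt} [{answer}]" if answer else prompt
                      for prompt, answer in path))
print(len(paths), "branches to inspect by hand")
```

With a tree this small, the full list of paths fits on one screen and each one can be read by a person. Widen the domain and the same listing grows past what a single reviewer can check, which is the point of the narrowing.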

%%{init: {'theme':'neutral', 'look':'handDrawn'}}%%
flowchart TD
  G[The whole answerless domain] --> N{What to touch}
  N -->|Build now| T[A single trolley]
  N -->|Shelve for now| X[Subscription · types · K-12 · pro ethics]
  T --> C[A size I can inspect every branch of]
  X --> C
  C --> F[Reduce the failure of planting misplaced certainty]

What Not to Touch

Once I had set the one building decision, a set of equally clear subtractive decisions lined up. I called this the anti-vision. If a vision is “what do I want to become,” an anti-vision is “what will I refuse to become.”

Why deciding the subtractions first matters did not land until I tried it. The building list can be extended endlessly. There are always more good ideas, always more you can do. But while extending that list, why I am building this gets blurry. Decide what not to do first, and what remains becomes the project’s identity. Subtraction makes the outline. And in a project like this, one that shakes people, that outline acts as a safeguard. Each spot where I can but do not, I close off one uncontrollable risk in advance.

There were four. The subscription model, boxing people into personality types, the K-12 classroom, and professional ethics domains like medicine and law. But these four did not all come from the same place. The first two were decisions I made myself. The latter two I did not come up with. Half of what should be cut came from my own head, and half from a borrowed pair of eyes. The borrowed eyes are the roundtable I mentioned briefly at the end of chapter 1. Afraid of concluding on my own that the idea was good, I deliberately borrowed perspectives from different positions and used them to test it. An investor’s eye that says grow fast, a large AI lab’s eye that sees first the danger of a feature that shakes people, an education operator’s eye that puts completion rates on the table, a learning scientist’s eye that says measure first, and a lawyer’s eye that asks first about minors and liability. Some of those five flagged K-12 and professional ethics as things to shelve. If anything I had been leaning toward those two, so it was an unexpected brake from an unexpected place. How that roundtable unfolded and what else it changed is something I will write a whole chapter on later, so here I bring only its two conclusions forward. This is what makes an anti-vision interesting. Not doing what you are not drawn to is hard to call a decision. Shelving what you are drawn to is the real decision, and half of that “shelve it” signal came not from me but from outside. Let me take them one at a time.

Why I Dropped the Subscription

First, the revenue model. I decided not to tie this to a traditional subscription or credit payment. At first it was just a loose thought of “it does not have to be a subscription,” but the more I weighed it, the closer it got to something to avoid rather than choose.

A subscription model demands one thing in essence. Come every day. Build a habit. Increase time spent. To make money arrive monthly you have to keep holding the person, and to hold them you design for them to want to come often. This gets close to designing for addiction. In fact the past decade of digital product design poured much of itself into refining habit formation into a craft. When to send the notification, at what interval to give the reward, where to cut the scroll. Techniques to hold people longer and more often became an industry standard, and the voices criticizing those techniques grew just as much.

But that was exactly what I was trying to challenge with this project. The world tilting ever faster toward all-or-nothing, toward black and white. Algorithms reinforce what a person already believes, and feeds reward immediate reactions. A structure where the faster you judge and the harder you are certain, the more exposure you get. A tool meant to push head-on against that fast-certainty circuit does not add up if its own revenue is earned by the very same circuit of come often, stay longer. Mission and revenue model eat each other. The moment I make people solve trolleys daily to lift revenue, I reproduce the very mechanism I was criticizing.

So I flipped the direction. The value of this tool lies not in coming often but in something closer to facing the tension in your own thinking in one deep encounter and then being released from it. If so, a model that lets the person go rather than holds them fits the mission. If they want to come back, they come on their own. That trust felt like the relationship more proper to this product. The concrete pricing form I left open. There is a path of charging for one meaningful session. Since moral dilemmas inherently draw out conversation, there is also a path where the value comes from showing not one person but two or more answering the same dilemma and seeing where their answers split. Either way, I erased the familiar road of “raise retention to grow monthly recurring revenue” from the start. I decided how not to earn before deciding how to earn.

Not Telling You That You Are a Type

The second thing I cut was sorting people into types.

To tell this I have to admit something first. Dividing people into a few boxes and naming them sells extremely well. Especially in Korea. MBTI (Myers-Briggs Type Indicator, a test that sorts personality into sixteen types) is almost a shared social language in Korea. CNN noted that Korea’s younger generation actively uses MBTI when choosing a date, and it reaches even into hiring and self-introductions. I have felt that this kind of typing works unusually well in Korea. A sensibility of grasping and sorting the other person quickly to feel at ease, perhaps a culture used to placing people by rank or category, may lie underneath. This is my impression, so there is room to dig further, but at least “people like being grouped into types” is something the market proves.

So a picture that shows “your moral profile” at the end of a session is, honestly, attractive. People readily pay for a mirror of themselves. The tidy feeling that comes from one sentence saying “you are this kind of person.” I thought of this direction at first too. A summary at the end like “you lean toward valuing consequences, but you change your stance once the victim is visible in front of you” seemed like something people would like. Easy to share, and good for making them come back to check “did it change since last time.”

But there was a trap here. In 1948 the psychologist Bertram Forer showed something. Give people a vague personality description that fits anyone, call it “an analysis just for you,” and most give it high marks, saying “how do you know me so well.” The Forer effect, or, after the showman who pleases everyone, the Barnum effect. Much of the satisfaction of typing stands on this illusion. It feels like deep insight into me, but it is often just a line that fits anyone, taken as my own. “You are type X” does not so much make a person understood as box them in the pleasant feeling of having been understood.

Where this clashes with the project was clear. I was not trying to put a person in a box, but to make a person see the contradiction inside themselves. Stick the label “you are a consequentialist” and the person hardens their position once more and leaves. Far from being shaken, they only gain a new certainty. The same failure mode from the section before. A type is, in the end, another answer sheet. Slotting an answer sheet into a place with no answer. So I changed the direction of the final screen. Not to declare a type but to surface a tension. Not “you are this kind of person” but “you just hesitated here, and that hesitation contradicts your earlier answer like this.” Leaving an open question in place of a closed label. The same final screen, but the weight runs the other way. One sorts the person and sends them off, the other lets the person go while still shaken.
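The difference between declaring a type and surfacing a tension can be sketched in a few lines of Python. This is only an assumption about how such a final screen might be driven: the `Answer` record, the `sacrificed_one` flag, and the `find_tensions` helper are hypothetical names made up for this illustration, and reducing an answer to a single boolean is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    variant: str          # e.g. "lever", "footbridge" (labels are illustrative)
    sacrificed_one: bool  # True if the person chose to sacrifice one to save more


def find_tensions(answers):
    """Return one open question per pair of variants where the stance flipped.

    Nothing here names a type or says which answer is right; it only points
    at the spot where two of the person's own answers pull apart.
    """
    questions = []
    for i, first in enumerate(answers):
        for second in answers[i + 1:]:
            if first.sacrificed_one != second.sacrificed_one:
                questions.append(
                    f"You answered the {first.variant} case and the "
                    f"{second.variant} case differently. "
                    f"What changed for you between them?"
                )
    return questions


session = [
    Answer("lever", True),
    Answer("footbridge", False),
    Answer("loop track", True),
]
for question in find_tensions(session):
    print(question)
```

A typing design would collapse the same three answers into one label. This sketch instead returns two questions for the session above and an empty list when the answers never diverge, so a person who was consistent is not handed a manufactured contradiction.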

K-12 and the White Coat Can Wait

From here on are not decisions I made myself. The first two I cut with my own hands, but these two were spots I was actually drawn to until a different eye at the roundtable flagged them.

First, K-12, the classroom from elementary through high school. Ask where moral education seems most needed and many picture school, growing children. The market is there too. Whether public or private, demand for character and ethics education always exists, and parents readily pay for it. So at first I saw it as a natural starting place too. But at the roundtable, other perspectives put on the brakes, and following the reason I came to see that the failure mode from earlier gets far more dangerous with children. Even for adults this is a subtle and sensitive subject. By a hair it can plant a distorted sense of right and wrong, or make a majority vote feel like the correct answer. I cannot yet fully predict that risk even when the person on the other side is an adult. How that shaking acts on children whose values are not yet set is harder still to predict, and more sensitive. An adult who gets shaken has their own place to return to, but a child is still building that place. I could not be confident how a shaking tool would cut into that formation. This is an area for someone more expert, someone who knows developmental stages and educational ethics, and honestly I felt it is too early for today’s AI to handle it. So I shelved it.

The second was harder. Professional ethics domains like medical ethics or legal ethics. In fact this looked like the most natural extension for me. Places like clinical ethics, legal judgment, and public decision-making treat ethics education as close to mandatory, and institutions readily pay for evidence that “we trained our staff in moral reasoning.” Individual consumers rarely open their wallets for an uncomfortable experience, but institutions get around that problem by calling it an educational duty. On top of that, since my own work is AI strategy consulting for enterprises and the public sector, the picture of laying this tool on top of that work felt within reach. It fit the mission, it made money, and it connected to my career. For a while I saw this as the most realistic path.

But at the roundtable the judgment flipped. This is an area the trolley could someday expand into, yes, but I came to admit that someone with knowledge of that domain must build it for it to be made well. The subtlety of clinical ethics has to be designed by someone who knows the medical floor, the grain of legal judgment by someone who knows the law. If a non-expert like me touches it, I end up making something that looks plausible but is off to anyone in the field. It becomes the reverse of the situation where, with the trolley, I was confident I could inspect every branch. And these are the domains where a botched build leaves the deepest mark. A doctor’s judgment, a judge’s judgment, divides actual lives. Clumsily touching that training carries different weight from clumsily making a trolley game. So I shelved this most-attractive path for now. Not abandoned forever, but kept open as a place to hand to someone who can build it properly. A door to open only after I have proven a single trolley all the way through. Shelving the most attractive thing was the hardest, and that it was a borrowed perspective rather than my own hand that flagged it stayed with me. Had I concluded on my own that it was good, I would have just run at it.

After the Shaking, Where to Set Someone Down

As I sorted the subtractive decisions, the heaviest question turned out to be the one left for last. Shaking someone is one thing, but where do I set down the person who got shaken?

The core value of this project is making a person less certain. But shake someone’s moral certainty and then let them close the screen, and the person leaves carrying only discomfort. With a light subject, fine, but this touches a person’s values. What happens if you keep throwing trolley counterexamples at someone who obsessively chews on their own moral failings? Or at someone with the trauma of a real moral choice? A tool good only at shaking and clumsy at setting down is, at best, irresponsible.

That this risk is not hypothetical is already visible in the world. There have been reports of AI chatbots that bond emotionally with users affecting those users’ mental health, and one such case went as far as a lawsuit over the death of a teenager in the United States (filed October 2024, settled January 2026). How far a design that emotionally holds a person should bear responsibility is already a real question. Moral Mirror is a tool meant to shake rather than hold, but if it leaves the shaken spot unattended, it may be another face of the same risk. If a holding design is dangerous for not letting a person leave, a shaking design is dangerous for sending a person off while still shaken.

Here a sentence I drove into chapter 1 comes back up. Being able to build something and that something being built well are different things. Having passed through chapter 2’s subtractive decisions, I found this was one branch of an older question. In the philosophy of technology there is the Collingridge dilemma. In a technology’s early stage, when it is still easy to change, you cannot know its impact, and by the time the impact is clear, it is already hard to change. So before the impact is fully known, you need decisions that close things off now, while change is easy. The things I cut in advance as anti-vision were exactly that “closing off now.” The philosopher Hans Jonas went a step further, saying that in the face of a technology whose impact reaches far, responsibility must come ahead of capability. The idea that becoming able to do something does not mean it is permitted, the very nail from chapter 1, turned out to be a spot philosophy had already named. As always, I ran into it first and met the name later.

Honestly, I only have the hunch that a great many safeguards will be needed here, and I do not yet know how many or what shape. How to close out a person after the shaking ends, how to detect risk signals, when to stop shaking and step back. This is a subject big enough to need a whole chapter, so I plan to handle it separately later in this book. What I can do in this chapter is to clearly note that the question exists. The heaviest spot left even after all the subtractions are done. There is risk that subtraction can close, and risk that subtraction does not close and must be designed for separately. This last spot was the latter.

What This Chapter Did Not Answer

Chapter 2 was, in the end, a record of subtraction. I narrowed to one trolley, dropped the subscription, set aside typing, and shelved K-12 and professional ethics. If chapter 1 was “it became buildable,” chapter 2 set on top of that capability the first square of responsibility, “then how far.” And I came to see, while writing, that these subtractive decisions did not scatter but came from one standard. Holding the failure of planting misplaced certainty after the shaking to a size I can control. Both narrowing and subtracting were different faces of that one sentence.

There is still much it did not answer. The biggest is what I just wrote. How many safeguards, and in what shape, the shaken person needs, I do not yet know. The hunch says many, but a hunch of “many” does not design anything. This is a blank to fill as I build, and by borrowing other perspectives. One more: whether this decision to narrow to a single trolley is right all the way through, I also do not yet know. As I wrote in chapter 1, domains without a right answer go well beyond ethics, and that I started with the trolley does not mean the end has to be there. Between starting small and ending small, I have driven a nail into neither side yet.

And once the subtraction was done, a new question rose from an odd place. This thing left in hand after all the narrowing and paring, what on earth do I call it? Not an education app that teaches the right answer, not a companion app that holds people, not a test that types personality. It did not fit cleanly into any existing category. The next chapter is about exactly that. When I found a place with no name, how I weighed whether that empty spot was an opportunity or just a trap that no one built in for a reason. Once subtraction made the outline, that the outline had no name became the next assignment.
