July 12, 2023
My least favorite response to an accident is “you have to be more careful”. I spilled milk in the fridge because I didn’t cap the bottle properly: you have to be more careful. A driver didn’t see a stop sign: you have to be more careful. A machine broke after someone swapped two wires: you have to be more careful. An operator formatted the wrong disk: you have to be more careful.
You have to be more careful!
I adhere to a philosophy where the phrase “you have to be more careful” does not exist. In this philosophy, it is not a single thing or person that causes accidents, but a system that allows accidents to happen. When I don’t cap the bottle, the driver doesn’t see the stop sign, the technician swaps two wires, or the operator formats the wrong disk, that’s not the cause of the accident, but the last step in a sequence that should have stopped sooner.
A person cannot “be more careful” all the time; people get distracted or tired or forget stuff. Therefore, our safety can’t depend on someone always being attentive; in fact, constant attention should be unnecessary.
One way to achieve this is to design the system so that it’s impossible to make a mistake. For example, if I spill the milk when I screw the cap on wrong, the solution is not to always check that the cap is on straight but to use a different type of stopper that won’t go askew. If swapping two wires can destroy a machine, the solution is to put a different plug on each wire so that each one fits only the correct socket.
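The plug idea translates to software fairly directly. The following is a minimal sketch, not a real wiring library: `Plug`, `Socket`, and the shape names are hypothetical, chosen only to show a connection that refuses the wrong wire instead of trusting the technician to notice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plug:
    """A wire end with a physical shape (hypothetical model)."""
    shape: str


@dataclass(frozen=True)
class Socket:
    """A machine input that accepts only one plug shape (hypothetical model)."""
    name: str
    shape: str

    def connect(self, plug: Plug) -> str:
        # A mismatched plug does not fit, so the mistake is rejected at
        # connection time instead of surfacing when the machine powers on.
        if plug.shape != self.shape:
            raise ValueError(
                f"{plug.shape} plug does not fit the {self.name} socket"
            )
        return f"{self.name} connected"


# Assumed shapes, for illustration only.
power_wire = Plug(shape="round")
signal_wire = Plug(shape="square")

power_in = Socket(name="power", shape="round")
signal_in = Socket(name="signal", shape="square")
```

With this design, `power_in.connect(signal_wire)` raises a `ValueError`, so swapping the two wires is no longer something a person has to be careful about.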
To accomplish this goal, sometimes we need to use our imaginations or get help from other fields, like ergonomics. For example, after someone lost a hand to a hydraulic press, the factory owners modified the controls to force the operator to press two buttons simultaneously to engage the press, which ensured both hands would be out of harm’s way.
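The two-button control can be sketched as a small interlock check. This is only an illustration under stated assumptions: the function name `press_may_engage`, the timestamp representation, and the half-second window are mine, not details taken from the factory in the story.

```python
from typing import Optional

# Assumed tolerance for "simultaneously"; an illustrative value only.
MAX_SKEW_SECONDS = 0.5


def press_may_engage(
    left_pressed_at: Optional[float],
    right_pressed_at: Optional[float],
    max_skew: float = MAX_SKEW_SECONDS,
) -> bool:
    """Allow the press to engage only when both buttons are held.

    Each argument is the time (in seconds) at which that button was
    pressed, or None if the button is not currently held down.
    """
    # If either hand is off its button, the press stays disengaged.
    if left_pressed_at is None or right_pressed_at is None:
        return False
    # Require the two presses to be close together, so that one button
    # cannot simply be held down in advance.
    return abs(left_pressed_at - right_pressed_at) <= max_skew
```

For example, `press_may_engage(10.0, 10.2)` allows the press to engage, while `press_may_engage(10.0, None)` does not, because one hand is free.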
Another way to avoid relying on a human’s attention span is to design the system so that whenever there is a problem, it is detected or corrected automatically. For example, trains have electronic signaling systems that will stop them automatically if they exceed the speed limit or run a red light. Planes have triple-redundant sets of computers where, if one of the computers starts malfunctioning, the other two computers can detect and disable it, alert the pilot, and keep flying the plane.
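The triple-redundant arrangement boils down to majority voting. Here is a minimal sketch of that idea, assuming each computer reports a value that can be compared for exact equality; the function name `vote` is hypothetical, and real avionics compare readings within tolerances and do far more than this.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def vote(readings: Sequence[Hashable]) -> Tuple[Hashable, List[int]]:
    """Return the majority value and the indices of units that disagree."""
    value, count = Counter(readings).most_common(1)[0]
    # Without a strict majority there is no trustworthy answer, so the
    # failure is reported instead of picking a value arbitrarily.
    if count * 2 <= len(readings):
        raise RuntimeError("no majority among redundant units")
    # Units that disagree with the majority are the ones to disable
    # and report to the pilot.
    faulty = [i for i, reading in enumerate(readings) if reading != value]
    return value, faulty
```

Calling `vote([42, 42, 17])` returns `(42, [2])`: the system keeps using 42 and flags the third unit as the one to disable.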
You see this philosophy’s positive effect when you compare an air accident investigation to a traffic accident investigation. When an airplane crashes, investigators try to find the root cause of the accident and issue recommendations to adopt systems that will prevent, detect, or correct the problem. As a consequence, air travel is safer every year.
By contrast, when there is a traffic accident, the driver always gets the blame: he was speeding, she was distracted by the phone, he fell asleep, or she was driving drunk. The Spanish road network has more than two hundred “black spots,” each with over three accidents occurring every year. Isn’t it convenient that drivers always speed, drive distracted, and drink in the same places every time?
I encourage you to embrace this philosophy. After an incident, don’t just blame the operator; conduct a root cause analysis and issue recommendations to prevent, detect, and fix it before it becomes a problem.
The illustration for this Coding Sheet is a photograph of the Montparnasse derailment of 1895.
A Folla has a Galician version of this article: “Hai que ter máis tino”.