Wednesday, March 19, 2008

Why Voting Machines Can't Add Up

Ed Felten is continuing his excellent work exposing the broken state of electronic voting machines. Many people are wondering how such software can have been allowed out by its developers. The discrepancies don't (at the moment) seem to be a result of fraud, just very buggy software.

Voting machines are obviously important, so their development is regulated. I've never worked in the voting machine industry, but I have worked in another kind of federally regulated software: medical devices. So I know how regulated software projects work, and how they don't.

The fundamental problem underlying this is that nobody in the world actually knows how to write software that reliably does what you want. There are quite a lot of people who can write such software, but if you ask them how it's done they basically waffle. Most of them agree on a list of steps to take, starting with writing down exactly what the software is supposed to do. Various attempts have been made to codify this list, and they all look pretty similar. The voting machine standards are just another variation on this theme.

However this is all cargo-cult engineering. We know that the people who can summon up the magic cargo planes do it by putting things over their ears and saying magic words, but it doesn't follow that if we put things on our ears and say the same magic words the cargo will appear. So it is with software engineering. You can write Requirements Documents and Class Diagrams and Test Scenario Documents and Test Execution Reports until you run out of paper, but it won't make any difference if you don't have the Quality Without a Name.

Imagine you are managing a development project to build a voting machine. Your mission is to get the thing on the market. You have been given a bunch of programmers, half a human factors person and a quarter of an industrial designer. The time available isn't long enough, but you know it's no use complaining about that because it's not your boss's fault, or even the CEO's fault. It's the fault of the people at Big Competitor who are planning to release their product just in time to tie up the whole market, so if you don't deliver the product at the same time then it's not going to matter whose fault it was, the whole division is going to get laid off anyway. You could get some more people if you really wanted, but you know that more people aren't actually going to speed things up.

The Quality Department have downloaded the voting machine regulations and someone has been going through them and writing down a list of the things they say you have to do and the order they have to be done in. This is very good. In fact you send the Head of QA a little note saying how helpful his minion has been to your project, because now you have something to aim at. Project Management is mostly a matter of getting your hoops lined up so that you and your minions can jump through them as quickly as possible, and the QA minion has done the regulatory hoops for you. The regulations boil down to a list of documents that have to be shown to the inspectors (who you are going to hire, but that's another story). Each document has a list of things it must contain, and some of those things have to be traceable to other things. All you need to do now is start allocating people to things on the list and getting them ticked off. The list is long, but you have one big advantage: there is nothing to say how good any of these documents have to be. They don't have to be good, they just have to exist.

One of these documents is called "source code". Of course that one does have some quality requirements on it: it has to pass a bunch of tests. But the tests themselves don't have any quality requirements; like everything else they just have to exist. And passing the tests is the only quality requirement on the code. Once the independent laboratory you hired has run the tests and said "pass" you are over the finishing line and you can start selling these things.

This means that you have a very strong motivation to keep the testing to the minimum you can get away with. The regulations say you have to have a test for each item in the original requirements document, and this test has to be run once. If your software fails a test then you get to fix it, and if the fix was small enough you can get away without repeating all the other tests as well. During this whole time your eyes are fixed on the finishing line: the objective is to get this thing over the line. What happens to it after that is someone else's problem.
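To make the point concrete, here is a minimal Python sketch of a checklist that only asks whether a test exists and runs for each requirement. Everything in it is invented for illustration: the requirement IDs, the function names and the deliberately broken tally are assumptions, not taken from any real voting machine or regulation.

```python
from collections import Counter


def buggy_tally(ballots):
    """Deliberately wrong (for illustration): silently drops the last ballot."""
    return Counter(ballots[:-1])


def test_req_1_every_ballot_counted():
    # This test exists and runs, but it never checks the result.
    buggy_tally(["A", "B", "A"])


def test_req_2_totals_match_ballots_cast():
    # Same again: it exercises the code and asserts nothing.
    buggy_tally(["A"])


# Hypothetical traceability table: one test per requirement.
TRACE = {
    "REQ-1": test_req_1_every_ballot_counted,
    "REQ-2": test_req_2_totals_match_ballots_cast,
}


def checklist_complete(trace):
    """Run each traced test once; 'pass' just means nothing crashed."""
    for test in trace.values():
        test()
    return True


def totals_match():
    """What a test with real content might check instead."""
    ballots = ["A", "B", "A"]
    return sum(buggy_tally(ballots).values()) == len(ballots)
```

Under these assumptions `checklist_complete(TRACE)` reports success, while `totals_match()` returns `False` for the same code. The checklist is satisfied by the existence of the tests, not by anything they verify.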

When you look at these machines from a project manager's point of view you start to see how they got to be so unreliable. "Quality Assurance" is primarily a matter of making sure you get all the items in the regulations ticked off; it has nothing at all to do with the original meaning of the word "quality". Ironically the regulations may actually do more harm than good because they divert energy from real quality onto generating the required inches of documentation.

Over the years I've spent a lot of time trying to figure out how to fix this problem, and I still don't have an answer. Abolishing private companies is a cure worse than the disease, and it won't cure the disease anyway because it won't abolish projects and the need to manage them. Software Engineering has a bad case of Quality Without A Name, and there is no prospect of it getting better soon.

However in the limited domain of voting machines I believe the best cure is sunlight: we may not be able to define quality in software, but we know it when we see it. The source code for voting machines must be published. The manufacturers will scream and shout about their precious IPR and trade secrets. This is nonsense. Any voting machine must have a well defined version of someone's software running, so any illegal copying will generate a cast-iron audit trail back to the perpetrator. And there are no real trade secrets in voting machines: counting votes is not, when it comes down to it, a particularly complicated problem. The voting machine manufacturers will make just as much money as they do now. In fact they'd probably make more because if the machines were trustworthy then people would learn to trust them. However the first vendor to start publishing their source code will be at a disadvantage because everyone else can pinch bits of it with very little risk of detection (and if they get caught they can just blame a rogue programmer). So the regulations on voting machines should be changed to require the publication of the code (and other design documentation too, while we are about it). That will create a real requirement for quality source code. Until then we are stuck with the current mess.
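As a rough illustration of how small the core counting problem is, here is a sketch in Python. It is not how any real machine works; the function name, the ballot representation (a list of candidate names) and the handling of unrecognised choices are all assumptions made for the example. The hard parts of a real system (the user interface, storage, security and auditing) are outside it.

```python
def tally(ballots, candidates):
    """Count one choice per ballot; anything not on the list is rejected.

    ballots: list of strings, one chosen candidate per ballot (assumed format).
    candidates: list of valid candidate names.
    Returns (counts, rejected).
    """
    counts = {name: 0 for name in candidates}
    rejected = 0
    for choice in ballots:
        if choice in counts:
            counts[choice] += 1
        else:
            rejected += 1
    # The basic sanity check: every ballot is accounted for exactly once.
    assert sum(counts.values()) + rejected == len(ballots)
    return counts, rejected
```

For example, `tally(["A", "B", "A", "X"], ["A", "B"])` counts two votes for A, one for B and one rejected ballot. Code this short is easy for outsiders to read and criticise, which is the argument for publishing it.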