Subject: Failure, science and software testing
From: npdoty@ischool.berkeley.edu
To: Brian (Microsoft), Vignesh (Microsoft), Mubarak (Microsoft), Tracy (Microsoft), Jolie (Microsoft), Ben Cohen
Bcc: https://bcc.npdoty.name/

Have you guys read this Wired article on failure and science?

I thought it was really reminiscent of the constant failures we run into as programmers, and the particular challenge of being a software tester.

The main premise is that scientists get so comfortable with accepted theory and the status quo that they don't recognize that failures might be breakthroughs rather than just mistakes in their own equipment or experimental method. There's certainly some value there -- the author gives the example of static in a sensitive radio telescope finally being accepted as cosmic background radiation and not a problem with the dish, and cites some serious ethnographic research on scientists and how they make discoveries. But the article makes it sound like the solution is simply to be skeptical, to assume that every unexpected experimental result is a potential new discovery.

But anyone who's taken high school physics knows that this assumption -- that it must be your fault, not the theory's -- isn't just some elitist fallacy; it's born of experience. Of the hundred times that the results of your high school physics experiment didn't match what theory predicted, how many times was it because the theory was wrong? Zero, of course; it's always a screw-up in your experimental set-up (at least it was in high school, and I bet the percentage doesn't change that much once you're a professional).

As programmers, we learn this lesson even more often. The first rule of programming, after all, is that it's always your fault. This isn't dogma; every one of us has learned it from the same quintessential, eternally repeated experience. We write a piece of code, it doesn't work, and we assume it must be a problem with the operating system or the compiler or the other guy's code -- that the computer simply isn't doing what we told it to do -- until we actually look at our own code and the documentation and discover the mundane truth: we'd just made another stupid mistake.

And that's the real trick of software testing: it can be tempting, particularly at first, to file a bug every time something doesn't work. Young, confident software testers go to their dev several times a day saying "I found a bug," only to realize that they hadn't called the function with the correct parameters. But this lesson of experience quickly leads to the opposite problem: having become so accustomed to being the cause of our own problems (like any programmer), we just fidget until we get the software to work, unconsciously working around bugs that we should be filing.

So I think the real answer, both to the scientific problem and the software-testing one, isn't mere undying skepticism, but knowing which failures are probably your fault and which ones aren't. And a lot of the techniques that experimental scientists and software testers use are the same: the first step for both is reproducing the failure. Lehrer's article also suggests talking to someone who isn't intimately familiar with the experiment, and I think we software testers often come to understand an unexpected result when we try to explain the bizarre situation to a tester from another team. "Encourage diversity" is also on his list, and I think the Test Apprentice Program at Microsoft was a darn good example of that in action -- being the only non-CS majors on our teams, we often found different bugs.
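
If it helps to picture what I mean by reproducing the failure, here's a tiny sketch in Python (the parse_price function and the failing input are made up, just for illustration): the habit is to shrink whatever triggered the bug down to the smallest input that still fails, and to keep that as a test so the report is reproducible rather than "it broke once on my machine."

    # Hypothetical example: a tester sees a long CSV import blow up, and instead
    # of filing "import is broken", shrinks it to the smallest input that still
    # reproduces the failure and keeps that as a test.

    def parse_price(text):
        # Imaginary function under test; it happens to choke on a currency symbol.
        return float(text)

    def test_repro_leading_dollar_sign():
        # Minimal reproduction of the original failure: one field, one symbol.
        # Running this (e.g. with pytest) raises the same ValueError every time,
        # which is what makes the bug worth filing instead of fidgeting around.
        assert parse_price("$3.50") == 3.50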

Maybe experimental science could even learn something from software testers. I thought one of the more valuable things we got from learning test-driven development was that a test wasn't good unless you'd seen it fail. If you've only ever seen a test pass, then how do you know that it really tests what you claim it tests? That must be harder for physicists (they can't briefly turn off a particular universal parameter to ensure that the experiment fails under those conditions), but the same sort of counterfactual thinking (rather than just writing a test and being happy when it turns green, or running an experiment and assuming that the result confirms the theory) seems important to me.
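
To make that concrete, here's the shape of it in Python with unittest (a made-up example, not anything we actually shipped): write the tests first, run them against a deliberately broken stub so you see them fail, and only then write the real implementation.

    # Sketch of the "see it fail first" habit from test-driven development.
    # The median() function and its tests are hypothetical, just for illustration.

    import unittest

    def median(values):
        # Step 2: the real implementation, written only after the tests below
        # had been seen to fail against a broken stub (say, "return 0").
        ordered = sorted(values)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2

    class TestMedian(unittest.TestCase):
        # Step 1: write these first and watch them go red against the stub;
        # a test you've never seen fail might not be testing anything at all.
        def test_odd_length(self):
            self.assertEqual(median([3, 1, 2]), 2)

        def test_even_length(self):
            self.assertEqual(median([4, 1, 3, 2]), 2.5)

    if __name__ == "__main__":
        unittest.main()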

Do we get a lot of good software testers from experimental science backgrounds? Maybe that's where we should be hiring from. Anyway, I highly recommend the Wired article, if only for the comfort that programmers aren't alone in the universe in having their experiments fail constantly.

Hope you're all doing well -- grad school is great, but, as you can see, I still miss software testing from time to time,
Nick