Saturday, July 11, 2009

Defects and the Scientific Method

Let's leap back a few years to that frightening time in junior high school. Remember those years? September, possibly October. It was still hot, and the air conditioning didn't work. Your science teacher was standing at the back of the room with an overhead projector, and had a slide up for you to copy down. On it, he had this information:

THE SCIENTIFIC METHOD

  • Ask and define the question.
  • Gather information and resources through observation.
  • Form a hypothesis.
  • Perform one or more experiments and collect and sort data.
  • Analyze the data.
  • Interpret the data and make conclusions that point to a hypothesis.
  • Formulate a "final" or "finished" hypothesis.

    Ah, remember those days? Remember how boring it was? Remember thinking to yourself, "I'll never use this?"

    Well, as a software developer who is tasked with maintaining software that is virtually guaranteed to contain defects, you can be certain that you need to be intimately familiar with The Scientific Method.

    The Scientific Method provides a clear roadmap for defect isolation. In fact, anyone who has any real experience isolating defects without disturbing the rest of the system has (whether aware of it or not) used the Scientific Method to do so. Here's how it breaks down:

    1. Ask and define the question. The software should behave in this manner, but it does not. What is the cause of this problem, and how do we fix it?
    2. Gather information and resources through observation. In a controlled environment that mimics production as closely as possible, reproduce the defect. If possible, step through the code and observe its behavior.
    3. Form a hypothesis. The defect is caused by this behavior in the system (or by the behavior of this external system).
    4. Perform one or more experiments and collect and sort data. Implement a code fix; attempt to reproduce the defect using the fixed code. Observe the results.
    5. Analyze the data. Did the code fix have the desired effect? If so, how?
    6. Interpret the data and make conclusions that point to a hypothesis. Was the code that was modified the cause of the defect, or was it merely a symptom of an underlying problem requiring further resolution?
    7. Formulate a "final" or "finished" hypothesis. If the defect is fully repaired, check all code into the repository. Otherwise, continue the analysis until you have rooted out the underlying cause of the defect.
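Steps 2 through 5 above can be sketched in a few lines of Python. Everything here is made up for illustration: `parse_price_buggy` is a hypothetical defective function, `parse_price_fixed` is a candidate fix under the hypothesis that a thousands separator causes the failure, and the control inputs are assumed examples of behavior that already worked. Treat it as a rough sketch of the idea, not a recipe.

```python
def parse_price_buggy(text):
    """Hypothetical defective code: fails on prices like "$1,250.00"."""
    return float(text.strip().lstrip("$"))


def parse_price_fixed(text):
    """Candidate fix for the hypothesis 'the comma separator breaks parsing'."""
    return float(text.strip().lstrip("$").replace(",", ""))


def reproduces_defect(parse):
    """Step 2: replay the smallest input observed to trigger the defect."""
    try:
        parse("$1,250.00")
    except ValueError:
        return True
    return False


def run_experiment(parse):
    """Steps 4 and 5: is the defect gone, and did anything else change?"""
    defect_gone = not reproduces_defect(parse)
    # Control cases (assumed): inputs that worked before and must still work.
    controls = {"$3.50": 3.5, " 12 ": 12.0, "0.99": 0.99}
    controls_ok = all(parse(text) == expected for text, expected in controls.items())
    return defect_gone, controls_ok


if __name__ == "__main__":
    print("before fix:", run_experiment(parse_price_buggy))
    print("after fix: ", run_experiment(parse_price_fixed))
```

The point of the control cases is the same as in any experiment: if the fix makes the defect disappear but a control changes, the hypothesis was incomplete and the analysis continues.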

    Simply put, there's no guesswork in defect resolution. It is a rational, thinking process, much like a game of Sudoku. If you approach any defect and just yank an answer out of thin air, You're Doing It Wrong.

    Instant answers to defects are a dangerous game. Your first, instinctive answer to any problem is likely to be wrong, and the odds of that only rise as your code base grows. As your product gains features, you'll want to take greater care to disturb absolutely nothing outside of the defect you're trying to correct. So take some advice: Keep your grubby fingers to yourself. Touch only the code in the defect domain. The best way to do that is to have a plan for defect resolution, and I strongly encourage you to apply The Scientific Method.

    Developing software is a task for those who can think. It is not a task for the simple-minded, the lazy, or the inattentive. You have to be willing to pay attention to the details, and to invest the time it takes to hunt a defect all the way down to the root of the problem.

    A good software developer knows the difference between a symptom and a disease, and how that correlates to software defects. Sure, you have a NullReferenceException, and your code is missing a handler for that. But is the problem the missing exception handler, or is the problem that a null somehow got into a table that should never have had it, or that a stored procedure in the database returned nulls when they were never expected? Which one is the symptom? Which is the disease? Make sure you're fixing the right defect. Don't just prescribe aspirin when the software needs invasive surgery. To find that out, you need to think critically. You need to apply the Scientific Method.
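Here is a small Python sketch of the symptom-versus-disease distinction. Python has no NullReferenceException, so calling a method on `None` (which raises `AttributeError`) stands in for it. The `load_rows` function is a hypothetical stand-in for a stored procedure that returns a null nobody expected, and the field names are assumptions made up for the example.

```python
def load_rows():
    """Hypothetical stand-in for a stored procedure that returns an unexpected null."""
    return [
        {"id": 1, "email": "ann@example.com"},
        {"id": 2, "email": None},
    ]


def domains_symptom_fix(rows):
    """Treats the symptom: swallows the error and silently drops the bad row."""
    domains = []
    for row in rows:
        try:
            domains.append(row["email"].split("@")[1])
        except AttributeError:
            # None has no .split(); the crash is gone, but the null is still there.
            pass
    return domains


def validate_rows(rows):
    """Looks for the disease: reports exactly which rows carry a null email."""
    bad_ids = [row["id"] for row in rows if row["email"] is None]
    if bad_ids:
        raise ValueError(f"rows with null email: {bad_ids}")
    return rows


if __name__ == "__main__":
    print(domains_symptom_fix(load_rows()))  # the null row vanishes without a trace
    try:
        validate_rows(load_rows())
    except ValueError as exc:
        print(exc)  # names the offending rows so the source can be investigated
```

Neither function is the "right" fix on its own. The first makes the exception go away; the second only tells you where the null came from. Which one the software needs is exactly what the investigation is supposed to decide.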
