Thursday, March 13, 2008

NValidate: Misunderstood from the Outset

Occasionally, I will post questions about the design or feature set of NValidate on Google Newsgroups. More recently, I posted a question about it to LinkedIn. Almost immediately, I got this response:

I'd suggesting looking at the Validation Application Block portion of the Enterprise Library from the Microsoft Patterns and Practices group.

Now, I'm not belittling the response, because it's perfectly valid, and the Validation Application Block attempts to solve essentially the same problem. But when I talk about NValidate, which I find myself doing a lot as I interview for jobs (it's listed on my résumé), people often ask me similar questions:

  1. How is that any different from the Validator controls in ASP.NET?
  2. Why don't you just use the Validation Application Block?
  3. Why didn't you go with attributes instead?
  4. Why didn't you use interfaces in the design?
  5. Why not just use assertions instead of throwing exceptions?

These days, I find myself answering these questions with alarming frequency. It occurs to me that I should probably get around to answering them, so I'm going to address them here and now.

It helps, before starting, to understand the problem that NValidate is trying to solve: Most programmers don't write consistent, correct parameter validation code because it's tedious, boring, and a pain in the neck. We'd rather be working on something else (like the business logic). Writing parameter validation code is just too difficult. NValidate tries to solve that problem by making it as easy as possible, with a minimal amount of overhead.

Q. How is NValidate any different from the Validator controls in ASP.NET?

A. The Validator controls in ASP.NET can only be used on pages. But what if I'm designing a class library? Isn't it vitally important that I test the parameters on my public interface to ensure that the caller passes me valid arguments? If I don't, I'm going to fail spectacularly, and not in a pretty way. You can't use the Validator controls (RangeValidator, CompareValidator, and so on) in a class library you're writing that's intended to be invoked from your Web application.

Q. Why don't you just use the Validation Application Block?

A. This one's pretty easy to answer. NValidate is designed to accommodate lazy programmers (like me).

Here's the theory that essentially drives the design of NValidate: Developers don't write parameter validation code with any sort of consistency because it's a pain in the neck to write it, and because we're in a big hurry to get to the business logic (the meat and potatoes of the software). Let's face it: if the first chunk of the code has to be two to twenty lines of you checking parameters and throwing exceptions, and doing it all over the place, you'd get tired of doing it, too. Especially if that code is extremely repetitive.

if(null == foo) throw new ArgumentNullException("foo");
if(string.Empty == foo) throw new ArgumentException("foo cannot be empty.");
if(foo.Length != 5) throw new ArgumentException("foo must be 5 characters.");

We hate writing this stuff. So we skip it, thinking we'll come back to it later and write it. But it never gets done, because we get all wrapped up in the business logic, and we simply forget. Then we're fixing bugs, going to meetings, putting out fires, reading blogs, and it gets overlooked. And the root cause is that it's tedious and boring.

I'm not making this up, folks. I've talked to lots of other developers and they've all admitted (however reluctantly) that it's pretty much the truth. We're all guilty of it. Bugs creep in because we fail to erect that impenetrable wall that prevents invalid parameter values from slipping through. Then, once we've got egg on our face, we have to go back in after the fact and add the validation code, at increased cost.

So, if you want to make sure that developers will write the parameter validation code, or are at least more likely to do it, you have to make it as easy as possible to do so. That means writing as little code as possible.

Now, if we look at the code sample provided by Microsoft on their page for the Validation Application Block, we see this:

using Microsoft.Practices.EnterpriseLibrary.Validation;
using Microsoft.Practices.EnterpriseLibrary.Validation.Validators;
public class Customer
{
    [StringLengthValidator(0, 20)]
    public string CustomerName;
    public Customer(string customerName)
    {
        this.CustomerName = customerName;
    }
}

public class MyExample
{
    public static void Main()
    {
        Customer myCustomer = new Customer("A name that is too long");
        ValidationResults r = Validation.Validate<Customer>(myCustomer);
        if (!r.IsValid)
        {
            throw new InvalidOperationException("Validation error found.");
        }
    }
}

A couple of things worth noting:

  1. You have to import two namespaces.
  2. You have to apply a separate attribute for each test.
  3. In your code that invokes the test, you need to do the following:
    1. Declare a ValidationResults variable.
    2. Call Validation.Validate and assign the result to your ValidationResults variable.
    3. Potentially do a cast.
    4. Check the IsValid property on your ValidationResults variable.
    5. If IsValid returned false, take the appropriate action.

That's a lot of work. If you're trying to get lazy programmers to rigorously validate parameters, that's not going to encourage them a whole lot.

On the other hand, this is the same sample, done in NValidate:

using NValidate.Framework;
public class Customer
{
    public string CustomerName;
    public Customer(string customerName)
    {
        Demand.That(customerName, "customerName").HasLength(0, 20);
        this.CustomerName = customerName;
    }
}

public class MyExample
{
    public static void Main()
    {
        try
        {
            Customer myCustomer = new Customer("A name that is too long");
        }
        catch(ArgumentException e)
        {
            // Pass the original exception along so the cause isn't lost.
            throw new InvalidOperationException("Validation error found.", e);
        }
    }
}

A couple of things worth noting:

  1. You only have to import one namespace.
  2. In the constructor, you simply Demand.That your parameter is valid.
  3. In your code that invokes the test, you need to do the following:
    1. Wrap the code in a try...catch block.
    2. Catch the exception and handle it, if appropriate.

See the difference? You don't have to write a lot of code to validate the parameter, and your clients don't have to write a lot of code to use your class, either.

Q. Why didn't you go with attributes instead?

A. I considered attributes in the original design of NValidate. But I ruled them out for a number of reasons:

  1. Using them would have meant introducing a run-time dependency on reflection. While reflection isn't horrendously slow, it is slower than direct method invocation, and I wanted NValidate to be as fast as possible.
  2. I wanted the learning curve for adoption to be as small as possible. I modeled the public interface for NValidate after a product I thought was pretty well known: NUnit. You'll note that Demand.That(param, paramName).IsNotNull() is remarkably similar to NUnit's Assert.IsNotNull(someTestCondition) syntax.
  3. In NValidate, readability and performance are king. Consequently, it uses a fluent interface that allows you to chain the tests together, like so:

    Demand.That(foo, "foo").IsNotNull().HasLength(5).Matches("\\d{5}");

    This is a performance optimization that results in fewer objects created at runtime. It also allows you to do the tests in a smaller vertical space.
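
To make the chaining concrete, here is a minimal sketch of how a fluent validator of this kind could be put together. It is not NValidate's actual source: StringCheck and SketchDemand are hypothetical names I'm using purely for illustration, and the exception messages and the treatment of null in HasLength and Matches are assumptions. The point is only that each test returns the same instance, so a whole chain costs one allocation.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a string validator; not NValidate's real class.
public sealed class StringCheck
{
    private readonly string value;
    private readonly string name;

    public StringCheck(string value, string name)
    {
        this.value = value;
        this.name = name;
    }

    public StringCheck IsNotNull()
    {
        if (value == null) throw new ArgumentNullException(name);
        return this; // same instance, so the chain allocates nothing further
    }

    public StringCheck HasLength(int length)
    {
        // Assumption in this sketch: a null value fails the length test.
        if (value == null || value.Length != length)
            throw new ArgumentException(name + " must be " + length + " characters.", name);
        return this;
    }

    public StringCheck Matches(string pattern)
    {
        // Assumption in this sketch: a null value fails the pattern test.
        if (value == null || !Regex.IsMatch(value, pattern))
            throw new ArgumentException(name + " must match " + pattern + ".", name);
        return this;
    }
}

// Hypothetical entry point, named so it can't be mistaken for the real Demand class.
public static class SketchDemand
{
    public static StringCheck That(string value, string name)
    {
        return new StringCheck(value, name);
    }
}
```

With that in place, a call such as SketchDemand.That(zip, "zip").IsNotNull().HasLength(5).Matches("^\\d{5}$") creates one StringCheck and throws at the first test that fails.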

The reasons for my concerns about attributes and reflection may not be readily apparent until you consider the following: it's conceivable (in theory) that zealous developers could begin validating parameters in every frame of the stack. If the call stack is sufficiently deep, the costs of invoking reflection to parse the metadata begin to add up. It may not seem significant yet, but consider the scenario where any one of those methods is recursive; perhaps it walks a binary tree, a DOM object, an XML document, or a directory containing lots of files and folders. When that happens, the costs of reflection can become prohibitively expensive.
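
To show where that extra work comes from, here is a rough sketch of the two routes side by side. Everything in it is hypothetical (SketchMaxLengthAttribute, SketchCustomer, and SketchValidation are names I made up), and it is not how the Validation Application Block is implemented; a real framework may well cache the metadata it reads. It only illustrates that the attribute route has to look up fields and attributes before it can test anything, while the direct route is a single comparison.

```csharp
using System;
using System.Reflection;

// Hypothetical attribute, for illustration only.
[AttributeUsage(AttributeTargets.Field)]
public sealed class SketchMaxLengthAttribute : Attribute
{
    public int Max { get; private set; }

    public SketchMaxLengthAttribute(int max)
    {
        Max = max;
    }
}

public class SketchCustomer
{
    [SketchMaxLength(20)]
    public string CustomerName;
}

public static class SketchValidation
{
    // Attribute route: each call walks the type's fields and reads their
    // attributes through reflection before any value is tested.
    public static bool IsValidViaReflection(object target)
    {
        foreach (FieldInfo field in target.GetType().GetFields())
        {
            SketchMaxLengthAttribute rule = (SketchMaxLengthAttribute)
                Attribute.GetCustomAttribute(field, typeof(SketchMaxLengthAttribute));
            if (rule == null) continue;

            string text = field.GetValue(target) as string;
            if (text != null && text.Length > rule.Max) return false;
        }
        return true;
    }

    // Direct route: the same rule as a plain comparison, no metadata lookup.
    public static bool IsValidDirect(SketchCustomer customer)
    {
        return customer.CustomerName == null || customer.CustomerName.Length <= 20;
    }
}
```

Both methods give the same answer for the same customer; the difference is how much work happens on every call, which is what matters once the call sits inside a recursive method.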

In my book, that's simply not acceptable. And since, as a framework developer, I cannot predict or constrain where a user might invoke these methods, I must endeavor to make it as fast as possible. In other words, take the parameter information, create the appropriately typed validator, execute the test, and get the hell out as quickly as possible. Avoid any additional overhead at all costs.

Q. Why didn't you use interfaces in the design?

A. I go back and forth over this one all the time, and I keep coming back to the same answer: Interfaces would tie my hands.

Let's assume, for a moment, that we published NValidate using nothing but interfaces. Now, in a subsequent release, we decided we wanted to add new tests. Now we have a problem. We can't extend the interfaces without breaking the contract with clients that are built against NValidate. Sure, they'll likely have to recompile anyway; but if I add new methods to interfaces, they might have to recompile lots of assemblies. That's something I'd rather not force them to do.

On the other hand, abstract base classes allow me to extend classes and add new tests and new strongly typed validators fairly easily. Further, they eliminate casting (because that's handled by the factory). If, however, the system is using interfaces, some methods will return references to an interface, and some will return references to strongly typed validators, and some casting will have to be done at the point of call. I want to eliminate manual casting whenever I can, to keep that call to Demand.That as clean as possible: the cleaner it is, the more likely someone is to use it, because it's easy to do.
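
Here is a small sketch of the shape I'm describing, under the assumption that the factory is just a set of overloads. The names (SketchValidator, SketchStringValidator, SketchIntValidator, SketchFactory) are hypothetical and the messages are invented; this is not NValidate's actual type hierarchy. The compiler picks the overload from the argument type, so the caller gets a strongly typed validator back without a cast, and shared plumbing lives in the abstract base class where it can grow without touching the derived classes.

```csharp
using System;

// Hypothetical abstract base; shared helpers live here.
public abstract class SketchValidator
{
    protected readonly string ParameterName;

    protected SketchValidator(string parameterName)
    {
        ParameterName = parameterName;
    }

    // Common failure path used by every derived validator.
    protected void Fail(string rule)
    {
        throw new ArgumentException(ParameterName + " " + rule, ParameterName);
    }
}

public sealed class SketchStringValidator : SketchValidator
{
    private readonly string value;

    public SketchStringValidator(string value, string parameterName) : base(parameterName)
    {
        this.value = value;
    }

    public SketchStringValidator IsNotEmpty()
    {
        if (string.IsNullOrEmpty(value)) Fail("cannot be null or empty.");
        return this;
    }
}

public sealed class SketchIntValidator : SketchValidator
{
    private readonly int value;

    public SketchIntValidator(int value, string parameterName) : base(parameterName)
    {
        this.value = value;
    }

    public SketchIntValidator IsInRange(int min, int max)
    {
        if (value < min || value > max) Fail("must be between " + min + " and " + max + ".");
        return this;
    }
}

// Hypothetical factory: overload resolution happens at compile time,
// so no cast is needed at the point of call.
public static class SketchFactory
{
    public static SketchStringValidator That(string value, string parameterName)
    {
        return new SketchStringValidator(value, parameterName);
    }

    public static SketchIntValidator That(int value, string parameterName)
    {
        return new SketchIntValidator(value, parameterName);
    }
}
```

A caller writes SketchFactory.That(count, "count").IsInRange(1, 10) or SketchFactory.That(name, "name").IsNotEmpty() and only ever sees the tests that make sense for that type.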

Q. Why not just use assertions instead of throwing exceptions?

A. This should be fairly obvious: assertions don't survive into the release version of your software, because calls to Debug.Assert are compiled out of release builds. Additionally, they don't work as you'd expect them to in a Web application, and rightly so, since they'd kill the ASP.NET worker process and abort every session connected to it. (For a truly educational experience, set up a test web server, and issue a Visual Basic Stop statement from a DLL in your Web app. You'll kill the worker process, and it will be reset on the next request. Nifty.)

Wisdom teaches us that the best laid plans of mice and men frequently fail. Your most thorough testing will miss some parts of your code. The chances of achieving 100% code coverage are pretty remote; if you do it with a high degree of frequency, I'm duly impressed (and I'd like to submit my résumé). But for the rest of us, we know that some code never gets executed during testing, and some code gets executed, but doesn't get executed under the precise conditions that might reveal a subtle defect. That's why you want to leave those checks in the code. Yes, it's additional overhead. But wouldn't you rather know?
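
The difference is easy to see side by side. This is only a sketch with a made-up class (SketchAccount is hypothetical); the one fact it leans on is that Debug.Assert is marked with [Conditional("DEBUG")], so the compiler drops the call from any build that doesn't define DEBUG.

```csharp
using System;
using System.Diagnostics;

// Hypothetical example class, for illustration only.
public static class SketchAccount
{
    // The assertion guards this method only in builds that define DEBUG.
    // In a release build the call is removed, and a negative amount
    // slips straight through to the subtraction.
    public static int WithdrawAsserted(int balance, int amount)
    {
        Debug.Assert(amount >= 0, "amount must not be negative");
        return balance - amount;
    }

    // The explicit check is ordinary code, so it is present in every build.
    public static int WithdrawChecked(int balance, int amount)
    {
        if (amount < 0)
            throw new ArgumentOutOfRangeException("amount", "amount must not be negative");
        return balance - amount;
    }
}
```

In production, WithdrawChecked(100, -5) still throws, while the asserted version quietly hands back a larger balance.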

In Summary

Sure, these are tradeoffs in the design. But let's keep in mind who I'm targeting here: lazy programmers who are typically disinclined to write lots of code to validate their parameters. The idea is that we want to make it so easy that they're more likely to do it. In this case, less code per check hopefully leads to more checks, which (I hope) leads to fewer defects and higher quality software.
