Coding from the Trenches: The Sad State of SQL Refactoring

I love refactoring tools.

The ability to select a variable in code, right-click on it, choose the Rename command from a context menu, and then safely rename that variable in every location in which it appears is a boon to my programming productivity. Similarly, the ability to take a large block of complex logic and extract it to its own method is really handy. Or, the ability to reorder parameters, and know that every piece of code that calls that method will be updated appropriately saves me tons of time and improves my confidence in the quality of the code.

When you refactor code of any kind, you enhance its readability without changing the way it works. But refactoring by hand is frequently difficult, tedious, and error-prone. That's why refactoring tools exist and why the essential service they provide is important. They allow us to improve the maintainability of code, hopefully reducing maintenance costs, quickly and efficiently.

Enter SQL languages.

In theory, there is a standard for SQL languages: the ANSI SQL-92 standard. One would like to think that it would be a simple matter to create refactoring tools for any SQL based on the ANSI standard for SQL. One would be wrong. You can't use the standard as the sole basis of a refactoring tool.

Any given database vender wants to strive to make their product unique, to stand out from the crowd. And so, they don't entirely conform to the standard. They have additional features that separate them from each other. For example: Oracle organizes functions and procedures into packages. Microsoft does not. Microsoft allows a bit data type on columns. Oracle does not. And, although we all loathe to think of Access as a real database, Access provides a boolean data type that you can use in columns, while Oracle does not.

Now, let's also talk about legacy support. Oracle has been around for a very long time. Version 2 came out in 1979, from what I gather. Their legacy join syntax looks nothing like the ANSI standard (and that's not a criticism, just a simple statement of fact):

SELECT Customers.CustomerId, Company, OrderID FROM Customers C, Orders O WHERE C.CustomerID (+) = O.CustomerID UNION SELECT Customers.CustomerId, Company, OrderID FROM Customers C, Orders O WHERE C.CustomerID = O.CustomerID (+)

In ANSI SQL, this is :

SELECT Customers.CustomerId, Company, OrderId FROM Customers C FULL OUTER JOIN Orders O ON (C.CustomerId = O.CustomerId)

Now, we could add in all the different SQL variations that we know that are out there:

ANSI standard
Interbase/Firebird
IBM SQL
Microsoft Transact SQL
MySQL
Oracle PL/SQL
PostgreSQL
Access
FoxPro

...and on and on and on, but you get the point. There are a lot of SQL variants out there. And their goal is to accomplish the same thing:

Create a data store.
Occasionally, modify the structure of the data store.
Get data out of a data store.
Put data into a data store.
If supported, execute code within the confines of the data store and (optionally) return a result.

Behold, SQL in a nutshell.

Over the lifetime of these various products, they have added features that they have to maintain for legacy support. Somewhere out there there's a business that absolutely must have that feature in place or their whole process will come crashing down. Don't you dare remove it. It doesn't matter that there are better features (likely based on a standard); there are applications out there for which they don't have the source code anymore, or which no one understands, and they're too afraid to touch.

Now, all these reasons have led us to a scenario where we have vastly different implementations of SQL. Sure, they share a lot in common, but they also have radically different feature sets and syntax. And that situation, in and of itself, has led us to one frightening and sad conclusion:

We will likely never have a tool that is able to connect to any database and be able to correctly refactor its SQL.

And that's just a damned shame. Because there's a lot of SQL out there. Stored procedures, functions, views, triggers, even inline SQL in applications and all that other jazz. But the amount of work it would take to get us to a point where a refactoring tool could recognize any variant of SQL and correctly parse it, refactor it, and not hose the code is enormous.

It's a pity, really. I could see tons of use for a tool like this. In my own office, we work with SQL Server, Oracle, and Atomix databases. What I wouldn't give for a tool that could refactor SQL to enhance its readability without changing the way it worked.

And who knows? Down the road, we may be working with something else.

But this is the world we live in. If we want to refactor SQL, it looks like we'll have to settle for separate refactoring tools for each language. And then each will come with its own quirks. That might be good or bad, but it's likely the best we can hope for for now.

Technorati Tags: SQL, Best Practices, Productivity, Code Quality, Refactoring, Software Development

Coding from the Trenches

Monday, July 13, 2009

The Sad State of SQL Refactoring

No comments:

Who the Heck Am I?