Thursday, July 16, 2009

InternalsVisibleTo and Chasing Down Public Keys

Sometimes, just getting assemblies to cooperate the way you want them to is a real pain in the neck.

Today, I wanted to make one assembly (which we’ll call Foo) to be able to access the internal (Friend for all us VB geeks) members of another assembly (which we’ll call Bar). Now, the documented way to achieve this is by adding the InternalsVisibleTo attribute to your project. The code sample on MSDN looks like this:

[assembly:InternalsVisibleTo("AssemblyB, PublicKey=32ab4ba45e0a69a1")]

For a Visual Basic application, that statement goes into the AssemblyInfo.vb file, and looks like this:

<Assembly:InternalsVisibleTo("AssemblyB, PublicKey=32ab4ba45e0a69a1")>

No big deal. It’s not rocket science. You just need a strong name. To generate a strong name, you fire up the Visual Studio Command Prompt, and execute the Strong Name tool (sn.exe) and execute it as follows:


C:\Program Files\Microsoft Visual Studio 9.0\VC>sn.exe -k bar.snk

Microsoft (R) .NET Framework Strong Name Utility  Version 3.5.30729.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Key pair written to bar.snk

As you can see, this creates a .SNK file, which contains a public and a private key. However, its contents are not human readable. Further, the InternalsVisibleTo attribute requires that you provide the public key in the constructor. (Don’t even think about trying it without it.)


Here’s where things get tricky: The code samples on MSDN do not provide a public key to the constructor; they provide a public key token. There’s a huge difference between the two, and it’s very misleading. A strong name token is much shorter than a public key; the former is only about 16 characters long, the other well in excess of 140 characters. If you rely on the code sample from MSDN to get you where you want to be, you’ll be pulling your hair out in no time.


But how do you get the public key token?


You can retrieve the public key token as follows:


sn –p foo.snk barpublic.snk


This creates a new file that contains only the public key. It removes the private key information from the file.


sn –tp > barkey.txt


This creates a text file that contains the full dump of the key information, including the public key. We redirect it to a text file so that you can open it in the editor of your choice (because you’ll have to do some cleanup to get the key onto one line). You’ll want to select the public key and paste it into the PublicKey portion of the constructor for the InternalsVisible attribute.


So here’s everything we did at the command prompt:


C:\Program Files\Microsoft Visual Studio 9.0\VC>sn -k bar.snk

Microsoft (R) .NET Framework Strong Name Utility  Version 3.5.30729.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Key pair written to bar.snk

C:\Program Files\Microsoft Visual Studio 9.0\VC>sn -p bar.snk barpublic.snk

Microsoft (R) .NET Framework Strong Name Utility  Version 3.5.30729.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Public key written to barpublic.snk

C:\Program Files\Microsoft Visual Studio 9.0\VC>sn -tp barpublic.snk > bar.txt

C:\Program Files\Microsoft Visual Studio 9.0\VC>

And here’s the contents of our text file:


Microsoft (R) .NET Framework Strong Name Utility  Version 3.5.30729.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Public key is
0024000004800000940000000602000000240000525341310004000001000100e13cb392af5437279736fc3c33fe237242d0f6301fafb01c5cbc719d84102c2d8b30a148600997ed53d99624b5d0eab37fd6b24cca3ce7f7b62ae99f961e148d5421576bade0ac8ab1187a3eee318ca20026ffe9b56b8a63156f817cef49998633867ae547684e8e59c0fe0b68ab29dffa749340dc6cfdd18071f1b69c6772ac

Public key token is db218359dd8997df

So, when we finally add that attribute to our AssemblyInfo.vb file, it looks like this:


<Assembly:InternalsVisibleTo("Bar, PublicKey=0024000004800000940000000602000000240000525341310004000001000100e13cb392af5437279736fc3c33fe237242d0f6301fafb01c5cbc719d84102c2d8b30a148600997ed53d99624b5d0eab37fd6b24cca3ce7f7b62ae99f961e148d5421576bade0ac8ab1187a3eee318ca20026ffe9b56b8a63156f817cef49998633867ae547684e8e59c0fe0b68ab29dffa749340dc6cfdd18071f1b69c6772ac")>

Once this stuff is in place, Bar should be able to access any members in Foo that are marked internal/Friend. It should be smooth sailing from there.

Good luck!


Tuesday, July 14, 2009

On Legacy Software Maintenance

In the mad, mad, mad, mad world of software development, we are faced with the trying task of maintaining legacy systems. In an ideal world, that wouldn't be the case. We'd all be developing brand new systems from the ground up, writing ground-breaking code that no one has ever seen before, without the hassles that arise when you have to worry about things like backwards compatibility and maximum system uptime.

But this isn't an ideal world. The vast majority of us don't have the luxury of developing completely new systems. Instead, our lives are fraught with the perils of correcting defects and adding new features to systems that have been around for ages and, occasionally, decades. Those systems have usually passed through a number of hands and they tend to be poorly documented. They sometimes have sprawling feature sets, support technologies that have long since fallen by the wayside, bloated with code that doesn't appear to be invoked by anyone, and riddled with obscure and seemingly nonsensical comments.

Your job, as a maintenance developer, is to massage that seemingly horrific beast into a thing of beauty. Real people doing real jobs depend on it to get their work done in a timely manner. As much as we might loathe an ancient codebase, the language it was written in, or the tools we have to use to get the job done, truth is, a legacy application is maintained for a reason: it has intrinsic value to the bottom line of the business. When the folks who depend on that system can get their jobs done on time, in a productive manner, they can continue to draw a decent paycheck. That means that they can continue to pay the rent, put food on the table for their families, afford healthcare, and all those other essentials.

So tread lightly when you delve into the code of a legacy system. It's far more important than you think. We just take it for granted that it's a dirty, loathsome job and someone has to do it. We just happen to be the unlucky bastard who drew the short straw. Not so: you happen to be the lucky one who drew that straw. People depend on you to help them keep their families safe, warm, and well-fed.

My point isn't that every legacy application is manna from heaven. My point is that legacy applications exist for a reason, that they're maintained for a reason. They have long, varied histories for a reason. They have endured because they have value; they've grown beyond their original specification because the company sees real value in them, and doesn't want to lose that investment. The problem for you, as a developer, is in ensuring that you do not destroy that investment.

When we are first introduced to a legacy system, we have a tendency to look at the source code and view it as though it were written by a blind deaf quadriplegic with Tourette's syndrome. No one in their right mind would have written a system that way. What could they possibly have been thinking? You certainly wouldn't have! I certainly wouldn't have.

But then, over time, we start to learn things about it. The code is old; very old. There's been a high turnover rate, and the code has passed through lots of hands. The companies that published third-party components went out of business when the dot-com bubble burst. They used to use Novell for security, and then switched to Active Directory. When this thing was released, Windows 95 was still popular. They upgraded the database about two years ago, and had to make some emergency revisions to the code.

There are reasons that things like this happen in a legacy system that's old enough to qualify for Medicare. Many of those reasons are valid, and many of them are the product of a lack of time and budget. Sometimes, sadly, it's a result of a lack of skilled developers (but that's something for someone else to blog about). The point, in short, is that systems grow from an original vision into large, cumbersome, bloated systems because developers respond to the needs and demands of the business.

Now, here you are, present day, and you're tasked with maintaining that source code. You have two primary responsibilities: 1.) Fix the defects, and 2.) Add new features. Keep in mind that while you are doing both, you must not at any point in time break backwards compatibility or bring down the system. People rely on this system. It's the bread and butter (or part of it) of the business. And it's your baby now.

It is absolutely crucial that you treat legacy software with its due gravity. If you view it like it's some recurring annoyance, stop that. If you leap to hasty conclusions about the cause of problems in the system, stop that immediately. This is a system that many people rely on. Get it into your head that you need to treat this thing delicately, as if it were your pride and joy. Once you fix a defect, once you put a new feature into it, your name is associated with that software. Take pride in it. Do it right. Take your time.

Over time, the legacy system stops being the horrific beast associated with all those who preceded you. It becomes the creature that you have molded it into. And then, people will associate it with you, for better or worse.

Monday, July 13, 2009

The Sad State of SQL Refactoring

I love refactoring tools.

The ability to select a variable in code, right-click on it, choose the Rename command from a context menu, and then safely rename that variable in every location in which it appears is a boon to my programming productivity. Similarly, the ability to take a large block of complex logic and extract it to its own method is really handy. Or, the ability to reorder parameters, and know that every piece of code that calls that method will be updated appropriately saves me tons of time and improves my confidence in the quality of the code.

When you refactor code of any kind, you enhance its readability without changing the way it works. But refactoring by hand is frequently difficult, tedious, and error-prone. That's why refactoring tools exist and why the essential service they provide is important. They allow us to improve the maintainability of code, hopefully reducing maintenance costs, quickly and efficiently.

Enter SQL languages.

In theory, there is a standard for SQL languages: the ANSI SQL-92 standard. One would like to think that it would be a simple matter to create refactoring tools for any SQL based on the ANSI standard for SQL. One would be wrong. You can't use the standard as the sole basis of a refactoring tool.

Any given database vender wants to strive to make their product unique, to stand out from the crowd. And so, they don't entirely conform to the standard. They have additional features that separate them from each other. For example: Oracle organizes functions and procedures into packages. Microsoft does not. Microsoft allows a bit data type on columns. Oracle does not. And, although we all loathe to think of Access as a real database, Access provides a boolean data type that you can use in columns, while Oracle does not.

Now, let's also talk about legacy support. Oracle has been around for a very long time. Version 2 came out in 1979, from what I gather. Their legacy join syntax looks nothing like the ANSI standard (and that's not a criticism, just a simple statement of fact):

SELECT Customers.CustomerId, Company, OrderID
FROM Customers C, Orders O
WHERE C.CustomerID (+) = O.CustomerID
UNION
SELECT Customers.CustomerId, Company, OrderID
FROM Customers C, Orders O
WHERE C.CustomerID = O.CustomerID (+)

In ANSI SQL, this is :

SELECT Customers.CustomerId, Company, OrderId
FROM Customers C FULL OUTER JOIN Orders O ON
(C.CustomerId = O.CustomerId)

Now, we could add in all the different SQL variations that we know that are out there:

  • ANSI standard
  • Interbase/Firebird
  • IBM  SQL
  • Microsoft Transact SQL
  • MySQL
  • Oracle PL/SQL
  • PostgreSQL
  • Access
  • FoxPro

...and on and on and on, but you get the point. There are a lot of SQL variants out there. And their goal is to accomplish the same thing:

  • Create a data store.
  • Occasionally, modify the structure of the data store.
  • Get data out of a data store.
  • Put data into a data store.
  • If supported, execute code within the confines of the data store and (optionally) return a result.

Behold, SQL in a nutshell.

Over the lifetime of these various products, they have added features that they have to maintain for legacy support. Somewhere out there there's a business that absolutely must have that feature in place or their whole process will come crashing down. Don't you dare remove it. It doesn't matter that there are better features (likely based on a standard); there are applications out there for which they don't have the source code anymore, or which no one understands, and they're too afraid to touch.

Now, all these reasons have led us to a scenario where we have vastly different implementations of SQL. Sure, they share a lot in common, but they also have radically different feature sets and syntax. And that situation, in and of itself, has led us to one frightening and sad conclusion:

We will likely never have a tool that is able to connect to any database and be able to correctly refactor its SQL.

And that's just a damned shame. Because there's a lot of SQL out there. Stored procedures, functions, views, triggers, even inline SQL in applications and all that other jazz. But the amount of work it would take to get us to a point where a refactoring tool could recognize any variant of SQL and correctly parse it, refactor it, and not hose the code is enormous.

It's a pity, really. I could see tons of use for a tool like this. In my own office, we work with SQL Server, Oracle, and Atomix databases. What I wouldn't give for a tool that could refactor SQL to enhance its readability without changing the way it worked.

And who knows? Down the road, we may be working with something else.

But this is the world we live in. If we want to refactor SQL, it looks like we'll have to settle for separate refactoring tools for each language. And then each will come with its own quirks. That might be good or bad, but it's likely the best we can hope for for now.

 

Saturday, July 11, 2009

Defects and the Scientific Method

Let's leap back a few years to that frightening time in Junior High School. Remember those years? September, possibly October. It was still hot, and the air conditioning didn't work. Your science teacher was standing in the back of the room, with an overhead projector and had a slide up for you to copy down. On it, he had this information:

THE SCIENTIFIC METHOD

  • Ask and define the question.
  • Gather information and resources through observation.
  • Form a hypothesis.
  • Perform one or more experiments and collect and sort data.
  • Analyze the data .
  • Interpret the data and make conclusions that point to a hypothesis.
  • Formulate a "final" or "finished" hypothesis.
  • Ah, remember those days? Remember how boring it was? Remember thinking to yourself, "I'll never use this?"

    Well, as a software developer who is tasked with maintaining software that is virtually guaranteed to contain defects, you can be certain that you need to be intimately familiar with The Scientific Method.

    The Scientific Method provides a clear roadmap for defect isolation. In fact, anyone who has any real experience isolating defects without disturbing the rest of the system has (whether he's aware of it or not) used the Scientific Method to do so. Here's how it breaks down:

    1. Ask and define the question. The software should behave in this manner, but it does not. What is the cause of this problem, and how do we fix it?
    2. Gather information and resources through observation. In a controlled environment that mimics production as closely as possible, reproduce the defect. If possible, step through the code and observe its behavior.
    3. Form a hypothesis. The defect is caused by this behavior in the system (or by the behavior of this external system).
    4. Perform one or more experiments and collect and sort data. Implement a code fix; attempt to reproduce the defect using the fixed code. Observe the results.
    5. Analyze the data. Did the code fix have the desired effect? If so, how?
    6. Interpret the data and make conclusions that point to a hypothesis. Was the code that was modified the cause of the defect, or was it merely a symptom of an underlying problem requiring further resolution?
    7. Formulate a "final" or "finished" hypothesis. If the defect is fully repaired, check all code into the repository. Otherwise, continue the analysis until you have rooted out the underlying cause of the defect.

    Simply put, there's no guesswork in defect resolution. It is a rational, thinking process, much like a game of Sodoku. If you approach any defect and just yank an answer out of thin air, You're Doing It Wrong.

    Instant answers to defects are a dangerous game. Your first, instinctive answer to any problem is likely to be wrong; the chances of this being true will only rise as your code base grows in size. As your product gains features, you'll want to take greater care to make sure that you have taken the time to disturb absolutely nothing outside of the defect you're trying to correct. In that case, take some advice: Keep your grubby fingers to yourself. Touch only the code in the defect domain. The best way to do that is to have a plan for defect resolution, and I strongly encourage you to apply The Scientific Method.

    Developing software is a task for those who can think. It is not a task for the simple-minded, the lazy, or the inattentive. You have to be willing to pay attention to the details, and to invest the time it takes to hunt down a defect in painstaking detail to get to the root of a problem.

    A good software developer knows the difference between a symptom and a disease, and how that correlates to software defects. Sure, you have a NullReferenceException, and your code is missing a handler for that. But is the problem the missing exception handler, or is the problem that a null somehow got into a table that should never have had it, or that a stored procedure in the database returned nulls when they were never expected? Which one is the symptom? Which is the disease? Make sure you're fixing the right defect. Don't just prescribe aspirin when the software needs invasive surgery. To find that out, you need to think critically. You need to apply the Scientific Method.