Beauty is in the Eye of the Beholder

Shakespeare probably didn't realize he was talking about dynamic SQL when he wrote those lines, but he did (pedants will note that the phrase in this form is usually credited to Margaret Wolfe Hungerford rather than Shakespeare, but the point stands). There have been a number of articles on this site telling developers that dynamic SQL is "bad". After commenting on those articles a few times, I'd like to describe an application space where dynamic SQL is "good".

The quotes around the words "good" and "bad" are not just the stylistic affectations of an English major. What you need to optimize for should be the first question asked in the design process, and the answer depends on so many factors (whether the software is embedded, whether it is internal to an organization, whether it is transaction- or reporting-focused, and so on) that I am loath to pass judgement right away.

After all, which direction does a clock move, clockwise or counterclockwise? The answer depends on your perspective...

For most of the developers reading this site, and indeed for most of us working in and around relational databases, the default strategy is to optimize for speed of execution. This is not without reason. If you need to optimize for speed, dynamic SQL will slow you down to some extent, since the SQL must be parsed and compiled at run time before it can be executed. In that case, dynamic SQL is "bad".

However, there are application spaces that use relational databases extensively where speed of execution is not the primary optimization path. I worked in one such space for many years: clinical trial applications.

Clinical trials are, simply put, giant, extremely costly experiments. Each clinical trial is its own application, with its own data structures, programs, etc. Clinical trials tend not to have huge amounts of data (a massive trial had about a gigabyte of raw data), but since we had over 250 trials in production at one time, we had an enormous amount of metadata.

Since clinical trials come under the purview of the FDA, there is a tremendous overhead of validation for each distinct piece of software you write. Validation here does not mean testing your software, but rather documenting the testing of your software. Whether it is a good use of anyone's time is a discussion for another day, but the requirement is there, and it is onerous to one degree or another. Dynamic SQL is incredibly useful in reducing the regulatory overhead. Below are two areas where it was used.

Two important notes. First, these were internal, server-only applications with no direct end-user involvement, so SQL injection was not something we had to deal with. If your application must guard against SQL injection, that could definitely limit your use of dynamic SQL. Second, ClinTrial, the clinical trial programming tool we used, has a well-thought-out and extensive metadata repository. Dynamic software, whether dynamic SQL or some other language, practically requires a metadata repository.

The first area where dynamic SQL was used was the creation of generic data validation scripts. Static SQL couldn't be used because each trial had different table names, and even when tables did share a name, they often had different column definitions. Instead of writing the same scripts over and over, the tool read the metadata repository and dynamically created the validation routines. Days upon days spent creating validation scripts were eliminated.
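As a rough illustration (this is not the actual ClinTrial tooling, and the table names, column names, and metadata shape are all invented), a generator can read metadata rows describing each column's rules and emit a per-table validation query:

```python
# Hypothetical sketch: given metadata rows of the form
# (column_name, required, min_val, max_val), build a query that
# flags every row violating the rules for that trial's table.

def build_validation_sql(table, columns):
    """Return a SELECT that finds rows breaking the metadata's rules."""
    checks = []
    for col, required, min_val, max_val in columns:
        if required:
            checks.append(f"{col} IS NULL")
        if min_val is not None:
            checks.append(f"{col} < {min_val}")
        if max_val is not None:
            checks.append(f"{col} > {max_val}")
    where = "\n   OR ".join(checks)
    return f"SELECT * FROM {table}\nWHERE {where}"

# Invented example metadata for one trial's vitals table.
sql = build_validation_sql(
    "TRIAL_042_VITALS",
    [("PATIENT_ID", True, None, None),
     ("SYSTOLIC_BP", False, 60, 260)],
)
print(sql)
```

The same generator runs unchanged against every trial; only the metadata differs, which is what makes the validation documentation reusable.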

There was a performance hit at execution time, but it really wasn't that awful, and it was well worth the savings in development and validation time.

The second area for dynamic SQL was one where I was directly involved. We were tasked with creating the first ever clinical data warehouse. Among other things, this meant loading literally thousands of heterogeneously structured tables into 10 standard structures. A new set of tables would typically be created each week, and we had to allow for the fact that table structures could change in the middle of a trial, since they were, after all, experiments.

Finally, we had to make allowances for bizarre behavior on the part of individual trials. While you could certainly argue that the initial trial design process should have been more closely regulated (i.e., don't think outside the box until you've looked inside the box first), that was the environment we had to deal with.

With our validation requirements, there was no way in the world to do that without dynamic SQL. Here's a very short version of how we accomplished this.

1) For each individual target entry, we created a "feed table". This feed table was populated from the metadata repository and a configuration table we called the "metameta" table.

2) From the feed table, we created an enormous SELECT statement, full of UNION ALLs, that extracted and transformed the individual trial's data to the standard target, as well as handled the change data capture logic. We stored this SELECT statement as a VIEW.

3) An Informatica program would load the data by invoking the VIEW and moving its result set into the staging target table.
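The view-generation step can be sketched in miniature. This is a hypothetical generator, not our actual feed-table code; the view name, source tables, and column mappings below are invented, and the real feed table also carried change-data-capture logic omitted here:

```python
# Hypothetical sketch: assemble the big UNION ALL view from feed-table
# rows that map each source table's columns onto the standard target.

def build_union_view(view_name, feeds):
    """feeds: list of (source_table, {target_col: source_expr}) pairs."""
    target_cols = list(feeds[0][1].keys())
    selects = []
    for table, mapping in feeds:
        cols = ",\n       ".join(
            f"{mapping[c]} AS {c}" for c in target_cols)
        selects.append(f"SELECT {cols}\nFROM {table}")
    body = "\nUNION ALL\n".join(selects)
    return f"CREATE OR REPLACE VIEW {view_name} AS\n{body}"

# Two invented trials with differently named source columns,
# both mapped onto one standard lab structure.
ddl = build_union_view(
    "V_STD_LABS",
    [("TRIAL_A_LABS", {"PATIENT_ID": "PT_ID", "RESULT": "LAB_VALUE"}),
     ("TRIAL_B_LABS", {"PATIENT_ID": "SUBJ_NO", "RESULT": "RESULT_NUM"})],
)
print(ddl)
```

When a trial's structure changed mid-stream, only its feed-table rows changed; regenerating the view picked up the new mapping without touching any hand-written SQL.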

In some ways, you could think of the whole feed-table apparatus as a giant, complicated text-replacement program, or keystroke macro. For validation purposes, that's exactly how we treated it, since we considered the VIEW to be the E and T of ETL. Validating it was trivial: run SELECT * FROM new MINUS SELECT * FROM old and SELECT * FROM old MINUS SELECT * FROM new to compare the new and old views. If you got the results you expected, you were done. That was our story, and we stuck with it.
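The two MINUS statements amount to a symmetric set difference. A toy version over in-memory row sets (with invented data) shows what a discrepancy looks like:

```python
# Toy model of the MINUS-in-both-directions check: each view is
# represented as a set of row tuples, as MINUS removes duplicates.

old_view = {(1, "A"), (2, "B")}
new_view = {(1, "A"), (2, "B"), (3, "C")}  # regenerated view picked up a stray row

# SELECT * FROM new MINUS SELECT * FROM old
new_minus_old = new_view - old_view
# SELECT * FROM old MINUS SELECT * FROM new
old_minus_new = old_view - new_view

print(new_minus_old)  # {(3, 'C')} -- the discrepancy to investigate
print(old_minus_new)  # set() -- nothing was lost, only added
```

Two empty results mean the regenerated view returns exactly the rows the old one did, which is the outcome you want after a purely mechanical regeneration.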

In the second example, there was no performance hit at all. Dynamic SQL turned what could have been an onerous, working-in-the-coal-mines kind of task into one that, once people were trained, didn't harsh anybody's mellow.