In this short series of articles, I am going to introduce you to the world of Test Automation, Test Driven Design (TDD) and Continuous Integration (CI), for databases. We'll discuss why automation is important, how to write a good test, what makes a good testing framework and how to build a continuous integration system.
In my experience, most people implement new techniques and change existing habits and approaches based on inspiration, and truths learned through real experiences, rather than through theoretical exposition. Therefore, Instead of simply reiterating the theory that many others (including myself: 1, 2) have covered before, my intent is to break down into manageable steps, both the theory and practice of TDD, illustrating each step with practical code examples taken right from the tSQLt framework itself.
The path I describe is one that leads to more maintainable code and lower error counts. Nevertheless, what I will propose is required, over the course of these articles, may feel like a "big leap" from how you deal with database development and testing right now. Therefore, before we dive into the technical details, this first article will take a slightly unconventional, and non-technical approach. In it, I will simply recount my personal 'TDD epiphany', an experience that convinced me entirely of the need for Database TDD, and inspired me to help others find better ways to implement it.
Introducing the Fortune 50 ETL Project
Some years ago, earlier in my IT career, I was working for a large Fortune 50 company. My team's task was to develop the Extract-Transform-Load (ETL) software that would allow new customers to import their existing data (e.g. data generated while using an existing, competitor's application) into the company's flagship product.
It was regarded as a comparatively small project, but in such surroundings even rather modest projects were large and expensive by most people's standards. This one was already a multi-million dollar endeavor, and so the stakes were high.
Around the time this project started, the company invested in outsourcing. They founded their own "software development services" company in India, ultimately employing more than 2000 employees. The ETL project consisted of a team of India-based developers supporting the 8-person US team of which I was a member.
The project followed the Waterfall process to the letter. A Waterfall project can be cleanly separated into several, distinct phases, one of which is the development phase where the actual "coding" of the software happens. This phase was to be executed by the team in India.
Preceding the development phase were several research and documentation phases, executed locally. The concluding testing phase was also going to happen in the US.
One of the key deliverables of the early phases of the project was the design document, describing in detail the project implementation, covering the specifics of which tables ought to be used for each task, and so on. Normally, the development team would produce this document, but the outsourcing arrangement was a new one, and it was felt to be 'safer' in this instance to have analysts in our US team produce the design document, and pass it on to the team in India.
Disconnecting the development team from the design process was, as we would realize much later, the first big mistake. As the analysts were not active developers, they described the product in a way that, while logically correct, was not conducive to the development of high-performance T-SQL code.
The First Build
The team in India clearly wanted to make a good first impression. They worked diligently and turned over the completed product on schedule.
It was then that we discovered the second big mistake: inadequate communication. Many of the problems that followed could have been avoided, simply through better communication between the US and India teams. However, our teams were from different cultures and thousands of miles apart, which made communication difficult, and no substantial extra effort was made to overcome these difficulties.
In the absence of such communication, the India team had simply followed the design document as closely as possible, even in places where it did not make sense to them. The resulting final product consisted of approximately 125 T-SQL stored procedures. A "master" procedure called each of these procedures in a specific order, in order to execute the conversion process.
Unfortunately, most of the procedures did not behave as expected or required, partly due to problems with the initial design document, and partly due to defects introduced during coding.
Both teams started working through the numerous bugs and, after a while, it looked like we were moving in the right direction. Then someone asked the question that changed everything: is this going to perform as required? The original plan was to turn off the customer's existing system, run the conversion process, and then turn on the new system. Any period of downtime meant a lot of additional manual work for the client, so our plan was that the conversion process should complete in a few hours.
We did not have an answer to the performance question, although we were beginning to have serious misgivings. The analysts who wrote the detailed design document had little or no SQL experience and so had described the implementation in a very procedural way, including phrases such as "For each record, do this or that". As a result, the stored procedures were entirely based on cursors. Many even had multiple levels of nested cursors.
The Performance Test Failure
As soon as we felt we had removed enough of the serious bugs, we set up a test environment and started the process of putting in place all the pieces to run our first performance test to find out exactly how long this whole conversion process was likely to take, once in production.
Getting everything ready was quite complex. First, we needed to restore a backup of the database. Database source control did not exist at the time and so restoring a backup was the only way to get the latest version. Second, we had to clear out all data. This database contained not only the tables required for our product but also the entire schema for the main product, which included personally identifiable information. Scripting the DDL for just the required tables was insufficient, since quite a few tables contained important metadata that was required for the product, as well as for the conversion process to execute. One of the tables I remember contained the list of genders. While the product did not get close to the 58 gender types Facebook is now providing, there were already a surprisingly high number of entries in that table.
The last preparatory step was to load a new data set, comparable in size to that of our first customer's database, but cleansed of all sensitive information. This input data came in specially formatted files that had to be loaded into staging tables using a dedicated external tool that another team had developed. This process alone ran for many hours.
Finally, late one Friday afternoon, everything was in place and we started our master procedure. We did not know what to expect, but given it was the end of Friday, the process had about 60 hours before someone would look at it again. That should be ample time for it to complete. Or so we thought…
On Monday morning, we discovered that the process had failed after just 90 minutes, while attempting to execute the second of the 125 procedures. It turned out that the input data contained a variation, an additional field, which was allowed by the input file format but that we did not anticipate, and which our conversion program did not know how to handle.
Performance Test Automation
This failed, manual test cost us many hours of work and told us only that the total runtime would be over 90 minutes. However, as the remaining 123 procedures never even executed, we still had no idea how long it actually would take.
We decided that we had to automate the performance test process, and so I was asked to develop a framework that would allow developers a "push button" way to test the performance of our product, with different input files and for different versions of our code base, for example to compare the performance of two alternate implementations.
I created a tool based on a set of batch files that would traverse a list of directories. For each directory, first the tool would restore the databases backup contained within it. Second, it would launch the external tool to load into the staging tables the data in the input files (stored in the same directory). Finally, the tool would run the conversion process itself. Each stage in the process was timed. Having processed all directories, the test framework would produce a report, comparing the performance for each test. This process was then scheduled to run every night and, although it wasn't quite "push-button", all a developer had to do to schedule a new performance test was to create a new folder and place within it a backup of a database already containing the procedures to test, and a set of input files.
We started using this performance-testing framework to run tests against different small data sets. While the framework was not intended to test functionality, being able to run the entire process this easily proved tremendously helpful in ironing out the remaining bugs quickly. Once we were confident that the conversion process would run to completion, however long it took, I set up a series of tests with ever-increasing, but still comparatively small, data set sizes.
Once again, we opted to run the tests over the weekend, but this time I monitored the process to make guard against early failure. It did not fail, but that was the end of the good news.
On Monday morning, I compiled the test results and used mathematical extrapolation to estimate the runtime for the conversion process of a data set the size of that of our first client, consisting of 10 million accounts.
When I saw the number, my heart dropped to the floor. It was 2.2 billion seconds, or roughly 70 years! Clearly this was not going to fly and I did not relish the task of presenting this figure to the management team.
The meeting was duly painful and the managers were unimpressed by our results. They began to consider seriously the idea of abandoning the project and simply hiring a bunch of data entry specialists to type the data into the new system by hand, since this would clearly take a lot less than 70 years.
When I told the team how the management team had reacted, there was some injured pride, but also a determination to fix the problem. We went back to management and argued that the idea of typing the data in manually, while it might work for one or two customers, was not viable in the longer term, and given the number of customers we hoped to attract. We had to get the performance problems solved. The management team accepted our proposal to hold off hiring any data entry specialists and instead give us the time to fix the performance issues, and deliver a solution that both worked and performed adequately.
We had a lot of work ahead of us, because by this stage we knew that to achieve reasonable performance meant a complete rewrite of the stored procedures, implementing their functionality using set-based logic rather than row-by-row cursor-based processing.
However, we had just spent a lot of time and effort ironing out defects to the point where what we had was functionally sound. Therefore, we needed to find a solution to implement these radical changes that would prevent us from introducing any functional regressions.
Automated Unit Tests
At around this time, the practice of Test Driven Design (TDD) was gaining traction, although mainly in the object-oriented world. We decided that what we really needed was "TDD for Databases".
While my performance-testing framework had helped us find quite a few bugs, and given us some proper performance data, it was not adequate for performing the functional testing we needed to drive Database TDD. We needed a tool that would allow us to automate unit or acceptance testing. One member of our team, Dennis Lloyd Jr, developed a framework capable of executing the entire conversion process from start to finish, on a small dataset, and then comparing the achieved result with the expected result. Each test would include steps to set up the environment (i.e. restore the backup, import data from the input files) and then run the conversion process start to finish. Upon completion of the conversion, each test would execute a series of queries designed to prove whether or not it executed successfully. We called this testing tool "Dart" and it would execute a series of tests consecutively and produce a report detailing which tests passed and which ones failed. From that point on, we had Dart for functional testing and my original framework to execute performance tests.
Our analysts provided us with a suite of tests to cover the functionality of each procedure and, starting with the longest running procedure, as identified by my original performance-testing framework, we implemented these test cases as Dart tests and then started the rewrite process. First, we ran the suite of Dart tests on the original procedure, then rewrote it to use set-based logic, and then reran the tests.
If we accidentally changed the behavior of that procedure, the test failed, and we got immediate feedback. As we tackled each procedure in this manner, each test run executed the entire suite of tests, covering all procedures, so if a change in one procedure altered the behavior of another procedure, we would know about it.
We worked through every procedure in this way, and by the time the first customer was ready for their big "conversion" event, we had reduced the runtime of the entire process from a predicted eon, to 72 hours. While that was still slow, it could be executed over a weekend and the customer was OK with it.
700 Pager Days
Over time, we had to run more and more conversions, as more customers came online. The team running the conversion process was well drilled so we had few user errors. However, we still were running into quite a few data related problems and so every execution usually led to at least one pager call. We did not despair and continued to fix each new problem, always writing new tests before changing the code.
By that time, I had moved into another team, but somehow agreed to continue carrying the dreaded pager until someone else was found to take over the responsibility. I ended up carrying the pager for 700 days straight – 24/7. After a while, I noticed something odd; I often forgot that I was carrying the pager, despite the fact that development continued and new features were added regularly.
Over the entire 700 days, the pager went off six times. Twice to remind me to change the battery, three times because of an easily diagnosable user error and only once because of a real problem.
Epilogue and Inspiration
This project got off to a very rocky start, but by implementing a sensible TDD strategy, we were able to turn it around. Over time, it matured into one of the most stable products that the company had developed. I do not believe that this success would have been possible without TDD, and a failure would have been disastrous to the company's flagship product.
This experience convinced me of the need for a TDD approach to database development, and inspired me to devote time to helping others to take the TDD concepts developed in the object-oriented world and live them in their databases.
Over the course of the remaining articles, I'm going to describe what's required for a test-driven approach to database development, and how to expand from there toward continuous integration for databases. The examples will use the tSQLt testing framework, which Dennis and I have been developing over the last few years, although you can apply most of the techniques using other tools as well; you just might have to do a little more manual work.