Unit testing does not come for free. There is a cost associated with the time it takes to write and run the tests, and a cost associated with maintaining them. There are also other factors to consider, such as the time it takes to train developers to be effective at writing tests, and the time it takes to fix the bugs that tests find.
If we break down the cost of implementing unit testing for SQL Server, the first thing we need to know is the time it takes to write tests. Typically, to write a test we need to:
- Work out what the code will do
- Work out what we need to test
- Work out how we will write the tests to validate the code
- Write the tests
- Write the code
- Validate the code works correctly, fix any issues with the code or tests
If you write tests upfront, you are forced to spend time thinking about what it is you are going to do. This can seem obvious and a waste of time, especially when you have a deadline and stored procedures that do not write themselves. Thinking about and writing tests before you write the code helps you to think about the important things:
- What is the output of the code going to be: a result set, multiple result sets, an output parameter, a return parameter?
- What tables, views and functions will it interact with, and how will that work?
- How will the results be verified? Will the code return a random GUID or the same value each time?
- Can it do multiple things? Are there different code paths to verify?
- Should we verify that, as well as meeting the criteria, it doesn't also do something bad?
- How will this affect the performance of the server and application? Will it scale, and does it need to scale?
- Are there any security concerns we should check for?
These things are all important. While you do not necessarily have to write unit tests to think about most of them upfront, writing tests forces you to. In terms of the cost of writing unit tests, if you are not doing these things today, then adding them is already a significant cost: for simple code, you can easily double the time it would have taken to write the code once you include preparation and design time. Doubling the time might seem bad, but doubling it because you are thinking about what you are doing is not a bad thing, especially if we are talking about doubling from one hour to two. Doubling from six months to twelve would be a different matter.
Before we even think about writing unit tests, we need to have invested time in training developers to use a testing framework. The tSQLt framework has quite a steep learning curve, especially if you have never done any unit testing. Terminology like mocking, which is so vital to unit testing, can be off-putting for developers who have never written any unit tests.
I would estimate that a new team without any unit testing experience is going to take at least a week or so to get their heads around the concepts and practicalities of unit testing. If there is already someone who knows tSQLt and unit testing, then they can teach the other developers, and in a day or two everyone should be up to speed.
Getting up to speed can be quite a significant cost. In defense of the expense, it is a one-off cost and letting your developers learn how to become better programmers can only be a good thing.
How much time is spent writing the tests?
The time it takes to actually write tests depends on the number of tests and their complexity. Ideally, stored procedures should do one thing, so a single test, or a couple of tests, should suffice. In this ideal situation we are looking at spending roughly double, or possibly triple, the time it would take to write the code alone. If we have a large, unwieldy 5,000+ line stored procedure with different logic depending on the day of the week, then we should have one test for each path through the stored procedure, which could be a lot more tests.
It is not unheard of to have five times or ten times as much test code as code that is under test. While this may sound like a lot, test code is often much simpler than the code it is testing. It typically sets up some test data, runs the code and then checks the results. There is no business logic.
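As a rough sketch of that shape, a typical tSQLt test is just "arrange, act, assert". The table `dbo.order_line` and procedure `dbo.order_total` below are hypothetical, invented purely for illustration; only the tSQLt calls (`NewTestClass`, `FakeTable`, `AssertEquals`) are real:

```sql
-- Hypothetical example: dbo.order_line and dbo.order_total are invented names.
exec tSQLt.NewTestClass 'demo';
go
create procedure demo.[test dbo.order_total sums the lines for one order]
as
begin
    -- arrange: fake the table so the test controls exactly what data exists
    exec tSQLt.FakeTable 'dbo.order_line';
    insert into dbo.order_line(order_id, amount) values (1, 10), (1, 15), (2, 99);

    -- act: run the code under test
    declare @actual int;
    exec dbo.order_total @order_id = 1, @total = @actual output;

    -- assert: no business logic, just a straight comparison
    exec tSQLt.AssertEquals 25, @actual, 'order_total returned the wrong sum';
end;
go
```

There is no branching and no logic to get wrong; the test reads top to bottom as data in, code run, result checked.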
Let’s look at an example stored procedure and a test that should go with it. I should obviously write the test first, but for this article I will show you the code in the procedure first:
```sql
create procedure employees.terminate_employee(@employee_id int)
as
update employees.employee
   set status_flag = 0,
       termination_date = getdate()
 where employee_id = @employee_id;

delete from employees.security_group
 where employee_id = @employee_id;

insert into employees.alumni_registration(employee_id, start_date)
select @employee_id, getdate();
go
```
The procedure does more than a single thing, but it is fairly clear what it does:
- set a status flag and date in the employee table
- delete the employee's security permissions
- add an entry into the alumni registration table
When we look at this, we should ideally have tests that:
- check that the status_flag and termination_date columns are set for the employee_id and no other employees
- check that the security groups are removed from this employee_id and no others
- check that the alumni_registration is added for this employee_id and no others
If I were writing tSQLt tests, then I would want at least three tests. Each case is different, but I would want to verify that each statement both does what it is supposed to do and, conversely, doesn't do anything it is not supposed to; in this case, that the update only affects the one record it is supposed to. If I were writing the test in C#, I would have a test for each scenario: one testing that the code does what it should, and another testing that it doesn't do what it should not. When writing tSQLt tests, I combine them to keep the cost of writing the tests as low as possible. It really is about finding a balance that works for each scenario.
So we have the first unit test:
```sql
create procedure terminate_employee_tests.[test employees.employee clears the status and adds a termination date]
as
exec tSQLt.FakeTable 'employee', 'employees';
exec tSQLt.FakeTable 'security_group', 'employees';
exec tSQLt.FakeTable 'alumni_registration', 'employees';

declare @employee_id int = 1480;

--canary employees to ensure they are still valid at the end
insert into employees.employee(employee_id, status_flag, termination_date)
select 100, 1, null union select 9999, 0, getdate();

--record to check
insert into employees.employee(employee_id, status_flag, termination_date)
select @employee_id, 1, null;

exec employees.terminate_employee @employee_id;

declare @result int;

select @result = count(*) from employees.employee
 where employee_id = @employee_id and status_flag = 0 and termination_date is not null;
exec tSQLt.AssertEquals 1, @result, 'It looks like the employee was not updated correctly';

select @result = count(*) from employees.employee
 where status_flag = 1 and termination_date is null;
exec tSQLt.AssertEquals 1, @result, 'It looks like the canary employee has had either their status or termination_date changed';

select @result = count(*) from employees.employee
 where status_flag = 0 and termination_date is not null;
exec tSQLt.AssertEquals 2, @result, 'It looks like the canary terminated employee has had their status changed or termination_date removed';
go
```
We would then need at least another two tests to cover the three basic operations of the code. We can remove some of the redundancy by placing the FakeTable calls and the inserts into a SetUp procedure that is shared between all the tests in the class, but we basically have between ten and twenty lines of test code for a single statement of the code we are testing. This may seem like an unfeasibly high amount, but the test code should not really grow much beyond this. A stored procedure can end up being many thousands of lines long, a practice I definitely do not advocate; the test code, however, should remain small, and each test really should only ever check for one thing.
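The shared SetUp just mentioned might look something like the sketch below. tSQLt automatically runs a procedure named SetUp in a test class before each test in that class. Note that the `group_name` column in `employees.security_group` is an assumption for illustration; the article only shows that the table has an `employee_id` column:

```sql
-- Sketch: tSQLt runs terminate_employee_tests.SetUp before each test in the
-- class, so the FakeTable calls and canary inserts are written only once.
create procedure terminate_employee_tests.SetUp
as
begin
    exec tSQLt.FakeTable 'employee', 'employees';
    exec tSQLt.FakeTable 'security_group', 'employees';
    exec tSQLt.FakeTable 'alumni_registration', 'employees';

    -- canary employees shared by every test in the class
    insert into employees.employee(employee_id, status_flag, termination_date)
    select 100, 1, null union select 9999, 0, getdate();
end;
go

-- One of the two remaining tests, now leaning on SetUp for the fakes.
-- The group_name column is an assumed column, invented for this sketch.
create procedure terminate_employee_tests.[test security groups are removed for the terminated employee only]
as
begin
    insert into employees.security_group(employee_id, group_name)
    select 1480, 'admins' union select 100, 'admins';

    exec employees.terminate_employee 1480;

    declare @result int;

    select @result = count(*) from employees.security_group where employee_id = 1480;
    exec tSQLt.AssertEquals 0, @result, 'The terminated employee still has security groups';

    select @result = count(*) from employees.security_group where employee_id = 100;
    exec tSQLt.AssertEquals 1, @result, 'The canary employee lost their security groups';
end;
go
```

With the fakes and canary data shared, each additional test shrinks to just its own arrange, act and assert steps.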
Post coding costs
If we have more code, then we have more work to maintain that code. If we refactor the code under test, then we will likely need to make changes to our tests. If we have tests, then they should be run regularly, and perhaps running the tests finds a bug that should be fixed. These are all additional costs that you do not have when you do not have tests; without unit tests, you are free to change whatever code you like without having to deploy any updates, run any test suites, wait for them to finish and fix any issues that they find.
Why do we write unit tests?
I think it is pretty clear that there is a relatively high cost of entry to writing unit tests, but is it actually worth it? I have personally seen a number of projects both with and without unit tests, and I can categorically say that in all cases, writing unit tests had a definite benefit on the project. I have seen cases where not having unit tests meant that bugs were either not found or found late enough to cause large re-writes and delays, and where regression bugs cropped up again and again. We accept the high cost of unit testing because it gives us:
- A higher likelihood that we do not have bugs
- A way to help document the code; unit tests do not completely document a system, but they play a large part in it.
- The ability to make changes and know what parts will break and what we need to do to fix those.
- The requirement to think about the code and how it will actually work rather than just writing code and worrying about the end result at a later, potentially more costly, time.
One final note on the cost is that it is significantly cheaper to fix bugs early on in the process. If you are shipping database code to your customers, consider the cost involved with fixing bugs when they are found:
- On the developer’s machine as they write the code.
- On a CI server as the code is checked in and fresh in the developer’s mind.
- In a QA environment when the developer has moved onto another task.
- In the customer’s environment, which will involve the support team and possibly rebuilding a development environment along with a patch to manage and deploy.
The further away from the keyboard of the original developer a bug is found and fixed, the more expensive the fix will be. If you have to deploy a fix to databases you do not control, I guarantee you will want to know why your unit tests didn't catch the problem!
So if you have yet to make the jump into unit testing your SQL Server code, understand that there is a cost associated with it, but the benefits typically far outweigh the costs.