In this article I am going to talk about refactoring code. Refactoring means taking code that although it probably works, is not correct in terms of structure or general tidiness. Refactoring involves changing the code so that it is better written or in a position to be made better later on. We can often use refactoring to remove technical debt or to stop technical debt from happening in the first place. If we are actively working on a system it is best to constantly make small improvements over time as and when you touch some code rather than getting to the point that it feels like the best way to make progress is to throw the code away and start again.
To be able to refactor effectively we need to have a valid set of tests, but if you have some code that you want to refactor that does not have a set of tests against it, you can write the tests before you refactor the code. In this example we will take some T-SQL that uses a cursor to loop through some records and add some tSQLt unit tests and then refactor the code making sure the unit tests pass.
This is the schema and code, it intentionally has a number of common issues with naming, structure and design but functionally works:
create table customers(customer_id int not null primary key clustered, name varchar(max)) go create table orders(order_id int not null identity(1,1) primary key clustered, customer_id int, state int) go create table processed_ordrs(order_identifier varchar(32) not null primary key clustered, processed_date datetime not null default (getdate())) go create procedure [dbo].[update_completed] as declare @order_id varchar(32); declare crs cursor for select order_identifier from processed_ordrs open crs fetch next from crs into @order_id while(@@FETCH_STATUS = 0) begin update orders set state = 2 where order_id = @order_id; delete from processed_ordrs where order_identifier = @order_id fetch next from crs into @order_id end close crs deallocate crs go
When we look at this code there are a number of things that we can do to it, some of the changes I would like to make are:
- processed_ordrs should be named processed_orders
- processed_ordrs should have an int as the primary key
- processed_ordrs seems unnessesary, we should look at removing the table
- processed_ordrs has a primary key and a default constraint that are not named
- processed_ordrs should, if the table is needed, have a foreign key pointing to orders
- orders should name the primary key constraint
- orders should have a foreign key pointing to customers
- customers should name the primary key constraint
- customers only has a single column for name
- customers should not need up to 2gb for the customer name so does not need to be a varchar(max)
- customers may need to use an nvarchar rather than a varchar for name
If we look at the stored procedure then we can remove the cursor and replace it with a set based approach.
Approaches to refactoring the schema
When we want to make changes to the database schema there are three main approaches:
- Do nothing
- Write the upgrade scripts manually
- Use a tool to write the upgrade scripts
Sometimes the easiest thing to do is nothing. Perhaps you are unsure of the ramifications of making a change, or you may think the schema is going to be changed by an upgrade later on. Although it is best to make improvements as you go along, sometimes doing nothing is the right option.
The choice between writing manual upgrade scripts or using a tool is a simple one for me, I would prefer to spend time on declaring how the schema should look rather than the journey that the schema takes to get to where it is. The tooling that we have for SQL Server is mature and very capable of creating deployment scripts in the majority of cases. We do however need to be aware of when the tooling is not able to generate upgrade scripts for us.
To understand where the tooling can and cannot write upgrade scripts for us we need to break changes into three categories:
- Online changes that can be automatically generated
- Offline changes that can be automatically generated
- Offline changes that cannot be automatically generated
Online changes that can be automatically generated are changes such as adding an object which typically do not break existing code. In this case the change can be generated by a tool such as SSDT or SQL Compare with no downtime.
Offline changes that can be automatically generated are changes such as changing a column type or naming a constraint. These changes can be automatically generated using SQL Compare or SSDT but should be done offline as they will cause blocking or may be blocked by user requests. Perhaps if you are adding a column in the middle of your existing table then the tool will generate a new table and migrate the data across.
Offline changes that cannot be automatically generated are changes such as splitting or merging tables, so if we wanted to remove the processed_ordrs table and put the processed_date into the main orders table a tool would not be able to generate that change for us automatically without quite a lot on input from the developer.
If we use SSDT to create our upgrade scripts there is one category of changes that can be automatically generated whereas other tools such as SQL Compare cannot. If you use the refactoring support in SSDT to rename an object, a schema, table or column name instead of dropping the old column and creating a new one it will use sp_rename to rename the object.
Once we know what effect a change will have we can be in the position to start making changes to the schema. We can make online changes that can be automatically generated as we go along but the other types of changes we might make in groups or when we have a feature to implement that means that the object will require one of the offline types of changes.
Aside from the schema we will likely want to make changes to the code and in many ways these changes are simpler if we make use of unit testing. The process for making code changes which are normally always online changes that can be automatically generated is:
- Run or write and the run the unit tests
- Change the code
- Re-run the unit tests
The unit tests should all pass before and after making the changes, if they do not work before then spend the time to understand why – is the code incorrect or is there a problem with the test? It is both important and satisfying to write some code and see it pass the tests.
In this example we will re-write the stored procedure so that it uses a set based approach but we have no unit tests so we will create some using tSQLt.
The first thing we need is a schema to hold our test:
create schema [update_completed_tests] go exec sys.sp_addextendedproperty @name=N'tSQLt.TestClass', @value=1 , @level0type=N'SCHEMA',@level0name=N'update_completed_tests' go
Then if we look at the original stored procedure code it basically does two things, firstly it updates orders and then deletes from processed_orders. In this refactoring session we will look to improve the code but leave the schame changes for later. We will write a test that inserts an order and checks that the entry in orders is updated and a second test that inserts entry in processed_orders and check that it is deleted.
create procedure [update_completed_tests].[test order is updated] as exec tSQLt.FakeTable 'processed_ordrs'; exec tSQLt.FakeTable 'orders'; insert into orders(order_id, state) select 193, 1 insert into processed_ordrs(order_identifier) select 193; exec update_completed; declare @actual int = (select state from orders where order_id = 193); exec tSQLt.AssertEquals 2, @actual; go create procedure [update_completed_tests].[test processed order is deleted] as exec tSQLt.FakeTable ‘processed_ordrs’; exec tSQLt.FakeTable ‘orders’; insert into processed_ordrs (order_identifier) select 193 exec update_completed; declare @actual int = (select count(*) from processed_ordrs); exec tSQLt.AssertEquals 0, @actual; go
If we break the test down we have the FakeTable statements. These are really useful because what it does is remove all constraints and triggers and means that we can just insert the columns that we are interested in. Without using FakeTable instead of a single insert statement as we have here, we would need to clear out data from previous runs, ensure that any data in other tables that is required for foreign keys is available and any non-null columns have a valid entry. Using FakeTable we can just insert the data in the columns that we need for the test, which saves time and further means we need to be sure how our code works which cannot be a bad thing.
Because we have the FakeTable statements we can then insert the data we need for the test. We then have the code that we actually want to test:
Followed by our checks to ensure that the code has worked. If tSQLt.AssertEquals finds that the values do not match then it will throw an exception and the test will fail:
declare @actual int = (select state from orders where order_id = 193); exec tSQLt.AssertEquals 2, @actual;
When we have the tests in place we can run them and make sure that they pass. If they do not pass we must understand why and fix the issue before refactoring.
Once we are happy with the tests and we can refactor the code as we are in a safe position to re-write the stored procedure. The new code looks like:
create procedure update_completed as update orders set state = 2 where order_id in (select order_identifier from processed_ordrs); delete from processed_ordrs where order_identifier in (select order_id from orders where state = 2); go
We can now run our tests and ensure that they still pass after the code has been re-factored.
In this article I have shown that refactoring databases is important and can be done, we do need to be aware of the type of deployment that the refactoring will create and whether we have existing tests to cover the changes or whether we should write those before we make the changes.
We have implemented a two unit tests that have allowed us to change the stored procedure and given us given us confidence to make other changes later on and know at least this one stored procedure can be verified.