Quickly Creating a tsqlt Test for a Query

One of the things that I feel is important to building better software is testing your code. It should be easy and simple to test code and determine if a chance or enhancement has broken anything.

This usually requires a testing framework. I like tsqlt, which a friend wrote and my employer supports. This framework is easy, I think, to use for tests, but some people feel differently. This post will show me creating a test in < 5 minutes, based on what someone else wrote to ask for help.

The Scenario

In this post, a user was asking about how to write a query that looks to create distinct results. They posted this code, which isn’t far from the type of thing I’ve had other developers send me inside a company asking for help:

DROP TABLE IF EXISTS #BASKET;
DROP TABLE IF EXISTS #ARRIVED;
DROP TABLE IF EXISTS #PREPARED;
DROP TABLE IF EXISTS #WantedResult;

CREATE TABLE #BASKET
(
BASKET_ID nvarchar(15) PRIMARY KEY NOT NULL
)
;
CREATE TABLE #ARRIVED
(
BASKET_ID nvarchar(15) NOT NULL
,ITEM_ID nvarchar(15) NOT NULL
)
CREATE TABLE #PREPARED
(
BASKET_ID nvarchar(15) NOT NULL
,ITEM_ID nvarchar(15) NOT NULL
)

INSERT INTO #BASKET
(BASKET_ID)
SELECT 'A'
UNION ALL SELECT 'B'
UNION ALL SELECT 'C'
UNION ALL SELECT 'D'
;

INSERT INTO #ARRIVED
(BASKET_ID, ITEM_ID)
SELECT 'A','INO 1'
UNION ALL SELECT 'A','INO 2'
UNION ALL SELECT 'B','INO 3'
UNION ALL SELECT 'D','INO 8'

;
INSERT INTO #PREPARED
(BASKET_ID, ITEM_ID)
SELECT 'A','INO 111'
UNION ALL SELECT 'B','INO 44'
UNION ALL SELECT 'B','INO 55'
UNION ALL SELECT 'B','INO 66'
UNION ALL SELECT 'C','INO 170'
;
/* Gives duplicates INO 111, INO 3  */
SELECT 
BASKET.BASKET_ID
,ARRIVED.ITEM_ID IN_ID
,PREPARED.ITEM_ID OUT_ID
FROM #BASKET BASKET
LEFT JOIN #ARRIVED ARRIVED
ON BASKET.BASKET_ID=ARRIVED.BASKET_ID
LEFT JOIN #PREPARED PREPARED
ON BASKET.BASKET_ID=PREPARED.BASKET_ID
ORDER BY BASKET.BASKET_ID,ARRIVED.BASKET_ID,PREPARED.BASKET_ID
;


CREATE TABLE #WantedResult
(
BASKET_ID nvarchar(15)
,IN_ID nvarchar(15)
,OUT_ID nvarchar(15)
,SORT_NBR int
)
INSERT INTO #WantedResult (BASKET_ID,IN_ID,OUT_ID,SORT_NBR)
SELECT 'A' BASKET_ID,'INO 1' IN_ID,'INO 111' OUT_ID,1 SORT_NBR
UNION ALL SELECT 'A' BASKET_ID,'INO 2' IN_ID,NULL OUT_ID,2 SORT_NBR
UNION ALL SELECT 'B' BASKET_ID,NULL,'INO 55' OUT_ID,4 SORT_NBR
UNION ALL SELECT 'B' BASKET_ID,NULL,'INO 66' OUT_ID,5 SORT_NBR
UNION ALL SELECT 'B' BASKET_ID,'INO 3' IN_ID,'INO 44' OUT_ID,3 SORT_NBR
UNION ALL SELECT 'C' BASKET_ID,NULL IN_ID,'INO 170' OUT_ID,6 SORT_NBR
UNION ALL SELECT 'D' BASKET_ID,'INO 8' IN_ID,NULL OUT_ID,7 SORT_NBR
;
SElECT BASKET_ID,IN_ID,OUT_ID,SORT_NBR
FROM #WantedResult
ORDER BY BASKET_ID,SORT_NBR

Now if I run this, I see a query that doesn’t work, and I have results in a table. I could query #WantedResult and compare that with my query, but that’s not great, and it doesn’t persist this over time.

Let’s make this better.

Setup

I’ll assume you have tsqlt installed. If you don’t, read this. If I paste that code into my SSMS, I can do a couple things. First, I’ll remove the temps, because I want to be able to test this over time. I’ll assume this is something I got from another developer, and I might need to check their tables. If these were tables in a dev database I used, I wouldn’t run this part. I wouldn’t run any, I’d keep my test data.

CREATE TABLE BASKET

(

BASKET_ID nvarchar(15) PRIMARY KEY NOT NULL

)

;

CREATE TABLE ARRIVED

(

BASKET_ID nvarchar(15) NOT NULL

,ITEM_ID nvarchar(15) NOT NULL

)

CREATE TABLE PREPARED

(

BASKET_ID nvarchar(15) NOT NULL

,ITEM_ID nvarchar(15) NOT NULL

)
INSERT INTO BASKET

(BASKET_ID)

SELECT 'A'

UNION ALL SELECT 'B'

UNION ALL SELECT 'C'

UNION ALL SELECT 'D'

;
INSERT INTO ARRIVED

(BASKET_ID, ITEM_ID)

SELECT 'A','INO 1'

UNION ALL SELECT 'A','INO 2'

UNION ALL SELECT 'B','INO 3'

UNION ALL SELECT 'D','INO 8'
;

INSERT INTO PREPARED

(BASKET_ID, ITEM_ID)

SELECT 'A','INO 111'

UNION ALL SELECT 'B','INO 44'

UNION ALL SELECT 'B','INO 55'

UNION ALL SELECT 'B','INO 66'

UNION ALL SELECT 'C','INO 170'

;

Once I have that, I’m ready to build a test.

Building a Test

The first thing you need is a test class. Over time, this class likely would exist in a project for my database. However, since this is a new one for me, I’ll create a class with this.

EXEC tsqlt.NewTestClass @ClassName = N'QueryTests' -- nvarchar(max)

GO

Now I build the test, which is a stored procedure. The test class above is a schema, and inside my proc, I’ll paste the first script above. I will then alter a few things. First, I’ll change each CREATE TABLE to a call to tsqlt.faketable. You can see this edit below.

Below this I have inserts. In this case, I have the test data the other dev has, so I’ll just remove the # from each statement and insert into a real table.

Below this, I’ll leave the CREATE TABLE #WantedResult alone. This is a good place for me expected results. However, I’ll also copy this create statement and edit it to create a new table. You can see this below.

The format for a test is:

assemble
act
assert

The assemble is above. Now let’s act. I’ll trop a fake stored procedure here for the query. I would prefer the devs use procs, but I can’t always get what I want. However, this does encourage them to start writing procs. Here’s the entire act:

This runs the procs and puts the results in my temp table.

Now I need to check this. I have the expected results in #WantedResults and the query stuff in #QueryResult. We can assert these are equal with tsqot.AssertEqualsTable. Here’s that code. The message and failmsg are whatever I want.

Below the assert, you can see the end of the test procedure with GO and then the call to run this. Here is the entire code for the test:

CREATE OR ALTER PROCEDURE QueryTests.[test basket query]

as

-- assemble

EXEC tsqlt.FakeTable
   @TableName = N'BASKET'        -- nvarchar(max)

, @SchemaName = N'dbo'       -- nvarchar(max)

EXEC tsqlt.FakeTable
   @TableName = N'ARRIVED'        -- nvarchar(max)

, @SchemaName = N'dbo'       -- nvarchar(max)

EXEC tsqlt.FakeTable
   @TableName = N'PREPARED'        -- nvarchar(max)

, @SchemaName = N'dbo'       -- nvarchar(max)
INSERT INTO BASKET

(BASKET_ID)

SELECT 'A'

UNION ALL SELECT 'B'

UNION ALL SELECT 'C'

UNION ALL SELECT 'D'

;
INSERT INTO ARRIVED

(BASKET_ID, ITEM_ID)

SELECT 'A','INO 1'

UNION ALL SELECT 'A','INO 2'

UNION ALL SELECT 'B','INO 3'

UNION ALL SELECT 'D','INO 8'
;

INSERT INTO PREPARED

(BASKET_ID, ITEM_ID)

SELECT 'A','INO 111'

UNION ALL SELECT 'B','INO 44'

UNION ALL SELECT 'B','INO 55'

UNION ALL SELECT 'B','INO 66'

UNION ALL SELECT 'C','INO 170'

;
CREATE TABLE #WantedResult

(

BASKET_ID nvarchar(15)

,IN_ID nvarchar(15)

,OUT_ID nvarchar(15)

,SORT_NBR int

)

INSERT INTO #WantedResult (BASKET_ID,IN_ID,OUT_ID,SORT_NBR)

SELECT 'A' BASKET_ID,'INO 1' IN_ID,'INO 111' OUT_ID,1 SORT_NBR

UNION ALL SELECT 'A' BASKET_ID,'INO 2' IN_ID,NULL OUT_ID,2 SORT_NBR

UNION ALL SELECT 'B' BASKET_ID,NULL,'INO 55' OUT_ID,4 SORT_NBR

UNION ALL SELECT 'B' BASKET_ID,NULL,'INO 66' OUT_ID,5 SORT_NBR

UNION ALL SELECT 'B' BASKET_ID,'INO 3' IN_ID,'INO 44' OUT_ID,3 SORT_NBR

UNION ALL SELECT 'C' BASKET_ID,NULL IN_ID,'INO 170' OUT_ID,6 SORT_NBR

UNION ALL SELECT 'D' BASKET_ID,'INO 8' IN_ID,NULL OUT_ID,7 SORT_NBR

;
CREATE TABLE #QueryResult

(

BASKET_ID nvarchar(15)

,IN_ID nvarchar(15)

,OUT_ID nvarchar(15)

,SORT_NBR int

)


-- act

INSERT #QueryResult
   EXEC dbo.BasketQuery
-- assert

EXEC tsqlt.AssertEqualsTable
   @Expected = N'#WantedResult' -- nvarchar(max)

, @Actual = N'#QueryResult'   -- nvarchar(max)

, @Message = N'query result works'  -- nvarchar(max)

, @FailMsg = N'bad query'  -- nvarchar(max)

GO

Before we run the test, we need a proc. I’m going to take the first solution from the thread linked and drop it inside a proc call like this:

CREATE OR ALTER PROCEDURE dbo.BasketQuery

AS

;WITH cte_results AS (

SELECT 

BASKET.BASKET_ID

,ARRIVED.ITEM_ID IN_ID

,PREPARED.ITEM_ID OUT_ID

,ROW_NUMBER() OVER(ORDER BY BASKET.BASKET_ID,ARRIVED.BASKET_ID,PREPARED.BASKET_ID) AS row_num

FROM BASKET BASKET

LEFT JOIN ARRIVED ARRIVED

ON BASKET.BASKET_ID=ARRIVED.BASKET_ID

LEFT JOIN PREPARED PREPARED

ON BASKET.BASKET_ID=PREPARED.BASKET_ID

--ORDER BY BASKET.BASKET_ID,ARRIVED.BASKET_ID,PREPARED.BASKET_ID

)

SELECT

c1.BASKET_ID,

CASE WHEN EXISTS(SELECT 1 FROM cte_results c2 WHERE c2.BASKET_ID = c1.BASKET_ID AND c2.IN_ID = c1.IN_ID AND c2.row_num < c1.row_num) THEN NULL ELSE c1.IN_ID END,

CASE WHEN EXISTS(SELECT 1 FROM cte_results c2 WHERE c2.BASKET_ID = c1.BASKET_ID AND c2.OUT_ID = c1.OUT_ID AND c2.row_num < c1.row_num) THEN NULL ELSE c1.OUT_ID END

FROM cte_results c1

ORDER BY c1.row_num

GO

When I call my proc, I see this:

My test failed. Why? The query doesn’t have the right number of columns for the result. Easy for a developer to miss. Easy to catch with a real testing framework.

I can alter the proc by adding a 4th column to the results, returning the row number. When I do that, my code for the final select is:

SELECT

c1.BASKET_ID,

CASE WHEN EXISTS(SELECT 1 FROM cte_results c2 WHERE c2.BASKET_ID = c1.BASKET_ID AND c2.IN_ID = c1.IN_ID AND c2.row_num < c1.row_num) THEN NULL ELSE c1.IN_ID END,

CASE WHEN EXISTS(SELECT 1 FROM cte_results c2 WHERE c2.BASKET_ID = c1.BASKET_ID AND c2.OUT_ID = c1.OUT_ID AND c2.row_num < c1.row_num) THEN NULL ELSE c1.OUT_ID END

, c1.row_num AS sort_nbr

FROM cte_results c1

ORDER BY c1.row_num

Now if I run my test, I see this:

It worked. The query has the right results, at least according to the original developer. I might double check their sample data, but regardless of how I change my own dev database, or if the data changes, this test will continue to run with this set of data.

Testing a Refactor

In the thread, there was a second query. I can test that easily as well. All I’ll do is run this code:

CREATE OR ALTER PROCEDURE dbo.BasketQuery

AS

SELECT b.BASKET_ID
  , CASE WHEN ROW_NUMBER() OVER(PARTITION BY b.BASKET_ID, a.ITEM_ID ORDER BY p.ITEM_ID) = 1 THEN a.ITEM_ID END AS IN_ID
  , CASE WHEN ROW_NUMBER() OVER(PARTITION BY b.BASKET_ID, p.ITEM_ID ORDER BY a.ITEM_ID) = 1 THEN p.ITEM_ID END AS OUT_ID
      , ROW_NUMBER() OVER(ORDER BY B.BASKET_ID, a.ITEM_ID, p.ITEM_ID) AS Sort_Nbr

FROM BASKET AS b

LEFT OUTER JOIN ARRIVED AS a
      ON b.BASKET_ID = a.BASKET_ID

LEFT OUTER JOIN PREPARED AS p
     ON b.BASKET_ID = p.BASKET_ID

GO

This is Drew’s solution, where I just replaced one query in the proc with another. After I do that, I just run this again, no test code changes.

EXEC tsqlt.run 'QueryTests.[test basket query]'

GO

I get the same result as the previous test: success.

Adding Other Cases

If I wanted to test other test cases, I could do one of two things here. First, I could also the test data that is assembled in the top of the test and then alter the inserts for the #WantedResult table.

The other option is copy this entire test, give it a different name, and then add different test data that might test specific things. Like, what if I’m missing data in a column or even a table. Whatever crazy cases I get from customers in the live database, I can build tests for those.

Performance

I don’t have a good way to programmatically test performance, but if I were capturing this code in a VCS, I could easily see two versions of the proc. I’d do something like this:

CREATE OR ALTER PROCEDURE dbo.BasketQuery

AS

;WITH cte_results AS (

SELECT 

BASKET.BASKET_ID

,ARRIVED.ITEM_ID IN_ID

,PREPARED.ITEM_ID OUT_ID

,ROW_NUMBER() OVER(ORDER BY BASKET.BASKET_ID,ARRIVED.BASKET_ID,PREPARED.BASKET_ID) AS row_num

FROM BASKET BASKET

LEFT JOIN ARRIVED ARRIVED

ON BASKET.BASKET_ID=ARRIVED.BASKET_ID

LEFT JOIN PREPARED PREPARED

ON BASKET.BASKET_ID=PREPARED.BASKET_ID

--ORDER BY BASKET.BASKET_ID,ARRIVED.BASKET_ID,PREPARED.BASKET_ID

)

SELECT

c1.BASKET_ID,

CASE WHEN EXISTS(SELECT 1 FROM cte_results c2 WHERE c2.BASKET_ID = c1.BASKET_ID AND c2.IN_ID = c1.IN_ID AND c2.row_num < c1.row_num) THEN NULL ELSE c1.IN_ID END,

CASE WHEN EXISTS(SELECT 1 FROM cte_results c2 WHERE c2.BASKET_ID = c1.BASKET_ID AND c2.OUT_ID = c1.OUT_ID AND c2.row_num < c1.row_num) THEN NULL ELSE c1.OUT_ID END

, c1.row_num AS sort_nbr

FROM cte_results c1

ORDER BY c1.row_num

GO

SET STATISTICS IO ON

EXEC dbo.BasketQuery

SET STATISTICS IO OFF

GO

CREATE OR ALTER PROCEDURE dbo.BasketQuery

AS

SELECT b.BASKET_ID
  , CASE WHEN ROW_NUMBER() OVER(PARTITION BY b.BASKET_ID, a.ITEM_ID ORDER BY p.ITEM_ID) = 1 THEN a.ITEM_ID END AS IN_ID
  , CASE WHEN ROW_NUMBER() OVER(PARTITION BY b.BASKET_ID, p.ITEM_ID ORDER BY a.ITEM_ID) = 1 THEN p.ITEM_ID END AS OUT_ID
      , ROW_NUMBER() OVER(ORDER BY B.BASKET_ID, a.ITEM_ID, p.ITEM_ID) AS Sort_Nbr

FROM BASKET AS b

LEFT OUTER JOIN ARRIVED AS a
      ON b.BASKET_ID = a.BASKET_ID

LEFT OUTER JOIN PREPARED AS p
     ON b.BASKET_ID = p.BASKET_ID

GO

SET STATISTICS IO ON

EXEC dbo.BasketQuery

SET STATISTICS IO OFF

When I run this, I see these results:

Clearly query 2 runs more efficiently than query 1. That’s the one that needs to be submitted to the VCS for deployment. Of course, I’d ensure I’d run the test to be sure my query works, and then the CI process would re-run the test because it can’t trust me.

And it shouldn’t.

Summary

It took my much longer to write this post, especially with picking code formatting and shooting images, than it did to write the test. Literally within 5 minutes I’d pasted the code from the thread twice and use it to create real tables and the test as shown above. I verified both queries within 5 minutes.

Using a known process to test is good. It helps your developers. Getting set up with test data, and then learning to re-use some of those Assemble structures will help your team get up to speed and spread the load of creating data. This also forces developers to think logically through what they’re testing and not just vaguely looking at results and thinking they’re correct.

This also means as we discover holes we haven’t tested, or we want to refactor and enhance our code, we prevent regressions.

Give tsqlt a try and see if it can help you.