Adding a Query Governor to Dynamic-Search Stored Procedures

,

Sometime back I wrote a piece that SQLServerCentral kindly published called “How to Design, Build and Test a Dynamic Search Stored Procedure.”  While I believe that the technical advice given in that article is sound, there is a case where it could use a small improvement.

That article focused on how to build up highly-efficient, dynamic SQL to retrieve a row set from a table, based on filtering criteria passed into the stored procedure.  Presumably this filtering criteria is the result of an application’s user visiting a form and entering the filtering criteria from a set of form fields corresponding to the filtering criteria expected by the stored procedure.

What the technical aspects lacked was a practical focus on a general user’s behavior.  So let’s consider that for a moment.  When a user comes to a search form he is offered an array of filtering criteria to select his choice of rows from the table for display in the application.  Perhaps a bewildering array of choices.  The user must consciously think about what makes his need important and what filtering criteria will get him there the fastest.  So the user pauses to contemplate these deep questions.

Most people in the Information Technology industry have heard the term “dumb user” before.  Normally we try to avoid using it when we happen to be around one of the potential users of the system for fear of the political repercussions.  But the term persists and has been around as long as I can remember.  It originated from the concept that each form in an application needs to be coded around the principle of the dumbest possible user operating that form.  The idea was that the application should be the shepherd, leading its flock (the user community) to a desired outcome, this being that expected results should be delivered quickly and efficiently.

Let’s go back now to our intensely dumb user who has our search form displayed on his monitor and is now pondering his choices.  What makes the record this person is seeking unique?  How can it be found the quickest?  Is he even sure how to limit the choice?  How is a dumb user expected to react?

Why not hit the query button with no filtering criteria present?  His lizard-like brain comes to the conclusion that the machine is probably not as capable of filtering the records to just the one he would like to see, and knows that he’s the dominant life form on this planet so probably can do a better job of filtering the results by eyeball.

That’s fine when there’s just a few rows in the table, but what if there are tens of millions?  Our poor web application just retrieved all of those rows, and hopefully was at least smart enough to display a limited set (called a page) at a time.  But somewhere all of those tens of millions of cached records exist, pining away for our lizard to hit the “next page” key and show another tiny slice of the cached results.

Enter the Query Governor, better known as a way to limit rows retrieved for an unfiltered query and still be most likely to retrieve what the user needs.

Some Sample Data and a Simplified Dynamic Search Stored Procedure

Since I’m big into code reuse, let’s reuse some of the code from that article to construct our sample data.

CREATE TABLE dbo.Shipments
(
    Consignment_No  VARCHAR(30)   NOT NULL
    ,ReferenceNo    VARCHAR(30)   NULL
    ,CustID         VARCHAR(20)   NULL
    ,Total_Pkgs     INT           NOT NULL
    ,Last_Status    VARCHAR(12)   NULL
    ,Last_Status_DT DATETIME      NULL
    ,PRIMARY KEY CLUSTERED (Consignment_No)
);
INSERT INTO dbo.Shipments (Consignment_No, ReferenceNo, CustID, Total_Pkgs,Last_Status_DT)
VALUES ('8033701','PO 12345','ACME',1, '2013-10-23 20:12:00.000')
    ,('0152037',NULL,'ACME',1, '2013-10-28 13:11:00.000')
    ,('4292860',NULL,'ATLAS',1, '2013-10-25 08:49:00.000')
    ,('0806473','INV 43523','ATLAS',1, '2013-10-24 10:54:00.000')
    ,('6289811',NULL,'APPLE',1, '2013-10-27 14:45:00.000')
    ,('0642191','SO 1111','APPLE',1, '2013-10-29 11:37:00.000')
    ,('8363496',NULL,NULL,1, '2013-10-22 02:44:00.000')
    ,('7271931',NULL,NULL,1, '2013-10-29 00:36:00.000');
   
GO
--DROP PROCEDURE dbo.Shipment_Tracking;
--DROP PROCEDURE dbo.Shipment_Tracking_QG1;
--DROP PROCEDURE dbo.Shipment_Tracking_QG2;
--DROP TABLE dbo.QueryGovernor;
--DROP TABLE dbo.Shipments;

The DROPs are provided to ultimately clean out of your sandbox all of the objects we’ll create throughout this article.

We’ll use a trimmed down version of the dynamic search stored procedure from the article, showing just some of the criteria for filtering.

CREATE PROCEDURE [dbo].[Shipment_Tracking]
(
    @StartDT           DATETIME    = NULL
    ,@EndDT            DATETIME    = NULL
    ,@Consignment_No   VARCHAR(30) = NULL
    ,@CustID           VARCHAR(20) = NULL
    ,@debug            TINYINT     = 0       -- 1=Display debug results
)
AS BEGIN
    SET NOCOUNT ON;
   
    -- Truncate time component from start date
    DECLARE @StartDate DATETIME = DATEADD(day, 0, DATEDIFF(day, 0, @StartDT))
        -- Truncate time component from end date and add 1
        ,@EndDate      DATETIME = DATEADD(day, 1, DATEDIFF(day, 0, @EndDT))
        ,@SQL          NVARCHAR(MAX)
        ,@SQLParms     NVARCHAR(MAX) = N'
            @StartDT           DATETIME
            ,@EndDT            DATETIME
            ,@Consignment_No   VARCHAR(30)
            ,@CustID           VARCHAR(20)';
   
    -- If @EndDate is NULL, then use @StartDT + 1   
    SELECT @EndDate        = ISNULL(@EndDate, DATEADD(day, 1, DATEDIFF(day, 0, @StartDT)))
        ,@Consignment_No   = NULLIF(@Consignment_No, '')
        ,@CustID           = NULLIF(@CustID, '')
        ,@debug            = CASE WHEN @debug IS NULL THEN 0
                                WHEN @debug IN (0,1) THEN @debug
                                ELSE 0 END;
    IF @debug = 1
    BEGIN  
        PRINT CONVERT(VARCHAR(20), @StartDate, 120);
        PRINT CONVERT(VARCHAR(20), @EndDate, 120);
    END
    
    SELECT @SQL = N'
        SELECT a.Consignment_No, a.ReferenceNo, a.CustID, a.Total_Pkgs
            ,a.Last_Status, a.Last_Status_DT
        FROM dbo.Shipments a' +                                                      CHAR(10) +
        CASE WHEN @StartDate IS NOT NULL
            THEN N'            AND a.Last_Status_DT >= @StartDT ' +                  CHAR(10) +
                 N'            AND a.Last_Status_DT < @EndDT ' +                     CHAR(10)
            ELSE N'' END +
        CASE WHEN @Consignment_No IS NOT NULL
            THEN N'            AND a.Consignment_No = @Consignment_No ' +            CHAR(10)
            ELSE N'' END +   
        CASE WHEN @CustID IS NOT NULL
            THEN N'            AND a.CustID = @CustID ' +                            CHAR(10)
            ELSE N'' END +                                                           CHAR(10) +
        N'ORDER BY Consignment_No';
    IF @debug = 1
    PRINT @SQL;
    EXEC sp_executesql @SQL
        ,@SQLParms
        ,@StartDT        = @StartDate
        ,@EndDT          = @EndDate
        ,@Consignment_No = @Consignment_No
        ,@CustID         = @CustID;
       
END

If we execute the following, we find that both queries return the same results.  First the query:

SELECT *
FROM dbo.Shipments
ORDER BY Consignment_No;
EXEC dbo.Shipment_Tracking
    @StartDT           = NULL
    ,@EndDT            = NULL
    ,@Consignment_No   = NULL
    ,@CustID           = NULL
    ,@debug            = 0;

And now the results:

Consignment_No ReferenceNo CustID Total_Pkgs Last_Status Last_Status_DT
0152037        NULL        ACME      1       MDE         2013-10-28 13:11:00.000
0642191        SO 1111     APPLE     1       MDE         2013-10-29 11:37:00.000
0806473        INV 43523   ATLAS     1       MDE         2013-10-24 10:54:00.000
4292860        NULL        ATLAS     1       POD         2013-10-25 08:49:00.000
6289811        NULL        APPLE     1       POD         2013-10-27 14:45:00.000
7271931        NULL        NULL      1       WGT         2013-10-29 00:36:00.000
8033701        PO 12345    ACME      1       POD         2013-10-23 20:12:00.000
8363496        NULL        NULL      1       SIP         2013-10-22 02:44:00.000

The Most Common Query Governor

When most T-SQL folks want to limit the returned rows, the normal thought process is to use the TOP clause of the SELECT statement.  So the idea then is to implement something like “if the user has specified no filtering criteria, return only the top n rows.”

You must also think about, TOP n based on what?  The clever developer will think “how about returning the most current information?”  Indeed, that is most likely to satisfy the need of the lizard-brain, that prefers tactile skills (the execute query button push) over cerebral ones (like thinking about the filtering criteria).

Let’s modify our SP to wrap a TOP clause that returns the TOP 5 rows that are the most current around our base query.

CREATE PROCEDURE dbo.Shipment_Tracking_QG1
(
    @StartDT           DATETIME    = NULL
    ,@EndDT            DATETIME    = NULL
    ,@Consignment_No   VARCHAR(30) = NULL
    ,@CustID           VARCHAR(20) = NULL
    ,@debug            TINYINT     = 0       -- 1=Display debug results
)
AS BEGIN
    SET NOCOUNT ON;
   
    -- Truncate time component from start date
    DECLARE @StartDate DATETIME = DATEADD(day, 0, DATEDIFF(day, 0, @StartDT))
        -- Truncate time component from end date and add 1
        ,@EndDate      DATETIME = DATEADD(day, 1, DATEDIFF(day, 0, @EndDT))
        ,@SQL          NVARCHAR(MAX) = N''
        ,@SQLParms     NVARCHAR(MAX) = N'
            @StartDT           DATETIME
            ,@EndDT            DATETIME
            ,@Consignment_No   VARCHAR(30)
            ,@CustID           VARCHAR(20)';
   
    -- If @EndDate is NULL, then use @StartDT + 1   
    SELECT @EndDate        = ISNULL(@EndDate, DATEADD(day, 1, DATEDIFF(day, 0, @StartDT)))
        ,@Consignment_No   = NULLIF(@Consignment_No, '')
        ,@CustID           = NULLIF(@CustID, '')
        ,@debug            = CASE WHEN @debug IS NULL THEN 0
                                WHEN @debug IN (0,1) THEN @debug
                                ELSE 0 END;
    IF @debug = 1
    BEGIN  
        PRINT CONVERT(VARCHAR(20), @StartDate, 120);
        PRINT CONVERT(VARCHAR(20), @EndDate, 120);
    END
   
    IF @StartDT IS NULL AND @EndDT IS NULL AND
        @Consignment_No IS NULL AND @CustID IS NULL
    SELECT @SQL = @SQL + N'SELECT Consignment_No, ReferenceNo, CustID, Total_Pkgs
            ,Last_Status, Last_Status_DT' +                                           CHAR(10) +
            N'FROM (' +                                                               CHAR(10);
    
    SELECT @SQL = @SQL + N'SELECT ' +
        CASE WHEN @StartDT IS NULL AND @EndDT IS NULL AND
                  @Consignment_No IS NULL AND @CustID IS NULL
            THEN N'TOP 5 '
            ELSE '' END +
        N'   a.Consignment_No, a.ReferenceNo, a.CustID, a.Total_Pkgs
            ,a.Last_Status, a.Last_Status_DT
        FROM dbo.Shipments a' +                                                       CHAR(10) +
        CASE WHEN @StartDate IS NOT NULL
            THEN N'            AND a.Last_Status_DT >= @StartDT ' +                   CHAR(10) +
                 N'            AND a.Last_Status_DT < @EndDT ' +                      CHAR(10)
            ELSE N'' END +
        CASE WHEN @Consignment_No IS NOT NULL
            THEN N'            AND a.Consignment_No = @Consignment_No ' +             CHAR(10)
            ELSE N'' END +   
        CASE WHEN @CustID IS NOT NULL
            THEN N'            AND a.CustID = @CustID ' +                             CHAR(10)
            ELSE N'' END +                                                            CHAR(10) +
        CASE WHEN @StartDT IS NULL AND @EndDT IS NULL AND
                  @Consignment_No IS NULL AND @CustID IS NULL
            THEN N'ORDER BY Last_Status_DT DESC) Shipments' +                         CHAR(10)
            ELSE '' END +
        N'ORDER BY Consignment_No';
    IF @debug = 1
    PRINT @SQL;
    EXEC sp_executesql @SQL
        ,@SQLParms
        ,@StartDT        = @StartDate
        ,@EndDT          = @EndDate
        ,@Consignment_No = @Consignment_No
        ,@CustID         = @CustID;
       
END

You can see how in the above we’ve added some SQL that is specific to the case where all four of the SP’s input parameters are passed as NULL.  While just a bit messy in my opinion, it can be made to work.

Running this query, produces a new results set:

EXEC dbo.Shipment_Tracking_QG1
    @StartDT           = NULL
    ,@EndDT            = NULL
    ,@Consignment_No   = NULL
    ,@CustID           = NULL
    ,@debug            = 1;

The dynamic SQL produced by the PRINT statement (when @debug=1) now looks like this (after a bit of touch up formatting):

SELECT Consignment_No, ReferenceNo, CustID, Total_Pkgs
    ,Last_Status, Last_Status_DT
FROM (
    SELECT TOP 5 a.Consignment_No, a.ReferenceNo, a.CustID, a.Total_Pkgs
        ,a.Last_Status, a.Last_Status_DT
    FROM dbo.Shipments a
    ORDER BY Last_Status_DT DESC
) Shipments
ORDER BY Consignment_No

And the results returned are the “governed” results:

Consignment_No ReferenceNo CustID Total_Pkgs Last_Status Last_Status_DT
0152037        NULL        ACME      1       MDE         2013-10-28 13:11:00.000
0642191        SO 1111     APPLE     1       MDE         2013-10-29 11:37:00.000
4292860        NULL        ATLAS     1       POD         2013-10-25 08:49:00.000
6289811        NULL        APPLE     1       POD         2013-10-27 14:45:00.000
7271931        NULL        NULL      1       WGT         2013-10-29 00:36:00.000

Of course, this approach begs the question, how many rows is the right number to return to maximize the likelihood that the user will find what he needs?  That would probably depend on the daily transaction volume, so what if over time that increases ten or one hundred-fold?  If we start by returning 1,000 rows, it might require 10,000 rows a year from now.

An Alternative Query Governor

Let’s define a table that contains rows that are based on the search form.

CREATE TABLE dbo.QueryGovernor
(
    FormName        VARCHAR(20) 
    ,UserRole       VARCHAR(20)
    ,LookbackDays   INT
    ,PRIMARY KEY (FormName, UserRole)
);
INSERT INTO dbo.QueryGovernor (FormName, UserRole, LookbackDays)
SELECT 'MySearchQuery', 'NORMAL', 5;
SELECT *
FROM dbo.QueryGovernor;

Note how, in our table we have left a placeholder column for the UserRole, which is to say different roles may have a different number of look back days assigned.  The assumption is that a super-user of the system might not be quite as lizard-brained as a normal user, so for them we might allow a longer look back period.

The modifications to our original SP are a little bit more straightforward and a lot less messy.

CREATE PROCEDURE [dbo].[Shipment_Tracking_QG2]
(
    @StartDT           DATETIME    = NULL
    ,@EndDT            DATETIME    = NULL
    ,@Consignment_No   VARCHAR(30) = NULL
    ,@CustID           VARCHAR(20) = NULL
    ,@debug            TINYINT     = 0       -- 1=Display debug results
)
AS BEGIN
    SET NOCOUNT ON;
   
    -- Truncate time component from start date
    DECLARE @StartDate DATETIME = DATEADD(day, 0, DATEDIFF(day, 0, @StartDT))
        -- Truncate time component from end date and add 1
        ,@EndDate      DATETIME = DATEADD(day, 1, DATEDIFF(day, 0, @EndDT))
        ,@CurrentDT    DATETIME = '2013-10-30'          -- Normally you should assign GETDATE()
        -- Days to look back from the QueryGovernor table
        ,@LookbackDays INT      =
            (
                SELECT LookbackDays
                FROM dbo.QueryGovernor
                WHERE FormName = 'MySearchQuery'        -- AND the type of user if applicable
            )
        ,@SQL          NVARCHAR(MAX)
        ,@SQLParms     NVARCHAR(MAX) = N'
            @StartDT           DATETIME
            ,@EndDT            DATETIME
            ,@Consignment_No   VARCHAR(30)
            ,@CustID           VARCHAR(20)
            ,@CurrentDT        DATETIME
            ,@LookbackDays     INT';
   
    -- If @EndDate is NULL, then use @StartDT + 1   
    SELECT @EndDate        = ISNULL(@EndDate, DATEADD(day, 1, DATEDIFF(day, 0, @StartDT)))
        ,@Consignment_No   = NULLIF(@Consignment_No, '')
        ,@CustID           = NULLIF(@CustID, '')
        ,@debug            = CASE WHEN @debug IS NULL THEN 0
                                WHEN @debug IN (0,1) THEN @debug
                                ELSE 0 END;
    IF @debug = 1
    BEGIN  
        PRINT CONVERT(VARCHAR(20), @StartDate, 120);
        PRINT CONVERT(VARCHAR(20), @EndDate, 120);
    END
    
    SELECT @SQL = N'
        SELECT a.Consignment_No, a.ReferenceNo, a.CustID, a.Total_Pkgs
            ,a.Last_Status, a.Last_Status_DT
        FROM dbo.Shipments a' +                                                        CHAR(10) +
        CASE WHEN @StartDate IS NOT NULL
            THEN N'            AND a.Last_Status_DT >= @StartDT ' +                    CHAR(10) +
                 N'            AND a.Last_Status_DT < @EndDT ' +                       CHAR(10)
            ELSE N'' END +
        CASE WHEN @Consignment_No IS NOT NULL
            THEN N'            AND a.Consignment_No = @Consignment_No ' +              CHAR(10)
            ELSE N'' END +   
        CASE WHEN @CustID IS NOT NULL
            THEN N'            AND a.CustID = @CustID ' +                              CHAR(10)
            ELSE N'' END +                                                             CHAR(10) +
        CASE WHEN @StartDT IS NULL AND @EndDT IS NULL AND
                  @Consignment_No IS NULL AND @CustID IS NULL
            THEN N'WHERE a.Last_Status_DT >= DATEADD(day,-@LookbackDays,@CurrentDT)' + CHAR(10)
            ELSE N'' END +                                                             CHAR(10) +
        N'ORDER BY Consignment_No';
    IF @debug = 1
    PRINT @SQL;
    EXEC sp_executesql @SQL
        ,@SQLParms
        ,@StartDT        = @StartDate
        ,@EndDT          = @EndDate
        ,@Consignment_No = @Consignment_No
        ,@CustID         = @CustID
        ,@CurrentDT      = @CurrentDT
        ,@LookbackDays   = @LookbackDays;
       
END

Now when we run our SP and trigger the query governor, our results are the same as limiting to the TOP 5 rows but will now automatically vary depending on the number of transactions created each day:

Consignment_No ReferenceNo CustID Total_Pkgs Last_Status Last_Status_DT
0152037        NULL        ACME      1       MDE         2013-10-28 13:11:00.000
0642191        SO 1111     APPLE     1       MDE         2013-10-29 11:37:00.000
4292860        NULL        ATLAS     1       POD         2013-10-25 08:49:00.000
6289811        NULL        APPLE     1       POD         2013-10-27 14:45:00.000
7271931        NULL        NULL      1       WGT         2013-10-29 00:36:00.000

The SQL created by our SP now looks like this (also more straightforward):

SELECT a.Consignment_No, a.ReferenceNo, a.CustID, a.Total_Pkgs
    ,a.Last_Status, a.Last_Status_DT
FROM dbo.Shipments a
WHERE a.Last_Status_DT >= DATEADD(day, -@LookbackDays, @CurrentDT)
ORDER BY Consignment_No

In reality, the @CurrentDT local variable and parameter we passed when executing the dynamic SQL is not really needed because you can simply replace it with GETDATE().  We did that so you could replicate and play around with the SQL provided in this article.

Of course, if you need to vary the query governor by role, you’d probably also need to pass the user’s login into the SP and apply it where the @Dayslookback local variable’s initial value is set (in the sub-query’s WHERE clause).

The Down-side Risk of Using one of these Approaches

When our dumb user with the lizard-brain runs one of his “I’m too lazy to think about what I want, so give me everything queries” and then does a manual search through the returned results, he will be mighty disconcerted when he doesn’t find that which he seeks.

If the query governor is quietly limiting the results returned, users may be asking the question “I know what I want is there, so why can’t I see it?”  It is quite possible that they’ll assume there’s a bug in the application for this case.

Communication is essential to limiting misunderstandings.  Tell the user community why the query governor is employed, and better yet give them a form that allows them control over its settings on a form-by-form basis.

And be sure to remind them that if they are careful in considering what filtering criteria to apply, it overrides the query governor.

Conclusion

When searching against a transactions table, my belief is that users for the most part are interested in looking at recent transactions.  The more current transactions are more likely to be of interest than those from the distant past.  Most transaction tables have some sort of date on each row that you can use to identify what is current and what is not.

So my preferred approach is to implement a query governor that is configurable.  Give the user access to the governor and allow them to define the governors’ limits by form and based on role.

The main point of this article though is to limit the loading of cached records onto your application server and improving the overall performance of your application, when you’re not dealing with users that have a concern for such things.

Dwain Camps

SQL Performance Evangelist

Follow me on Twitter: @DwainCSQL

Disclaimer:

I hope my use of the “he” pronoun, to describe our lizard-brained, dumb user doesn’t offend anybody out there reading this.  The ladies will surely be pleased.  It was not my intention to imply that men are any more or less lizard-brained than women.

© Copyright Dwain Camps 2014 All Rights Reserved 

Rate

5 (4)

Share

Share

Rate

5 (4)