Search Optomisation

  • I have inherited a DB, and need to make some changes for a search proc.

    This app and DB are still in dev, and have not yet been published to production, so schema changes are still possible - not easy, but possible.

    I have a table with an integer field that needs to be searched for partial matches. The partial match can be anywhere within the number.

    In the physical world, the number that is visible is a 6-character number, left-padded with zeros (eg: 000123).

    The table will contain anywhere from 300,000 to 5,000,000 records.

    Currently, most instances, will have only 1 CategoryId in the table, but I have one instance that will have 3.

    The way I see it, I have 3 options...

    • OPTION 1 - Keep the existing structure and indexes, and live with the bad performance.
    • OPTION 2 - Create a persisted computed column, and index it. This will at least improve to an index seek.
    • OPTION 3 - Change the data type in the table, and modify the clustered index. This will result in a clustered index seek, and is my chosen solution.

      * Am I missing any obvious improvement options?

      * Am I at least on the right track?

    The new query requirements

    DECLARE @PartialPartNumber VARCHAR(6) = '234';

    SELECT CategoryId, PartNumber

    FROM #PartsList AS pl

    WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';

    Below is a representation of the current structure

    -- This is the original table design

    IF OBJECT_ID('tempdb..#PartsList', N'U') IS NOT NULL

    BEGIN

    DROP TABLE #PartsList;

    END;

    GO

    CREATE TABLE #PartsList

    (

    PartId INT IDENTITY(1,1) NOT NULL

    , CategoryId INT NOT NULL

    , PartNumber INT NOT NULL CHECK (PartNumber > 0 AND PartNumber < 1000000)

    , UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())

    , PRIMARY KEY CLUSTERED (PartId)

    );

    GO

    CREATE UNIQUE NONCLUSTERED INDEX uxParts

    ON #PartsList (CategoryId, PartNumber);

    GO

    Code to load some sample data

    ;WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows

    E8(N) AS (SELECT 1 FROM E4 a, E4 b), --10E+8 or 100,000,000 rows max

    cteTally(N) AS (SELECT TOP (999999) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8 )

    , cteCat1 AS (

    SELECT TOP (5000) CategoryId = 1, PartNumber = N

    FROM cteTally

    ORDER BY NEWID()

    )

    , cteCat2 AS (

    SELECT TOP (5000) CategoryId = 2, PartNumber = N

    FROM cteTally

    ORDER BY NEWID()

    )

    , cteCat3 AS (

    SELECT TOP (5000) CategoryId = 3, PartNumber = N

    FROM cteTally

    ORDER BY NEWID()

    )

    INSERT INTO #PartsList ( CategoryId, PartNumber )

    SELECT CategoryId, RIGHT('000000' + CONVERT(VARCHAR(6), PartNumber, 0), 6) FROM cteCat1 UNION ALL

    SELECT CategoryId, RIGHT('000000' + CONVERT(VARCHAR(6), PartNumber, 0), 6) FROM cteCat2 UNION ALL

    SELECT CategoryId, RIGHT('000000' + CONVERT(VARCHAR(6), PartNumber, 0), 6) FROM cteCat3;

    GO

    OPTION 2 - Add a persisted computed column and index

    -- This is a minimal change, adding a persisted computed column and index

    IF OBJECT_ID('tempdb..#PartsList', N'U') IS NOT NULL

    BEGIN

    DROP TABLE #PartsList;

    END;

    GO

    CREATE TABLE #PartsList

    (

    PartId INT IDENTITY(1,1) NOT NULL

    , CategoryId INT NOT NULL

    , PartNumber INT NOT NULL CHECK (PartNumber > 0 AND PartNumber < 1000000)

    , UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())

    , PartNumberSearch AS RIGHT('000000' + CONVERT(VARCHAR(6), PartNumber, 0), 6) PERSISTED

    , PRIMARY KEY CLUSTERED (PartId)

    );

    GO

    CREATE UNIQUE NONCLUSTERED INDEX uxParts

    ON #PartsList (CategoryId, PartNumber);

    GO

    CREATE NONCLUSTERED INDEX ixPartSearch

    ON #PartsList (PartNumberSearch, CategoryId)

    INCLUDE (PartNumber);

    GO

    OPTION 3 - Fix the data type and clustered key

    -- This change is the big one. It requires FK changes on other tables, and related proc changes.

    IF OBJECT_ID('tempdb..#PartsList', N'U') IS NOT NULL

    BEGIN

    DROP TABLE #PartsList;

    END;

    GO

    CREATE TABLE #PartsList

    (

    CategoryId INT NOT NULL

    , PartNumber CHAR(6) NOT NULL CHECK (LEN(LTRIM(PartNumber)) = 6 AND CAST(PartNumber AS INT) > 0 AND CAST(PartNumber AS INT) < 1000000)

    , UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())

    , PRIMARY KEY CLUSTERED (PartNumber, CategoryId)

    );

    GO

  • Interesting dilemma and great work with the sample data/ddl!

    Quick question what are the Partnumber column always going to be 6 characters long or less?

    Also (and I expect the answer to be "No" but it's worth asking because I have a great solution if the answer is "yes").

    Is @PartialPartNumber always going to be the same length? Is it possible that it could always be one of 2 or 3 different lengths?

    Update:

    After playing around with your solutions it appears that Option 3 is pretty brilliant and certainly the way to go!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (4/5/2016)


    Interesting dilemma and great work with the sample data/ddl!

    Quick question what are the Partnumber column always going to be 6 characters long or less?

    Also (and I expect the answer to be "No" but it's worth asking because I have a great solution if the answer is "yes").

    Is @PartialPartNumber always going to be the same length? Is it possible that it could always be one of 2 or 3 different lengths?

    The PartNumber is a 6-character numeric field. It is actually a subset of a larger number.

    Real word examples are .... [other data-]123456, or [other data-]009865, or [other data-]000037

    The @PartialPartNumber search could be 1 to 6 characters. Although, at this point, we are considering forcing the user to enter at least 3 digits. This will result in a smaller resultset being returned to the application, and hopefully an easier search for SQL.

  • Alan.B (4/5/2016)


    Interesting dilemma and great work with the sample data/ddl!

    Quick question what are the Partnumber column always going to be 6 characters long or less?

    Also (and I expect the answer to be "No" but it's worth asking because I have a great solution if the answer is "yes").

    Is @PartialPartNumber always going to be the same length? Is it possible that it could always be one of 2 or 3 different lengths?

    Let's take a look at your solution for the case where the length is always 4, 5, or 6 characters, as this is where we are trying to push the user.

    I have also made a slight change to the CHECK parameter in OPTION 3

    IF OBJECT_ID('tempdb..#PartsList', N'U') IS NOT NULL

    BEGIN

    DROP TABLE #PartsList;

    END;

    CREATE TABLE #PartsList

    (

    CategoryId INT NOT NULL

    , PartNumber CHAR(6) NOT NULL CHECK ( (LEN(LTRIM(PartNumber)) = 6) AND (PartNumber NOT LIKE '%[^0-9]%') )

    , UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())

    , PRIMARY KEY CLUSTERED (PartNumber, CategoryId)

    );

  • Another possibility is to leave the main table alone, and create a separate table to hold the partial partids to match on. Use a trigger on the main table to load the partial table. I did trigger code only for the 3-digit part of it, but it's easy enough to add 4 and 5 digit values if you want to (or even to use the 3-digit code to relatively quickly get to the 4 and 5 digit matches(?)). I switched both tables from #names to "real" tables names since temp objects cannot have a trigger.

    Note the very nice query plan for the last query: a merge join on the main table.

    USE tempdb;

    CREATE TABLE tempdb.dbo.PartsListPartials (

    PartIdPartialLength tinyint NOT NULL,

    PartIdPartial smallint NOT NULL,

    PartIdPartialStart tinyint NOT NULL,

    PartId int NOT NULL

    )

    CREATE CLUSTERED INDEX PartsListPartials__CL ON tempdb.dbo.PartsListPartials ( PartIdPartialLength, PartIdPartial ) WITH ( FILLFACTOR = 85 );

    GO

    CREATE TRIGGER PartsList__TR_Insert_Partial

    ON tempdb.dbo.PartsList

    AFTER INSERT

    AS

    SET NOCOUNT ON;

    INSERT INTO tempdb.dbo.PartsListPartials ( PartId, PartIdPartial, PartIdPartialLength, PartIdPartialStart )

    SELECT i.PartId, ca1.PartIdPartial, ca1.PartIdPartialLength, ca1.PartIdPartialStart

    FROM inserted i

    CROSS APPLY (

    SELECT PartId / 1000 AS PartIdPartial, 3 AS PartIdPartialLength, 1 AS PartIdPartialStart

    WHERE PartId / 1000 > 0

    UNION ALL

    SELECT PartId % 100000 / 100, 3, 2

    WHERE PartId % 100000 / 100 > 0

    UNION ALL

    SELECT PartId % 10000 / 10, 3, 3

    WHERE PartId % 10000 / 10 > 0

    UNION ALL

    SELECT PartId % 1000, 3, 4

    WHERE PartId % 1000 > 0

    ) AS ca1

    GO

    DECLARE @PartIdPartial smallint

    SET @PartIdPartial = 123

    SELECT pl.*

    FROM tempdb.dbo.PartsList pl

    INNER JOIN (

    SELECT DISTINCT PartId

    FROM PartsListPartials

    WHERE PartIdPartialLength = LEN(@PartIdPartial) AND

    PartIdPartial = @PartIdPartial

    ) AS plp ON plp.PartId = pl.PartId

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • DesNorton (4/5/2016)


    I have inherited a DB, and need to make some changes for a search proc.

    This app and DB are still in dev, and have not yet been published to production, so schema changes are still possible - not easy, but possible.

    I have a table with an integer field that needs to be searched for partial matches. The partial match can be anywhere within the number.

    Hold on a sec.

    What do you mean "anywhere within the number"?

    If you search for a number "23" you'll find "000023", "230000", "000123", "052321", "123456", etc.

    Is it what do you really want?

    _____________
    Code for TallyGenerator

  • DesNorton (4/5/2016)


    Alan.B (4/5/2016)


    Interesting dilemma and great work with the sample data/ddl!

    Quick question what are the Partnumber column always going to be 6 characters long or less?

    Also (and I expect the answer to be "No" but it's worth asking because I have a great solution if the answer is "yes").

    Is @PartialPartNumber always going to be the same length? Is it possible that it could always be one of 2 or 3 different lengths?

    Let's take a look at your solution for the case where the length is always 4, 5, or 6 characters, as this is where we are trying to push the user.

    I have also made a slight change to the CHECK parameter in OPTION 3

    IF OBJECT_ID('tempdb..#PartsList', N'U') IS NOT NULL

    BEGIN

    DROP TABLE #PartsList;

    END;

    CREATE TABLE #PartsList

    (

    CategoryId INT NOT NULL

    , PartNumber CHAR(6) NOT NULL CHECK ( (LEN(LTRIM(PartNumber)) = 6) AND (PartNumber NOT LIKE '%[^0-9]%') )

    , UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())

    , PRIMARY KEY CLUSTERED (PartNumber, CategoryId)

    );

    Ok, never mind the solution I was talking about - I can't come up with something better than your third option. I can show you how to make it a little better however. I'm still blown away at how Option 3 gets an index seek considering how you have wild cards in the front of the WHERE clause.

    Anyhow, here's how you can make your solution a little better. Below is 1,050,000 rows of sample data. I made some slight changes including removing the extra zeros logic.

    SET NOCOUNT ON;

    USE tempdb

    GO

    -- This change is the big one. It requires FK changes on other tables, and related proc changes.

    IF OBJECT_ID('tempdb.dbo.PartsList', N'U') IS NOT NULL DROP TABLE dbo.PartsList;

    CREATE TABLE dbo.PartsList

    (

    CategoryId INT NOT NULL

    , PartNumber CHAR(6) NOT NULL

    , UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())

    , CONSTRAINT pk_pl PRIMARY KEY CLUSTERED (PartNumber, CategoryId)

    , CONSTRAINT ck_pl1 CHECK (PartNumber > 0 AND PartNumber < 1000000)

    );

    GO

    ;WITH

    E1(N) AS

    (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ),

    cteTally(N) AS

    (

    SELECT TOP (999999) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f -- 1,000,0000 Rows

    ),

    cteCat1 AS (SELECT TOP(350000) CategoryId=1, PartNumber=N FROM cteTally ORDER BY NEWID()),

    cteCat2 AS (SELECT TOP(350000) CategoryId=2, PartNumber=N FROM cteTally ORDER BY NEWID()),

    cteCat3 AS (SELECT TOP(350000) CategoryId=3, PartNumber=N FROM cteTally ORDER BY NEWID())

    INSERT INTO dbo.PartsList (CategoryId, PartNumber)

    SELECT CategoryId, PartNumber FROM cteCat1 UNION ALL

    SELECT CategoryId, PartNumber FROM cteCat2 UNION ALL

    SELECT CategoryId, PartNumber FROM cteCat3;

    GO

    Now let's compare your query when the optimizer uses a clustered index to when a UNIQUE nonclustered index is available (if you run this with "include actual execution plan" on you'll see that the optimizer picks the UNIQUE index when available):

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    PRINT 'Clustered Index:'+CHAR(10)+REPLICATE('-',50);

    DECLARE @PartialPartNumber VARCHAR(6) = '234';

    SELECT CategoryId, PartNumber

    FROM dbo.PartsList AS pl

    WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';

    SET STATISTICS TIME OFF;

    SET STATISTICS IO OFF;

    GO

    CREATE UNIQUE NONCLUSTERED INDEX uq_vPL ON dbo.PartsList(PartNumber, CategoryId);

    GO

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    PRINT CHAR(10)+'Unique Index:'+CHAR(10)+REPLICATE('-',50);

    DECLARE @PartialPartNumber VARCHAR(6) = '234';

    SELECT CategoryId, PartNumber

    FROM dbo.PartsList AS pl

    WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';

    SET STATISTICS TIME OFF;

    SET STATISTICS IO OFF;

    Results:

    Clustered Index:

    --------------------------------------------------

    Table 'PartsList'. Scan count 1, logical reads 3532, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 235 ms, elapsed time = 245 ms.

    Unique Index:

    --------------------------------------------------

    Table 'PartsList'. Scan count 1, logical reads 2083, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 125 ms, elapsed time = 150 ms.

    Notice that a UNIQUE index reduces the number of reads by 40%+ and the CPU by almost 1/2. That's the best I can offer.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Sergiy (4/5/2016)


    DesNorton (4/5/2016)


    I have inherited a DB, and need to make some changes for a search proc.

    This app and DB are still in dev, and have not yet been published to production, so schema changes are still possible - not easy, but possible.

    I have a table with an integer field that needs to be searched for partial matches. The partial match can be anywhere within the number.

    Hold on a sec.

    What do you mean "anywhere within the number"?

    If you search for a number "23" you'll find "000023", "230000", "000123", "052321", "123456", etc.

    Is it what do you really want?

    Hi Sergiy

    YES ... I do mean "anywhere within the number".

    If I search for "23" I need to get "000023", "230000", "000123", "052321", "123456", etc.

    This is a requirement from the Business Analysts. The dev team are pushing back, and we are trying to ensure that we get at least 3 digits (trying hard for 4) from the user.

    So: search for "234" to get back "000234", "234000", "052341", "123456", etc.

    Or: search for "2345" to get back "002345", "234500", "023457", "123456", etc.

  • Scott and Alan

    I have just walked into a #$@%storm that needs my full attention. I will take a closer look at your suggestions as soon as I can.

  • Alan.B (4/5/2016)


    Now let's compare your query when the optimizer uses a clustered index to when a UNIQUE nonclustered index is available (if you run this with "include actual execution plan" on you'll see that the optimizer picks the UNIQUE index when available):

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    PRINT 'Clustered Index:'+CHAR(10)+REPLICATE('-',50);

    DECLARE @PartialPartNumber VARCHAR(6) = '234';

    SELECT CategoryId, PartNumber

    FROM dbo.PartsList AS pl

    WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';

    SET STATISTICS TIME OFF;

    SET STATISTICS IO OFF;

    GO

    CREATE UNIQUE NONCLUSTERED INDEX uq_vPL ON dbo.PartsList(PartNumber, CategoryId);

    GO

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    PRINT CHAR(10)+'Unique Index:'+CHAR(10)+REPLICATE('-',50);

    DECLARE @PartialPartNumber VARCHAR(6) = '234';

    SELECT CategoryId, PartNumber

    FROM dbo.PartsList AS pl

    WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';

    SET STATISTICS TIME OFF;

    SET STATISTICS IO OFF;

    Results:

    Clustered Index:

    --------------------------------------------------

    Table 'PartsList'. Scan count 1, logical reads 3532, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 235 ms, elapsed time = 245 ms.

    Unique Index:

    --------------------------------------------------

    Table 'PartsList'. Scan count 1, logical reads 2083, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 125 ms, elapsed time = 150 ms.

    Notice that a UNIQUE index reduces the number of reads by 40%+ and the CPU by almost 1/2. That's the best I can offer.

    Hi Alan. I fear that your numbers are using cached data. On my system, I ran your sample, and had slightly different results.

    Results of your code: The I/O is well reduced, but the CPU usage is only marginally reduced

    Clustered Index:

    --------------------------------------------------

    Table 'PartsList'. Scan count 1, logical reads 3532, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 266 ms, elapsed time = 262 ms.

    Unique Index:

    --------------------------------------------------

    Table 'PartsList'. Scan count 1, logical reads 2088, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 234 ms, elapsed time = 238 ms.

    I then added sql to ensure that both statements run with clear cache.

    -----------------------------------

    DBCC DROPCLEANBUFFERS;

    DBCC FREEPROCCACHE;

    -----------------------------------

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    PRINT 'Clustered Index:'+CHAR(10)+REPLICATE('-',50);

    DECLARE @PartialPartNumber VARCHAR(6) = '234';

    SELECT CategoryId, PartNumber

    FROM dbo.PartsList AS pl

    WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';

    SET STATISTICS TIME OFF;

    SET STATISTICS IO OFF;

    GO

    CREATE UNIQUE NONCLUSTERED INDEX uq_vPL ON dbo.PartsList(PartNumber, CategoryId);

    GO

    -----------------------------------

    DBCC DROPCLEANBUFFERS;

    DBCC FREEPROCCACHE;

    -----------------------------------

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    PRINT CHAR(10)+'Unique Index:'+CHAR(10)+REPLICATE('-',50);

    DECLARE @PartialPartNumber VARCHAR(6) = '234';

    SELECT CategoryId, PartNumber

    FROM dbo.PartsList AS pl

    WHERE pl.PartNumber LIKE '%' + @PartialPartNumber + '%';

    SET STATISTICS TIME OFF;

    SET STATISTICS IO OFF;

    Results of code with cleared cache: The I/O is still well reduced, but the CPU usage is marginally increased

    Clustered Index:

    --------------------------------------------------

    Table 'PartsList'. Scan count 1, logical reads 3532, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 297 ms, elapsed time = 291 ms.

    Unique Index:

    --------------------------------------------------

    Table 'PartsList'. Scan count 1, logical reads 2088, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 313 ms, elapsed time = 309 ms.

    I get the same results, whether the second index is unique or not.

    It seems that SQL uses the second index, as it has everything that it needs.

    If I add UTCDateCreated to the SELECT list, then SQL reverts back to the PK. So, due to fact that my real table and select statements have additional columns (omitted here for berevity), I will exclude the additional index.

  • ScottPletcher (4/5/2016)


    Another possibility is to leave the main table alone, and create a separate table to hold the partial partids to match on. Use a trigger on the main table to load the partial table. I did trigger code only for the 3-digit part of it, but it's easy enough to add 4 and 5 digit values if you want to (or even to use the 3-digit code to relatively quickly get to the 4 and 5 digit matches(?)). I switched both tables from #names to "real" tables names since temp objects cannot have a trigger.

    Note the very nice query plan for the last query: a merge join on the main table.

    USE tempdb;

    CREATE TABLE tempdb.dbo.PartsListPartials (

    PartIdPartialLength tinyint NOT NULL,

    PartIdPartial smallint NOT NULL,

    PartIdPartialStart tinyint NOT NULL,

    PartId int NOT NULL

    )

    CREATE CLUSTERED INDEX PartsListPartials__CL ON tempdb.dbo.PartsListPartials ( PartIdPartialLength, PartIdPartial ) WITH ( FILLFACTOR = 85 );

    GO

    CREATE TRIGGER PartsList__TR_Insert_Partial

    ON tempdb.dbo.PartsList

    AFTER INSERT

    AS

    SET NOCOUNT ON;

    INSERT INTO tempdb.dbo.PartsListPartials ( PartId, PartIdPartial, PartIdPartialLength, PartIdPartialStart )

    SELECT i.PartId, ca1.PartIdPartial, ca1.PartIdPartialLength, ca1.PartIdPartialStart

    FROM inserted i

    CROSS APPLY (

    SELECT PartId / 1000 AS PartIdPartial, 3 AS PartIdPartialLength, 1 AS PartIdPartialStart

    WHERE PartId / 1000 > 0

    UNION ALL

    SELECT PartId % 100000 / 100, 3, 2

    WHERE PartId % 100000 / 100 > 0

    UNION ALL

    SELECT PartId % 10000 / 10, 3, 3

    WHERE PartId % 10000 / 10 > 0

    UNION ALL

    SELECT PartId % 1000, 3, 4

    WHERE PartId % 1000 > 0

    ) AS ca1

    GO

    DECLARE @PartIdPartial smallint

    SET @PartIdPartial = 123

    SELECT pl.*

    FROM tempdb.dbo.PartsList pl

    INNER JOIN (

    SELECT DISTINCT PartId

    FROM PartsListPartials

    WHERE PartIdPartialLength = LEN(@PartIdPartial) AND

    PartIdPartial = @PartIdPartial

    ) AS plp ON plp.PartId = pl.PartId

    Hi Scott

    This is certainly fast, with similar I/O to the other solutions, but very low CPU numbers. However, I am unable to wrap my head around the math in the trigger. Could you please explain the logic behind the math.

  • DesNorton (4/6/2016)

    Hi Scott

    This is certainly fast, with similar I/O to the other solutions, but very low CPU numbers. However, I am unable to wrap my head around the math in the trigger. Could you please explain the logic behind the math.

    The math is just an efficient (I hope) way to extract all non-zero 3-digit parts of the number. Let's walk thru it for value 123456:

    123456 / 1000 = 123 (for integer arithmetic, SQL drops all fractions).

    123456 % 100000 = 23456 (strip off first digit) / 100 = 234 (again, fraction is dropped)

    123456 % 10000 = 3456 (strip off first two digits) / 10 = 345

    123456 % 1000 = 456

    So the math just returns each non-zero three digit number:

    123, 234, 345 and 456.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (4/6/2016)


    DesNorton (4/6/2016)

    Hi Scott

    This is certainly fast, with similar I/O to the other solutions, but very low CPU numbers. However, I am unable to wrap my head around the math in the trigger. Could you please explain the logic behind the math.

    The math is just an efficient (I hope) way to extract all non-zero 3-digit parts of the number. Let's walk thru it for value 123456:

    123456 / 1000 = 123 (for integer arithmetic, SQL drops all fractions).

    123456 % 100000 = 23456 (strip off first digit) / 100 = 234 (again, fraction is dropped)

    123456 % 10000 = 3456 (strip off first two digits) / 10 = 345

    123456 % 1000 = 456

    So the math just returns each non-zero three digit number:

    123, 234, 345 and 456.

    Thank you. Now it seems so simple. I'll apply my mind to it when I'm back at work tomorrow.

  • DesNorton (4/6/2016)


    ScottPletcher (4/6/2016)


    DesNorton (4/6/2016)

    Hi Scott

    This is certainly fast, with similar I/O to the other solutions, but very low CPU numbers. However, I am unable to wrap my head around the math in the trigger. Could you please explain the logic behind the math.

    The math is just an efficient (I hope) way to extract all non-zero 3-digit parts of the number. Let's walk thru it for value 123456:

    123456 / 1000 = 123 (for integer arithmetic, SQL drops all fractions).

    123456 % 100000 = 23456 (strip off first digit) / 100 = 234 (again, fraction is dropped)

    123456 % 10000 = 3456 (strip off first two digits) / 10 = 345

    123456 % 1000 = 456

    So the math just returns each non-zero three digit number:

    123, 234, 345 and 456.

    Thank you. Now it seems so simple. I'll apply my mind to it when I'm back at work tomorrow.

    I am thinking of taking this method to the next level. Rather than a lookup table, I want to try and CROSS APPLY this logic to a table. So I have come up with something like this. It returns data, but also raises a Divide by zero error.

    DECLARE @BaseNumber INT = 123456;

    DECLARE @PartNum INT = 23;

    ;WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    )

    , cteTally(N) AS (SELECT TOP (LEN(@BaseNumber)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E1 )

    , cteStart(N) AS (SELECT N FROM cteTally WHERE N <= (LEN(@BaseNumber) - LEN(@PartNum) +1) )

    , cteParts AS (

    SELECT BaseNumber = @BaseNumber

    , RequiredPart = @BaseNumber % POWER(10, LEN(@BaseNumber) -s.N +1) / POWER(10, LEN(@BaseNumber) -LEN(@PartNum) -s.N +1 )

    FROM cteStart AS s

    )

    -- This could probably all be moved into the cteParts, but need to get it working first.

    SELECT p.*

    FROM cteParts AS p

    WHERE @PartNum = p.RequiredPart;

  • OK. So I have tweaked Scott's math and massaged it into a CROSS APPLY against the table to search for any partial PartNumber, regardless of length.

    The performance is a bit slower, but somehow the other developers are able to understand it a bit better than the separate table.

    Create the table and data:

    -- Drop and recreate the table

    IF OBJECT_ID('tempdb..#PartsList', N'U') IS NOT NULL

    BEGIN

    DROP TABLE #PartsList;

    END;

    GO

    CREATE TABLE #PartsList

    (

    PartId INT IDENTITY(1,1) NOT NULL

    , CategoryId INT NOT NULL

    , PartNumber INT NOT NULL CHECK (PartNumber > 0 AND PartNumber < 1000000)

    , UTCDateCreated DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME())

    --, PRIMARY KEY CLUSTERED (PartId)

    , PRIMARY KEY CLUSTERED (CategoryId, PartNumber)

    );

    --CREATE UNIQUE NONCLUSTERED INDEX uxParts

    --ON #PartsList (CategoryId, PartNumber);

    CREATE UNIQUE NONCLUSTERED INDEX uxPartId

    ON #PartsList (PartId); -- Needed for FK

    GO

    -- Populate the table

    ;WITH

    E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ),

    cteTally(N) AS (

    SELECT TOP (999999) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f -- 1,000,0000 Rows

    ),

    cteCat1 AS (SELECT TOP(350000) CategoryId=1, PartNumber=N FROM cteTally ORDER BY NEWID()),

    cteCat2 AS (SELECT TOP(350000) CategoryId=2, PartNumber=N FROM cteTally ORDER BY NEWID()),

    cteCat3 AS (SELECT TOP(350000) CategoryId=3, PartNumber=N FROM cteTally ORDER BY NEWID())

    INSERT INTO #PartsList (CategoryId, PartNumber)

    SELECT CategoryId, PartNumber FROM cteCat1 UNION ALL

    SELECT CategoryId, PartNumber FROM cteCat2 UNION ALL

    SELECT CategoryId, PartNumber FROM cteCat3;

    GO

    Query the data:

    SET NOCOUNT ON;

    SET STATISTICS TIME, IO ON;

    DECLARE @PartialPartNumber INT = 123;

    DECLARE @CategoryId INT = 2;

    SELECT pl.PartId, pl.CategoryId, pl.PartNumber --, s.*

    FROM #PartsList AS pl

    CROSS APPLY ( -- TOP(LEN(pl.PartNumber)) and WHERE n.N <= (LEN(pl.PartNumber) - LEN(@PartialPartNumber) +1)

    -- required to prevent DivideByZero errors

    SELECT TOP (LEN(pl.PartNumber))

    MatchThis = pl.PartNumber % POWER(10, LEN(pl.PartNumber) -n.N +1) / POWER(10, LEN(pl.PartNumber) -LEN(@PartialPartNumber) -n.N +1 )

    --, MatchThisLength = LEN(@PartialPartNumber)

    --, MatchThisStart = n.N

    FROM (

    SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL

    SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL

    SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9

    ) AS n(N)

    WHERE n.N <= (LEN(pl.PartNumber) - LEN(@PartialPartNumber) +1)

    AND pl.PartNumber % POWER(10, LEN(pl.PartNumber) -n.N +1) / POWER(10, LEN(pl.PartNumber) -LEN(@PartialPartNumber) -n.N +1 ) = @PartialPartNumber

    ) AS s

    WHERE pl.CategoryId = @CategoryId;

    SET STATISTICS TIME, IO OFF;

    The Execution Plan shows parallelism as well, which is good.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply