Query Performance Problem

  • ScottPletcher (11/19/2015)


    No person outside the company can possibly put it back into the current table. So how would they ever be able to reuse it? It's just not possible. How do they get around that and re-use the card? How??????

    It's simple, really.

    If you do not check a new code against all previously generated, there is a chance the Code Generator will give you one of those codes on an old deactivated card.

    And that old code will be put back into the current table, no need for an intrusion from an outsider.

    Then somebody who has that old expired card could still enter the code on it into the system and get some benefits which were not intended to him/her.

    _____________
    Code for TallyGenerator

  • Sergiy (11/19/2015)


    ScottPletcher (11/19/2015)


    Try to follow this, it's simple enough even for you.

    When it's used up initially, it's removed forever from the current table, since it has no current value.

    No person outside the company can possibly put it back into the current table. So how would they ever be able to reuse it? They simply can't.

    LOL!

    Do you really think Amazon has already generated all the promo codes they are gonna use in any foreseen future?

    And they do not generate new codes anymore?

    😛

    OK, I see what you cannot understand.

    Promo codes are not issued all at once.

    Got it?

    In other words:

    Promo codes are issued in bunches sufficient for each particular promo campaign.

    And each bunch of promo codes is meant to be used for that particular campaign it was issued for.

    When one campaign is over all the relevant promo codes are made inactive.

    And when and another one is about to start they issue a new bunch of promo codes.

    But no business would want customers to use old promo codes in the new campaign.

    Therefore they want to make sure the bunch of newly issued and activated codes contain no codes from old and closed campaigns.

    For that - they need to be able to check if a new code has been used ever before, in DB terms - is it in the list of ever issued codes, active or inactive.

    Got it now?

    When one campaign is over all the relevant promo codes are made inactive.

    Yes, by deleting them from the current promo table, not by just flagging them. They can still check the historical table before issuing new codes. And that is exactly what I originally recommended -- that active and inactive codes be put in different tables. Why on earth are you insanely insistent that only a single table can ever be used for this?? It's vastly less overhead to check a vastly smaller, active-only codes table when people log in. I guarantee the main look up table does not contain a complete history of all codes every issued: that would just be horribly, ridiculously inefficient.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (11/19/2015)


    When one campaign is over all the relevant promo codes are made inactive.

    Yes, by deleting them from the current promo table, not by just flagging them. They can still check the historical table before issuing new codes. And that is exactly what I originally recommended -- that active and inactive codes be put in different tables.

    What makes you so sure that running WHERE NOT EXISTS check against 2 tables is so much more efficient than doing it against a single table?

    Why on earth are you insanely insistent that only a single table can ever be used for this?? It's vastly less overhead to check a vastly smaller, active-only codes table when people log in. I guarantee the main look up table does not contain a complete history of all codes every issued: that would just be horribly, ridiculously inefficient.

    Before you use words "insanely", "horribly", "ridiculously" you better provide a solid proof (at least for yourself) that separating inactive codes from active ones provides any kind of performance benefit comparing to having a clustered index on (IsActive, ID).

    Because the test reveals that it only adds overhead of maintaining 2 tables instead of one with no performance benefits, it's you who's gonna look insane and ridiculous.

    _____________
    Code for TallyGenerator

  • Sergiy (11/19/2015)


    ScottPletcher (11/19/2015)


    When one campaign is over all the relevant promo codes are made inactive.

    Yes, by deleting them from the current promo table, not by just flagging them. They can still check the historical table before issuing new codes. And that is exactly what I originally recommended -- that active and inactive codes be put in different tables.

    What makes you so sure that running WHERE NOT EXISTS check against 2 tables is so much more efficient than doing it against a single table?

    Why on earth are you insanely insistent that only a single table can ever be used for this?? It's vastly less overhead to check a vastly smaller, active-only codes table when people log in. I guarantee the main look up table does not contain a complete history of all codes every issued: that would just be horribly, ridiculously inefficient.

    Before you use words "insanely", "horribly", "ridiculously" you better provide a solid proof (at least for yourself) that separating inactive codes from active ones provides any kind of performance benefit comparing to having a clustered index on (IsActive, ID).

    Because the test reveals that it only adds overhead of maintaining 2 tables instead of one with no performance benefits, it's you who's gonna look insane and ridiculous.

    I'm certain they're never running an EXISTS check because they are not reusing codes, since, as I've noted 4 times, there are hundreds of ways to keep generating unique codes if the codes themselves are significant. However, the NOT EXISTS against two tables would still be more efficient since it would have to be done only when new numbers are being generated rather than for every single lookup.

    A clustering key on ( IsActive, ID ) is so awful I can't even comment further on it. Glwt.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (11/19/2015)


    I'm certain they're never running an EXISTS check because they are not reusing codes,

    Have a look at the topic starter:

    SET @RedemptionCode = dbo.GenerateRandomBase32(@RedemptionSeed)

    WHILE((@RedemptionCode IS NULL) OR EXISTS(SELECT RedemptionCode FROM dbo.UsedRedemptionCodes

    WHERE RedemptionCode = @RedemptionCode))

    Don't you look ridiculous already?

    Should we proceed with the test?

    _____________
    Code for TallyGenerator

  • Sergiy (11/19/2015)


    ScottPletcher (11/19/2015)


    I'm certain they're never running an EXISTS check because they are not reusing codes,

    Have a look at the topic starter:

    SET @RedemptionCode = dbo.GenerateRandomBase32(@RedemptionSeed)

    WHILE((@RedemptionCode IS NULL) OR EXISTS(SELECT RedemptionCode FROM dbo.UsedRedemptionCodes

    WHERE RedemptionCode = @RedemptionCode))

    Don't you look ridiculous already?

    Should we proceed with the test?

    if the codes are significant you don't have to repeat them, and thus don't have to check. Proceed with whatever you want, I'm done here.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (11/19/2015)


    if the codes are significant

    what does that mean?

    In English, please.

    _____________
    Code for TallyGenerator

  • OK,

    Here we go.

    The test as promissed.

    Starting with creating a table:

    CREATE TABLE [dbo].[UsedRedemptionCodesTest](

    [RedemptionCodeID] [int] IDENTITY(1,1) NOT NULL,

    [RedemptionCode] [char](36) NULL,

    [IsActive] [bit] NOT NULL,

    CONSTRAINT [PK_UsedRedemptionCodesTest_RedemptionCodeID] PRIMARY KEY CLUSTERED ([IsActive], [RedemptionCodeID] ASC))

    CREATE UNIQUE NONCLUSTERED INDEX [AK_UsedRedemptionCodesTest_RedemptionCode] ON [dbo].[UsedRedemptionCodesTest] ([RedemptionCode] ASC)

    GO

    Not much of a difference from the OP, only different indexing approach.

    And I do not have the code generator, so I intend to use NEW_ID() instead, so I have increased the length of the RedemptionCode.

    Now populate it:

    DECLARE @N INT

    SET @N = 1E7

    INSERT INTO dbo.UsedRedemptionCodesTest

    ([RedemptionCode], [IsActive])

    SELECT NEWID(), CASE WHEN N < @N * 0.99 THEN 0 ELSE 1 END IsActive

    FROM dbo.TallyGenerator(0, @N, 1, 1)

    ORDER BY N

    10 million +1 records. Not an Amazon scale, but should be enough to get the picture.

    1% of the records (100k rows) are marked as active, the rest - inactive.

    Should be pretty close to a real life scenario.

    Now we run the RedemtionCode generator:

    DECLARE @RedemptionSeed INT

    DECLARE @RedemptionCode CHAR(36)

    SET STATISTICS IO ON

    WHILE((@RedemptionCode IS NULL) OR EXISTS(SELECT RedemptionCode FROM dbo.UsedRedemptionCodesTest

    WHERE RedemptionCode = @RedemptionCode))

    BEGIN

    SET @RedemptionSeed = (SELECT ABS(CHECKSUM(NEWID())))

    SET @RedemptionCode = NEWID() -- dbo.GenerateRandomBase32(@RedemptionSeed)

    END

    INSERT INTO dbo.UsedRedemptionCodesTest (RedemptionCode, [IsActive])

    VALUES (@RedemptionCode, 1)

    SET STATISTICS IO OFF

    GO 10

    Here is what statistics indicate:

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    (1 row(s) affected)

    Table 'UsedRedemptionCodesTest'. Scan count 0, logical reads 4, physical reads 3, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    Table 'UsedRedemptionCodesTest'. Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    I'm not posting timing stats, as they're gonna be different on every other machine, but the 10 runs above took less than a second on my laptop.

    Now it's time to retrieve the codes.

    Here is the code:

    SET STATISTICS IO ON

    SELECT TOP 10 RedemptionCode, [RedemptionCodeID]

    INTO #Out

    FROM dbo.UsedRedemptionCodesTest

    WHERE isActive = 1

    ORDER BY [RedemptionCodeID] -- selecting in the order of rows in the clustered index, and in the same time in a random order of codes appearance

    SELECT * FROM #Out

    UPDATE dbo.UsedRedemptionCodesTest

    SET IsActive = 0

    WHERE IsActive = 1

    AND [RedemptionCodeID] <= (SELECT MAX([RedemptionCodeID]) FROM #Out)

    SET STATISTICS IO OFF

    -- we know that the selected ID's are all sequential and they are all first ID's in sequence of still active ones.

    -- only records within a single data page or within 2 consecutive ones are updated,

    -- and their order within clustered index remain the same after update - no rewriting records, no fragmentation.

    DROP TABLE #Out

    GO

    And stats:

    Table 'UsedRedemptionCodesTest'. Scan count 1, logical reads 4, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (10 row(s) affected)

    (1 row(s) affected)

    (10 row(s) affected)

    Table '#Out________________________________________________________________________________________________________________00000000003D'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    Table 'UsedRedemptionCodesTest'. Scan count 1, logical reads 175, physical reads 2, read-ahead reads 23, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Out________________________________________________________________________________________________________________00000000003D'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    If we need a bigger bunches of codes:SELECT TOP 100 RedemptionCode, [RedemptionCodeID]

    the stats will look like this:

    Table 'UsedRedemptionCodesTest'. Scan count 1, logical reads 7, physical reads 5, read-ahead reads 589, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (100 row(s) affected)

    (1 row(s) affected)

    (100 row(s) affected)

    Table '#Out________________________________________________________________________________________________________________00000000003E'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    Table 'UsedRedemptionCodesTest'. Scan count 1, logical reads 1538, physical reads 1, read-ahead reads 196, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Out________________________________________________________________________________________________________________00000000003E'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (100 row(s) affected)

    Again, it does not change a digit on my laptop's clock while completed.

    For the reference:

    SET STATISTICS IO ON

    SELECT COUNT(*) FROM dbo.UsedRedemptionCodesTest

    SET STATISTICS IO OFF

    takes about 30 seconds to complete and shows these stats:

    Table 'UsedRedemptionCodesTest'. Scan count 1, logical reads 62360, physical reads 182, read-ahead reads 62356, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    Conclusion:

    Appropriate clustered index exclude historical records with inactive cards from the execution plan.

    Those records are not accessed in routine calls, they do not create any overhead and do not slow down queries at all.

    The test confirms that there is no any reason to partition the data in the table. It won't add anything to the queries performance, but will create an overhead of maintaining 2 tables instead of 1.

    If anybody can show how a separate table for inactive codes or any other form of partitioning provides better performance counters for this task - I'd really appreciate that.

    _____________
    Code for TallyGenerator

  • OK,

    Turns out, Scott does not want to discredit his religious beliefs by publishing results of scientific tests.

    So I decided to wrap the discussion with doing my own test using Active-Inactive tables.

    I've done it in the most efficient way (to my best understanding).

    I appreciate that somebody can offer a better way, so any corrections are welcome.

    Here is my setup:

    CREATE TABLE [dbo].[UsedRedemptionCodes_Active](

    [RedemptionCode] [char](36) NOT NULL

    CONSTRAINT [PK_UsedRedemptionCodes_Active] PRIMARY KEY CLUSTERED ([RedemptionCode])

    )

    GO

    CREATE TABLE [dbo].[UsedRedemptionCodes_Inactive](

    [RedemptionCode] [char](36) NOT NULL,

    CONSTRAINT [PK_UsedRedemptionCodes_Inactive] PRIMARY KEY CLUSTERED ([RedemptionCode])

    )

    GO

    INSERT INTO dbo.UsedRedemptionCodes_Active

    ([RedemptionCode])

    SELECT RedemptionCode

    FROM dbo.UsedRedemptionCodesTest

    WHERE IsActive = 1

    ORDER BY RedemptionCode

    INSERT INTO dbo.UsedRedemptionCodes_Inactive

    ([RedemptionCode])

    SELECT RedemptionCode

    FROM dbo.UsedRedemptionCodesTest

    WHERE IsActive = 0

    ORDER BY RedemptionCode

    I used the records from the same table which I populated in the previous test. So, it's exactly the same recordset.

    Now, let's generate new codes:

    DECLARE @RedemptionSeed INT

    DECLARE @RedemptionCode CHAR(36)

    SET STATISTICS IO ON

    WHILE((@RedemptionCode IS NULL)

    OR EXISTS(SELECT RedemptionCode FROM dbo.UsedRedemptionCodes_Active WHERE RedemptionCode = @RedemptionCode)

    OR EXISTS(SELECT RedemptionCode FROM dbo.UsedRedemptionCodes_Inactive WHERE RedemptionCode = @RedemptionCode)

    )

    BEGIN

    SET @RedemptionSeed = (SELECT ABS(CHECKSUM(NEWID())))

    SET @RedemptionCode = NEWID() -- dbo.GenerateRandomBase32(@RedemptionSeed)

    END

    INSERT INTO dbo.UsedRedemptionCodes_Active (RedemptionCode)

    VALUES (@RedemptionCode)

    SET STATISTICS IO OFF

    GO 10

    Output statistics:

    Beginning execution loop

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'UsedRedemptionCodes_Inactive'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'UsedRedemptionCodes_Active'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'UsedRedemptionCodes_Active'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    ... repeated 8 times ...

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'UsedRedemptionCodes_Inactive'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'UsedRedemptionCodes_Active'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'UsedRedemptionCodes_Active'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Batch execution completed 10 times.

    Aggregated Numbers are pretty much the same as in the first test.

    Now - giving away the random codes:

    SET STATISTICS PROFILE ON

    SET STATISTICS IO ON

    SELECT TOP 100 RedemptionCode

    INTO #Out

    FROM dbo.UsedRedemptionCodes_Active

    ORDER BY NEWID() -- we must make sure that the codes are given away in the random order

    -- as we cannot allow codes presented in a sequence which would be easy to guess

    SELECT * FROM #Out

    INSERT INTO UsedRedemptionCodes_Inactive (RedemptionCode)

    SELECT RedemptionCode FROM #Out

    DELETE FROM dbo.UsedRedemptionCodes_Active

    WHERE EXISTS (SELECT * FROM #Out

    WHERE #Out.RedemptionCode = dbo.UsedRedemptionCodes_Active.RedemptionCode)

    SET STATISTICS IO OFF

    SET STATISTICS PROFILE OFF

    DROP TABLE #Out

    And the stats:

    Table 'UsedRedemptionCodes_Active'. Scan count 1, logical reads 605, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Out________________________________________________________________________________________________________________0000000000CA'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'UsedRedemptionCodes_Inactive'. Scan count 0, logical reads 1140, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Out________________________________________________________________________________________________________________0000000000CA'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'UsedRedemptionCodes_Active'. Scan count 0, logical reads 615, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Out________________________________________________________________________________________________________________0000000000CA'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Aggregated numbers are on the same level as in the test with a single table, only actually "a little" worse.

    Conclusion:

    Partitioning of a table not only does not improve performance, it actually makes it worse.

    _____________
    Code for TallyGenerator

  • [font="Tahoma"]

    I was intrigued by this debate and felt compelled to register and add my own experience.

    We generated similar codes (circa 100k) which have an expiry date (default from purchase date is 36 months)

    We also have an active flag which once used was set to inactive (as well as the expiration date set). We then used a schema bound view to limit the active "none expired" codes to check against, adding relevant indexes.

    We explored partitioning, datamart and archiving techniques but the most useful for our scenario was to leave the values in the table. Since then we have also tried filtered indexes with a degree of success.

    Whilst I appreciate the methods suggested, there is no; "universal, one solution suits all technique" one can adopt but instead try different techniques until you find a solution that suits you. 😉

    [/font]

Viewing 10 posts - 31 through 39 (of 39 total)

You must be logged in to reply to this topic. Login to reply