Query Performance Problem

Question

Query Performance Problem

lraju10

Right there with Babe

Points: 725
More actions
November 13, 2015 at 6:29 pm

#309573

Problem : currently working on purchase sproc where it is taking 2 sec for the Purchase transaction to complete, i identified the problem statement, where i generate a six digit coupon code and make sure every time the support code is Unique
DECLARE @RedemptionSeed INT
DECLARE @RedemptionCode VARCHAR(10)
WHILE @RedemptionSeed IS NULL
BEGIN
SET @RedemptionSeed = (SELECT ABS(CHECKSUM(NEWID())))
END
SET @RedemptionCode = dbo.GenerateRandomBase32(@RedemptionSeed)
WHILE((@RedemptionCode IS NULL) OR EXISTS(SELECT RedemptionCode FROM dbo.UsedRedemptionCodes
WHERE RedemptionCode = @RedemptionCode))
BEGIN
SET @RedemptionSeed = (SELECT ABS(CHECKSUM(NEWID())))
SET @RedemptionCode = dbo.GenerateRandomBase32(@RedemptionSeed)
END
INSERT INTO dbo.UsedRedemptionCodes (RedemptionCode)
VALUES (@RedemptionCode)
In the above code EXISTS clause takes more than 1.2 sec - which is doing a check against the UsedRedemptionCodes table - where i insert each generated coupon into the table, currently UsedRedemptionCodes table has 2 million rows
Schema of the UsedRedemptionCodes table -
CREATE TABLE [dbo].[UsedRedemptionCodes](
[RedemptionCodeID] [int] IDENTITY(1,1) NOT NULL,
[RedemptionCode] [char](10) NULL,
[IsActive] [bit] NOT NULL,
CONSTRAINT [PK_UsedRedemptionCodes_RedemptionCodeID] PRIMARY KEY CLUSTERED ([RedemptionCodeID] ASC))
Created a Unique Non Clustered Index on RedemptionCode
CREATE UNIQUE NONCLUSTERED INDEX [AK_UsedRedemptionCodes_RedemptionCode] ON [dbo].[UsedRedemptionCodes]
([RedemptionCode] ASC) WHERE ([RedemptionCode] IS NOT NULL)
GO
If i just run the statement the select statement with hard coded values, it return in less than 10 milli second
SELECT RedemptionCode FROM dbo.UsedRedemptionCodes WHERE RedemptionCode = 'A1B2C3'
Any help would be greatly appreciated ..

Viewing 15 posts - 1 through 15 (of 39 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1003872 More actions · Answer 1

lraju10 (11/13/2015)
Problem : currently working on purchase sproc where it is taking 2 sec for the Purchase transaction to complete, i identified the problem statement, where i generate a six digit coupon code and make sure every time the support code is Unique
DECLARE @RedemptionSeed INT
DECLARE @RedemptionCode VARCHAR(10)
WHILE @RedemptionSeed IS NULL
BEGIN
SET @RedemptionSeed = (SELECT ABS(CHECKSUM(NEWID())))
END
SET @RedemptionCode = dbo.GenerateRandomBase32(@RedemptionSeed)
WHILE((@RedemptionCode IS NULL) OR EXISTS(SELECT RedemptionCode FROM dbo.UsedRedemptionCodes
WHERE RedemptionCode = @RedemptionCode))
BEGIN
SET @RedemptionSeed = (SELECT ABS(CHECKSUM(NEWID())))
SET @RedemptionCode = dbo.GenerateRandomBase32(@RedemptionSeed)
END
INSERT INTO dbo.UsedRedemptionCodes (RedemptionCode)
VALUES (@RedemptionCode)
In the above code EXISTS clause takes more than 1.2 sec - which is doing a check against the UsedRedemptionCodes table - where i insert each generated coupon into the table, currently UsedRedemptionCodes table has 2 million rows
Schema of the UsedRedemptionCodes table -
CREATE TABLE [dbo].[UsedRedemptionCodes](
[RedemptionCodeID] [int] IDENTITY(1,1) NOT NULL,
[RedemptionCode] [char](10) NULL,
[IsActive] [bit] NOT NULL,
CONSTRAINT [PK_UsedRedemptionCodes_RedemptionCodeID] PRIMARY KEY CLUSTERED ([RedemptionCodeID] ASC))
Created a Unique Non Clustered Index on RedemptionCode
CREATE UNIQUE NONCLUSTERED INDEX [AK_UsedRedemptionCodes_RedemptionCode] ON [dbo].[UsedRedemptionCodes]
([RedemptionCode] ASC) WHERE ([RedemptionCode] IS NOT NULL)
GO
If i just run the statement the select statement with hard coded values, it return in less than 10 milli second
SELECT RedemptionCode FROM dbo.UsedRedemptionCodes WHERE RedemptionCode = 'A1B2C3'
Any help would be greatly appreciated ..

The following snippet in the code above might get you and your company sued into oblivion. Please post the code so we can make sure that doesn't happen...

dbo.GenerateRandomBase32(@RedemptionSeed)

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 2

To be honest, the performance if probably the smallest problem there. That code has race conditions in it. I suspect it's throwing the occasional unique constraint violation (and if not, it will under heavier load)

If all you care about is performance though, we'll need the execution plan.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Sergiy SSC Guru Points: 110206 More actions · Answer 3

You are using different data types for RedemptionCode and @RedemptionCode.

When you check for existence you compare values of different data types, which forces implicit conversion.

When an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence.

https://msdn.microsoft.com/en-us/library/ms190309.aspx

In your case it means the CHAR value(s) have to be converted to VARCHAR.

As CHAR type is used in the table column, all the values in it must be implicitly converted to VARCHAR - which means table scan, no use of index.

Synchronise the data types and your performance issue will go away.

(you may use this suggestion in all other scripts as well, as pass it to all your colleagues)

Apart from that - the 1st WHILE loop is not needed at all.

The 2nd loop has it's condition included, and it does exactly the same thing.

Excluding unnecessary steps also helps to improve performance:

DECLARE @RedemptionSeed INT

DECLARE @RedemptionCode CHAR(10)

--WHILE @RedemptionSeed IS NULL

--BEGIN

--SET @RedemptionSeed = (SELECT ABS(CHECKSUM(NEWID())))

--END

--SET @RedemptionCode = dbo.GenerateRandomBase32(@RedemptionSeed)

WHILE((@RedemptionCode IS NULL) OR EXISTS(SELECT RedemptionCode FROM dbo.UsedRedemptionCodes

WHERE RedemptionCode = @RedemptionCode))

BEGIN

SET @RedemptionSeed = (SELECT ABS(CHECKSUM(NEWID())))

SET @RedemptionCode = dbo.GenerateRandomBase32(@RedemptionSeed)

END

INSERT INTO dbo.UsedRedemptionCodes (RedemptionCode)

VALUES (@RedemptionCode)

_____________
Code for TallyGenerator

ScottPletcher SSC Guru Points: 100949 More actions · Answer 4

I don't think SQL will have to use an implicit conversion for varchar to char, since it pads spaces when doing character comparisons anyway. Not 100% sure, so I'd be interested in seeing a plan where SQL did do such a conversion.

On a more critical note, you should just get rid of the dopey identity column (in this specific case) and cluster the table on its actual data key, the RedemptionCode. If RedemptionCode can contain a NULL value (??), you can leave the identity column as a PK if you really want to, but still cluster the table on the RedemptionCode.

CREATE TABLE [dbo].[UsedRedemptionCodes] (

[RedemptionCode] [char](10) NULL,

[IsActive] [bit] NOT NULL

)

--if no NULL present.

ALTER TABLE [dbo].[UsedRedemptionCodes] ADD CONSTRAINT [PK_UsedRedemptionCodes_RedemptionCodeID] PRIMARY KEY ( RedemptionCode ) WITH ( FILLFACTOR = 99 ) ;

--if (one) NULL present(??why??).

CREATE UNIQUE CLUSTERED INDEX [CL_UsedRedemptionCodes_RedemptionCodeID] ON [dbo].[UsedRedemptionCodes] ( RedemptionCode ) WITH ( FILLFACTOR = 99 ) ;

Edit: I'm assuming RedemptionCode never (or almost never) changes. You'd add new codes, but you would not change existing ones.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

lraju10 Right there with Babe Points: 725 More actions · Answer 5

i really appreciate your guys time and effort, i implemented a hybrid solution based on the Scott and Sergy ideas and it worked execution times have dropped from 1.2 Sec to 400 Milli Second, where i removed the Identity column and assigned RedemptionCode as a Primary Clustered Index making it not null and Sergy ideas of avoiding the Implicit Conversion . Thanks for everyone who responded to this thread. due to company restrictions i could post the execution plan in this thread

Sergiy SSC Guru Points: 110206 More actions · Answer 6

ScottPletcher (11/16/2015)
but still cluster the table on the RedemptionCode.

Clustering a column with randomly generating values is not the best approach ever.

It's as bad as setting a GUID as clustered PK.

especially when you use:

... ( RedemptionCode ) WITH ( FILLFACTOR = 99 ) ;

This means that on every insert you get a case of index fragmentation.

I would suggest fillfactor not more than 50 with padding.

It will take a bit more time to read, but save more on writing.

But I must say - since ID column is not really needed, and there are not much columns left - the choice of a better candidate for a clustered PK is quite limited. 🙂

I may assume that [IsActive] must be used in selecting codes somewhere. So, ([IsActive], [RedemptionCode]) may be a reasonable choice.

Edit: I'm assuming RedemptionCode never (or almost never) changes. You'd add new codes, but you would not change existing ones.

It does not really matter in this case - update existing or add new. It will lead to the same consequences. Because new codes are not sequential and will be inevitably placed in between of existing codes. Splitting pages in the process.

_____________
Code for TallyGenerator

ScottPletcher SSC Guru Points: 100949 More actions · Answer 7

Sergiy (11/16/2015)
ScottPletcher (11/16/2015)
but still cluster the table on the RedemptionCode.
Clustering a column with randomly generating values is not the best approach ever.

That's a drastic overstatement. Remember, the table can be rebuilt after loading to it if necessary. Read performance is typically what's critical for these types of tables, and packing the data in reduces page buffer usage overall.

It's true that the fillfactor might need lowered, but I wouldn't do it based on general concerns about fragmentation. How often are codes added? Are they added on a periodic basis, in (big) groups, or added constantly and not on a pre-determined schedule? All those things factor into the best way to handle this. Fragmentation, even page splits, are not something you have to avoid at any and all costs. There are trade offs.

Finally, realize that the original table definition still had the entire table being stored in an index keyed by the code (the IsActive flag would be needed in the nonclustered index to make it covering anyway). All the dopey identity column did was force the entire table to be stored twice. Yes, (maybe) that gives you less fragmentation on the base table, but that's irrelevant if you (almost) never look up by the id anyway!! This is one of those situations where the perpetuated myth that every table should be clustered, by default, on an identity column is so damaging.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Jeff Moden SSC Guru Points: 1003872 More actions · Answer 8

Jeff Moden (11/13/2015)
lraju10 (11/13/2015)
Problem : currently working on purchase sproc where it is taking 2 sec for the Purchase transaction to complete, i identified the problem statement, where i generate a six digit coupon code and make sure every time the support code is Unique
DECLARE @RedemptionSeed INT
DECLARE @RedemptionCode VARCHAR(10)
WHILE @RedemptionSeed IS NULL
BEGIN
SET @RedemptionSeed = (SELECT ABS(CHECKSUM(NEWID())))
END
SET @RedemptionCode = dbo.GenerateRandomBase32(@RedemptionSeed)
WHILE((@RedemptionCode IS NULL) OR EXISTS(SELECT RedemptionCode FROM dbo.UsedRedemptionCodes
WHERE RedemptionCode = @RedemptionCode))
BEGIN
SET @RedemptionSeed = (SELECT ABS(CHECKSUM(NEWID())))
SET @RedemptionCode = dbo.GenerateRandomBase32(@RedemptionSeed)
END
INSERT INTO dbo.UsedRedemptionCodes (RedemptionCode)
VALUES (@RedemptionCode)
In the above code EXISTS clause takes more than 1.2 sec - which is doing a check against the UsedRedemptionCodes table - where i insert each generated coupon into the table, currently UsedRedemptionCodes table has 2 million rows
Schema of the UsedRedemptionCodes table -
CREATE TABLE [dbo].[UsedRedemptionCodes](
[RedemptionCodeID] [int] IDENTITY(1,1) NOT NULL,
[RedemptionCode] [char](10) NULL,
[IsActive] [bit] NOT NULL,
CONSTRAINT [PK_UsedRedemptionCodes_RedemptionCodeID] PRIMARY KEY CLUSTERED ([RedemptionCodeID] ASC))
Created a Unique Non Clustered Index on RedemptionCode
CREATE UNIQUE NONCLUSTERED INDEX [AK_UsedRedemptionCodes_RedemptionCode] ON [dbo].[UsedRedemptionCodes]
([RedemptionCode] ASC) WHERE ([RedemptionCode] IS NOT NULL)
GO
If i just run the statement the select statement with hard coded values, it return in less than 10 milli second
SELECT RedemptionCode FROM dbo.UsedRedemptionCodes WHERE RedemptionCode = 'A1B2C3'
Any help would be greatly appreciated ..
The following snippet in the code above might get you and your company sued into oblivion. Please post the code so we can make sure that doesn't happen...
dbo.GenerateRandomBase32(@RedemptionSeed)

Thanks for sending me your function. My suggestions are...

1) If you're going to continue using that function, remove the "1" from the list of available characters. "SH1T" (for example) still looks very offensive. Some might even take offense to things like "D1CK" and even "G33K" or "D1K3".

2) Convert the code to set based code. Right now you have it as a Scalar Function with a While Loop.

3) My biggest suggestion is to simply stop trying to use a Base32 generator. I just don't understand why people feel that random "values" must contain both letters and digits or even all letters to be random. Sure, short is good and that's (I suppose) the main purpose of people using Base32 and other similar number but they're usually painful and can become a legal liability.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110206 More actions · Answer 9

ScottPletcher (11/16/2015)
[Yes, (maybe) that gives you less fragmentation on the base table, but that's irrelevant if you (almost) never look up by the id anyway!!

I'd say it this way:

But the index which is actually used for seek (the one on RedemptionCode column) would be as badly fragmented.

This is one of those situations where the perpetuated myth that every table should be clustered, by default, on an identity column is so damaging.

Oh, is there such a myth?

Is it allowed to be said out loud in public???

It's true that the fillfactor might need lowered, but I wouldn't do it based on general concerns about fragmentation. How often are codes added? Are they added on a periodic basis, in (big) groups, or added constantly and not on a pre-determined schedule? All those things factor into the best way to handle this. Fragmentation, even page splits, are not something you have to avoid at any and all costs. There are trade offs.

Agree with you last sentence by 100%.

But - from what we see the codes are generated and added 1 by 1.

Every one of those inserts will certainly cause a fragmentation. Because every one of them will almost certainly hit another fully packed data page.

Therefore, with given approach to the table population it's better to have the index padded.

Finally, realize that the original table definition still had the entire table being stored in an index keyed by the code (the IsActive flag would be needed in the nonclustered index to make it covering anyway). All the dopey identity column did was force the entire table to be stored twice.

Actually, the original table definition is not so far away from being perfect.

Don't forget - they need random codes, and clustering against the Code column will put those codes in order.

So, they're gonna need to randomise them once again on selection.

I'd leave the ID column where it is, but made a clustered index on (IsActive, ID).

Then I'd populate the table with certain amount of codes sufficient for the nearest future, defragment it, and start giving away the TOP 1 WHERE IsActive = 1 ORDER BY ID.

That would use the clustered index and keep the distribution of the codes random.

And "deactivating" of supplied codes (as well as adding more codes in future) would not cause the table fragmentation at all. Only index which would need defragmentation is on the Code column, but it could be done once in a while, after adding another bulk of codes.

_____________
Code for TallyGenerator

ScottPletcher SSC Guru Points: 100949 More actions · Answer 10

Sergiy (11/16/2015)
ScottPletcher (11/16/2015)
[Yes, (maybe) that gives you less fragmentation on the base table, but that's irrelevant if you (almost) never look up by the id anyway!!
I'd say it this way:
But the index which is actually used for seek (the one on RedemptionCode column) would be as badly fragmented.
This is one of those situations where the perpetuated myth that every table should be clustered, by default, on an identity column is so damaging.
Oh, is there such a myth?
Is it allowed to be said out loud in public???
It's true that the fillfactor might need lowered, but I wouldn't do it based on general concerns about fragmentation. How often are codes added? Are they added on a periodic basis, in (big) groups, or added constantly and not on a pre-determined schedule? All those things factor into the best way to handle this. Fragmentation, even page splits, are not something you have to avoid at any and all costs. There are trade offs.
Agree with you last sentence by 100%.
But - from what we see the codes are generated and added 1 by 1.
Every one of those inserts will certainly cause a fragmentation. Because every one of them will almost certainly hit another fully packed data page.
Therefore, with given approach to the table population it's better to have the index padded.
Finally, realize that the original table definition still had the entire table being stored in an index keyed by the code (the IsActive flag would be needed in the nonclustered index to make it covering anyway). All the dopey identity column did was force the entire table to be stored twice.
Actually, the original table definition is not so far away from being perfect.
Don't forget - they need random codes, and clustering against the Code column will put those codes in order.
So, they're gonna need to randomise them once again on selection.
I'd leave the ID column where it is, but made a clustered index on (IsActive, ID).
Then I'd populate the table with certain amount of codes sufficient for the nearest future, defragment it, and start giving away the TOP 1 WHERE IsActive = 1 ORDER BY ID.
That would use the clustered index and keep the distribution of the codes random.
And "deactivating" of supplied codes (as well as adding more codes in future) would not cause the table fragmentation at all. Only index which would need defragmentation is on the Code column, but it could be done once in a while, after adding another bulk of codes.

Yes, if they are constantly adding codes, then lower the fill factor, absolutely. But there is still no reason whatsoever to cluster this table by identity. It's too bad people can't let go of the identity myth.

Assuming most of the codes are active, clustering by IsActive is also not necessary. Typically you'd want to remove inactive rows from the table anyway. Not to channel Celko here, but those inactive (logical delete) flags really are leftovers from bygone days. And activating codes stored in the table as not active would certainly fragment the table. A change in a clustering key is a delete and an insert combined, so it not only increases fragmentation but the logging required as well.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Sergiy SSC Guru Points: 110206 More actions · Answer 11

ScottPletcher (11/16/2015)
But there is still no reason whatsoever to cluster this table by identity.

Actually, there is.

And I pointed on it in my previous post:

to preserve random sequence of the codes during its distribution.

Assuming most of the codes are active, clustering by IsActive is also not necessary. Typically you'd want to remove inactive rows from the table anyway.

No, you don't.

You don't want to generate and issue a previously issued code minutes after you have deleted it from the database.

Therefore you must keep deactivated codes in the table as long as it's required by the business. Probably forever.

_____________
Code for TallyGenerator

ScottPletcher SSC Guru Points: 100949 More actions · Answer 12

Sergiy (11/16/2015)
ScottPletcher (11/16/2015)
But there is still no reason whatsoever to cluster this table by identity.
Actually, there is.
And I pointed on it in my previous post:
to preserve random sequence of the codes during its distribution.
Assuming most of the codes are active, clustering by IsActive is also not necessary. Typically you'd want to remove inactive rows from the table anyway.
No, you don't.
You don't want to generate and issue a previously issued code minutes after you have deleted it from the database.
Therefore you must keep deactivated codes in the table as long as it's required by the business. Probably forever.

You can retain the rows in a different table. Use a partitioned view if that works better, with the appropriate CHECK constraints on IsActive in each table of the view. But it's overall detrimental to performance to leave a lot of inactive rows in the main table. And it complicates lookups.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Sergiy SSC Guru Points: 110206 More actions · Answer 13

ScottPletcher (11/17/2015)
You can retain the rows in a different table. Use a partitioned view if that works better, with the appropriate CHECK constraints on IsActive in each table of the view. But it's overall detrimental to performance to leave a lot of inactive rows in the main table. And it complicates lookups.

You are missing the point.

The lookups must be done against the whole set of codes - active and inactive.

So, having them all in 1 table actually simplifies lookups.

As for giving the codes away - you need to find 1st code(s) which are active (clustered index on IsActive is better for this task than any partitioning) in order of their appearance (which is preserved by auto-incremental ID column, which goes into the 2nd column of the clustered index).

Yes, it's that simple.

Correct table design with an appropriate clustering solve all the problems in the most efficient way.

If you think partitioning (in any form) can do better - feel free to create a test script and show how it beats clustered index on (IsActive, ID).

_____________
Code for TallyGenerator

ScottPletcher SSC Guru Points: 100949 More actions · Answer 14

Sergiy (11/17/2015)
ScottPletcher (11/17/2015)
You can retain the rows in a different table. Use a partitioned view if that works better, with the appropriate CHECK constraints on IsActive in each table of the view. But it's overall detrimental to performance to leave a lot of inactive rows in the main table. And it complicates lookups.
You are missing the point.
The lookups must be done against the whole set of codes - active and inactive.
So, having them all in 1 table actually simplifies lookups.
As for giving the codes away - you need to find 1st code(s) which are active (clustered index on IsActive is better for this task than any partitioning) in order of their appearance (which is preserved by auto-incremental ID column, which goes into the 2nd column of the clustered index).
Yes, it's that simple.
Correct table design with an appropriate clustering solve all the problems in the most efficient way.
If you think partitioning (in any form) can do better - feel free to create a test script and show how it beats clustered index on (IsActive, ID).

First, how do you know that inactive codes are routinely searched? That's almost never the case.

Besides, the lookup is based on the Code value, not the ID value. Yet you're so mentally fixated on the idea of an identity that even when it is never used in any sample lookup given you still insist on clustering using it. Because of that, you're then forced to store the entire table again as a nonclustered index that can actually be used for lookups. That's a horrible -- if sadly often typical -- waste of resources.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.