Update Order and MAXDOP

Question

Update Order and MAXDOP

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004664 More actions · Answer 1

Matt is correct about needing the CASE... but I'm not sure why we need the #Summary table, at all...

First, a brief lesson in creating massive amounts of test data in a short period of time... 3 words brief, in fact... "Avoid the loop". 1,000 rows is nothing in the world of databases so, if you're gonna test for performance, test against at least a million rows... and it takes a whopping minute and 34 seconds to create a million rows using the loop. A bit slow for us impatient data-trolls. 😀 Try this, instead... takes less than 30 seconds to gen both a million randomized rows and the clustered index...

--===== If the "sales" test table already exists, drop it and rebuild it...

-- This "loopless" method is very fast and will create a table with a million

-- rows in it in about 12 seconds. It does 1,000 rows almost instantaneously

IF OBJECT_ID('TempDB..#Sales','U') IS NOT NULL

DROP TABLE #Sales

SELECT TOP 1000000

Dt = CAST('20071001' as SmallDateTime)

+ CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS INT),

Sales = CAST(FLOOR(RAND(CAST(NEWID() AS VARBINARY))*200) AS NUMERIC(10,2)),

Ord = CAST(0 AS INT)

INTO #Sales

FROM Master.dbo.SysColumns sc1,

Master.dbo.SysColumns sc2

--===== Add the required index as before

CREATE CLUSTERED INDEX IDX_#Sales_Dt_Sales ON #Sales (Dt, Sales)

Ok... on to the problem... like I said, I see no reason for the #Summary table, at all. With that in mind... this solves the million row example in about 6 seconds...

----====================================================================================

-- Solve the problem...

--====================================================================================

--===== Create a couple of necessary local variables and initialize them for this run

DECLARE @PrevDt SMALLDATETIME --Remembers value of Dt from previous row

DECLARE @PrevOrd INT --Remembers value of Ord from previous row

SET @PrevDt = '19000101'

--===== Solve the problem... this only takes 6 seconds on a million rows

UPDATE #Sales

SET @PrevOrd = Ord = CASE WHEN @PrevDt = Dt THEN @PrevOrd + 1 ELSE 1 END,

@PrevDt = Dt

FROM #Sales WITH (INDEX(IDX_#Sales_Dt_Sales),TABLOCKX)

--===== Display the result (first 1000 rows) from the #Sales table

SELECT TOP 1000 * FROM #sales ORDER BY Dt,Sales

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paulo Buchsbaum SSC Eights! Points: 851 More actions · Answer 2

Hi, Jeff

Thanks a lot for your help.

You are absolutely right. I Put the loop because I would want to make some "wrong" result fastly.

It's a kind a old programmer habit.

I put the Inner Join #Summary BECAUSE it's essential to make the wrong output!

Then I put again Inner Join #Summary code in the final Update and output is not right anymore!

In order to add some "utility" to the added Inner Join, I've computed the relative sales percentual!

I put my added parts in bold.

Here go the code:

--===== If the "sales" test table already exists, drop it and rebuild it...

-- This "loopless" method is very fast and will create a table with a million

-- rows in it in about 12 seconds. It does 1,000 rows almost instantaneously

IF OBJECT_ID('TempDB..#Sales','U') IS NOT NULL

DROP TABLE #Sales

SELECT TOP 1000

Dt = CAST('20071001' as SmallDateTime)

+ CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS INT),

Sales = CAST(FLOOR(RAND(CAST(NEWID() AS VARBINARY))*200) AS NUMERIC(10,2)),

Ord = CAST(0 AS INT),

Share = CAST(0E1 As Float)

INTO #Sales

FROM Master.dbo.SysColumns sc1,

Master.dbo.SysColumns sc2

--===== Add the required index as before

CREATE CLUSTERED INDEX IDX_#Sales_Dt_Sales ON #Sales (Dt, Sales)

----====================================================================================

-- Solve the problem...

--====================================================================================

--===== Create a couple of necessary local variables and initialize them for this run

DECLARE @PrevDt SMALLDATETIME --Remembers value of Dt from previous row

DECLARE @PrevOrd INT --Remembers value of Ord from previous row

SET @PrevDt = '19000101'

IF OBJECT_ID('TempDB..#Summary','U') IS NOT NULL

Drop Table #Summary

Select Dt As Dat,

Sum(Sales) As TSales

Into #Summary

From #Sales

Group By Dt

Create Clustered Index #SumVen On #Summary(Dat)

--===== Solve the problem... this only takes 6 seconds on a million rows

UPDATE #Sales

SET @PrevOrd = Ord = CASE WHEN @PrevDt = Dt

THEN @PrevOrd + 1 ELSE 1 END,

@PrevDt = Dt,

Share = Sales/TSales

FROM #Sales AS V WITH (INDEX(IDX_#Sales_Dt_Sales),TABLOCKX)

Inner Join #Summary

On Dat = Dt

--===== Display the result (first 1000 rows) from the #Sales table

SELECT TOP 1000 * FROM #sales ORDER BY Dt,Sales

Now the result is not right anymore. Why???????:alien:

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 3

p.buchsbaum (10/9/2007)
Really, when I change 1 line in your code, it's all over.
Create Clustered Index sales_ix On #Sales(Dt, Sales)
________In Place Of
Create Clustered Index sales_ix On #Sales(Dt, Salestext)
38 is considered greater than 41, 90, etc.

It looks to me that it was doing some type of concatenation, so I noticed a few that did:

date sales ord

xxxx 15 1

xxxx 159 2

xxx 16 3

Thus the padding (to make SURE that it didn't do that).

For what it's worth - I understand what you're getting at with the join, but posting something where the join is actually used (I suspect you have a reason for asking that question - you just haven't told anyone what that is) helps. As you can see - efficiency experts like Jeff will "throw out" anything not used (which is what you'd want if you're tryimg to make something efficient).

Jeff - take a look at a scenario where the join actually IS used or left in for better or worse, and you might find some oddities like what he mentioned.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Paulo Buchsbaum SSC Eights! Points: 851 More actions · Answer 4

Hi, Matt.

Matt, in the previous post, I already have placed a pratical application for Inner Join in that case, changing directly the Jeff's code.

This one simple alteration twists completely the output!

You can notice that using your code (Salestext in place of Sales), the output seems better, but keeps on with trouble in tie key index case (same index value)

Oct,9 - 07 - Sales 187 has two ORD vales (74 and 978)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 5

That's a function of a clustered index. "In the case of a TIE, a unique number is associated to the key to keep the rows distinct" - that's the reason for using a clustered index. That's what's causing that issue. the fact that the "unique number" is disrupting the sequence is unfortunate. You'd have to ensure your values in the running sum are in fact unique somehow.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Paulo Buchsbaum SSC Eights! Points: 851 More actions · Answer 6

Matt, Thank you again

You are absolutely right, but it doesn't explain the cause.

I get your code and change the bold part.

First I've added a Identity field in the table.

After I've used this field in the index fields

Finally I've changed @ord assignment in order to get easier to notice the

order.

Now, there is no duplicate and it's working perfectly.

But it's very strange, because Inner Join is a disturbing clause

that causes everything!

Mistery .... 😉

Set NoCount ON

Declare @Dt char(10)

Declare @Count Int

Declare @NDays Int

Declare @sales Int

Declare @Inic Float

Select @Inic=rand(1000)

Drop Table #Sales

Create Table #Sales (Iden Int Identity, Dt char(8), Sales Numeric(10,2),salestext char(10), Ord Int)

Set @Count = 0

While @Count< 1000

begin

Set @NDays = Floor( rand()*100 )

Set @sales= Floor(rand()*200)

Insert Into #Sales

Select

Convert(char,Cast('20071001' as SmallDateTime) + @NDays,112),@Sales, right('0000000000'+cast(@Sales as varchar(10)),10), 0

Set @Count = @Count + 1

end

Create Clustered Index sales_ix On #Sales(Dt, Salestext, Iden)

Drop Table #Summary

Select Dt,

Sum(Sales) As Sales

Into #Summary

From #Sales

Group By Dt

Create Clustered Index #SumVen On #Summary(Dt)

Declare @Ord Int

Set @Ord = 0

set @dt=''

Update T

Set

@Ord = @Ord+1, -- case when @dt=t.dt then @Ord else 0 end +1,Ord = @Ord,

@dt=t.dt

From #Sales As T with (index(sales_ix),tablock)

, #Summary As S with (index(#sumven),tablock)

where T.Dt=s.dt

Select * From #Sales

Order By dt,sales

Jeff Moden SSC Guru Points: 1004664 More actions · Answer 7

The real problem is that you're trying to solve two different problems at once... divide and conquer... the code will not only be faster performance wise, it will also be easier to read, troubleshoot, and modify.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paulo Buchsbaum SSC Eights! Points: 851 More actions · Answer 8

Jeff,

The trouble is that the examples that I post in this forum is

a very rough example just based on real problem, that includes

many databases from my server.

I would intend building in this post a query that was completely independent and playable in any SQL Server 2000 in the same way.

In my work, I was starting to use Update based on clustered index order of my updated table. Suddenly, I've discovered that it doen't works OK in all situations!

This feature would seems like a fast way to do complicated stuff without using cursors or subqueries in select part of update.

I already had tested the efficiency and, no doubt, it's a very fast approach in my SQL Server 2000

Now, sadly, I'm changing my updates, place subqueries in SET part, like

Update PD

Set

...

@CountaAnt = ( Select Counta From PT.Arq Where

PT.Store=PD.Store and PT.Iden=PD.Iden-1 ),

...

From Arq AS PD

...

In this case, I need get the dates where the equipment count (COUNTA) changes in the store.

Iden is a Identity Field. It would be more simple if the following code

works well!

Update PD

Set

...

@ContaAnt = @Conta,

@Conta = Counta ...

....

From Arq AS PD

...

Jeff Moden SSC Guru Points: 1004664 More actions · Answer 9

The update method works well, in all cases... and it does NOT depend on a Clustered Index... just SOME index.

You just need to settle down, maybe have a beer, and decide how you're going to divide this problem up... you keep posting snippets of code that you decided to change and, and, and... we can't hit a rolling donut for ya... at least not in the center. 😛 And, Yes, I understand that you're posting things a bit differently than what you really have to try to simplify things... and it's ok to do that... but then you need to take those simple lessons and apply them to your real situation... one at a time. Divide and conquer... peel one potatoe at a time...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 10

I'm not sure I'm getting what the real problem is, but... it seems to be about ordering...?

That is, the problem is that a given order is expected, and that order is supposed to rely on a clustered index... Is that correct?

If this is it, then you're out of luck.

It feels like beating a dead horse, but if you depend on a specific order, you must use ORDER BY.

If there is no ORDER BY, then rows will be 'processed' in the order the server physically finds them, and that may change from time to time, from plan to plan.

If there is no ORDER BY you can *never* guarantee anything relying on ordering to actually happen as expected, in that particular order.

If this wasn't it, and I'm way off, just forget I said anything 😉

/Kenneth

Jeff Moden SSC Guru Points: 1004664 More actions · Answer 11

Please provide the code and data that demonstrates when the correctly specified Update and Index fails. Then provide your correct Order By method that solves the problem.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paulo Buchsbaum SSC Eights! Points: 851 More actions · Answer 12

Hi, Kenneth

I've discovered that even if the records are created in the same cluster index order, the UPDATE statement processing order is unpredictable:

The UPDATE Order is different in each situation. In the post example, with no INNER JOIN statement, it works well. If put a useless INNER JOIN it's the end of the right universe! But if I use string fields in clustered index, it works again (no data, no number). Almost! because records with duplicate index key got crazy. But if I add a identity field in the index, it's OK again.

How can you see, this erratic behavior cannot be assigned to phisical order!

My SQL SERVER is 2000 SP4

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 13

Kenneth Wilhelmsson (10/10/2007)
I'm not sure I'm getting what the real problem is, but... it seems to be about ordering...?
That is, the problem is that a given order is expected, and that order is supposed to rely on a clustered index... Is that correct?
/Kenneth

It IS about order, but it's one of those where ORDER BY doesn't work (it's an update statement), so we're "backdooring" the order in using the WITH (INDEX(....)) syntax. With an understanding of the pitfalls involved with duplicate handlings in this case - it's workable.

The most interesting part about this one was what seemed to be different handling of this syntax by 2000 vs 2005. It's not unworkable - just one of those "oh - I will have to mess with this piece of code when I move it over" things. Of course - I don't think I would introduce a join in most cases (I'd denormalize during the insert into temp table phase).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 14

Ok.

But then the case is that we're exploiting a physical implementation (that is, we think we know how our data is stored on the indexpages) and we cross our fingers and hope to die that this will serve as a predictable sorting-sequence...?

It's not foolproof and can't be relied on. Simple as that.

It is as I stated. If we require an order to be obeyed, we *must* say ORDER BY.

All other tricks are bound by chance and pure luck.

Exactly how to solve this particular orderingproblem I can't say right away..

(haven't read that closely, so I don't really know what is to begin with or what the goal is) 😉

/Kenneth

Paulo Buchsbaum SSC Eights! Points: 851 More actions · Answer 15

Jeff, Matt and Kenneth

😀

I think that I've found a solution for my problem,

Inspired by Kenneth (he've said that I should use ORDER BY), even

knowing that UPDATE there is no ORDER BY clause, I've decided search ORDER word in UPDATE Statement help and I've met a FORCE ORDER option.

The helps tells about FORCE ORDER "Specifies that the join order indicated by the query syntax should be preserved during query optimization"

Since I've noticed that "INNER JOIN" is the devil in this question, I saw a light in the end of the tunnel.

So, no problems with duplicate key index. No problems with data and number fields in the index.

It will be this solution realiable in the SQL SERVER 2000?????

The code: (Thanks Jeff and Matt)

Set NoCount ON

Drop Table #Summary

Drop Table #Sales

Declare @Dt SmallDateTime

Declare @Count Int

Declare @NDays Int

Declare @sales Int

Declare @Inic Float

Select @Inic=rand(1000)

SELECT TOP 1000

Dt = CAST('20071001' as SmallDateTime)+ Floor( rand(CAST(NEWID() AS VARBINARY))*100 ),

Sales = CAST( Floor(rand(CAST(NEWID() AS VARBINARY))*200) AS NUMERIC(10,2)),

Ord = CAST(0 AS INT)

INTO #Sales

FROM Master.dbo.SysColumns sc1, Master.dbo.SysColumns sc2

Create Clustered Index #Sales On #Sales(Dt, Sales)

Select Dt,

Sum(Sales) As Sales

Into #Summary

From #Sales

Group By Dt

Create Clustered Index #SumVen On #Summary(Dt)

Declare @Ord Int

Set @Ord = 0

Update T

Set

@Ord = case when @dt=t.dt then @Ord else 0 end +1,

@dt = t.dt,

Ord = @Ord

From #Sales As T

Inner Join #Summary As S

On S.Dt=T.Dt

Option (MAXDOP 1, Force Order)

Select * From #Sales