Update Order and MAXDOP

  • Matt is correct about needing the CASE... but I'm not sure why we need the #Summary table, at all...

    First, a brief lesson in creating massive amounts of test data in a short period of time... 3 words brief, in fact... "Avoid the loop". 1,000 rows is nothing in the world of databases so, if you're gonna test for performance, test against at least a million rows... and it takes a whopping minute and 34 seconds to create a million rows using the loop. A bit slow for us impatient data-trolls. 😀 Try this, instead... takes less than 30 seconds to gen both a million randomized rows and the clustered index...

    --===== If the "sales" test table already exists, drop it and rebuild it...

    -- This "loopless" method is very fast and will create a table with a million

    -- rows in it in about 12 seconds. It does 1,000 rows almost instantaneously

    IF OBJECT_ID('TempDB..#Sales','U') IS NOT NULL

    DROP TABLE #Sales

    SELECT TOP 1000000

    Dt = CAST('20071001' as SmallDateTime)

    + CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS INT),

    Sales = CAST(FLOOR(RAND(CAST(NEWID() AS VARBINARY))*200) AS NUMERIC(10,2)),

    Ord = CAST(0 AS INT)

    INTO #Sales

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add the required index as before

    CREATE CLUSTERED INDEX IDX_#Sales_Dt_Sales ON #Sales (Dt, Sales)

    Ok... on to the problem... like I said, I see no reason for the #Summary table, at all. With that in mind... this solves the million row example in about 6 seconds...

    ----====================================================================================

    -- Solve the problem...

    --====================================================================================

    --===== Create a couple of necessary local variables and initialize them for this run

    DECLARE @PrevDt SMALLDATETIME --Remembers value of Dt from previous row

    DECLARE @PrevOrd INT --Remembers value of Ord from previous row

    SET @PrevDt = '19000101'

    --===== Solve the problem... this only takes 6 seconds on a million rows

    UPDATE #Sales

    SET @PrevOrd = Ord = CASE WHEN @PrevDt = Dt THEN @PrevOrd + 1 ELSE 1 END,

    @PrevDt = Dt

    FROM #Sales WITH (INDEX(IDX_#Sales_Dt_Sales),TABLOCKX)

    --===== Display the result (first 1000 rows) from the #Sales table

    SELECT TOP 1000 * FROM #sales ORDER BY Dt,Sales

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi, Jeff

    Thanks a lot for your help.

    You are absolutely right. I Put the loop because I would want to make some "wrong" result fastly.

    It's a kind a old programmer habit.

    I put the Inner Join #Summary BECAUSE it's essential to make the wrong output!

    Then I put again Inner Join #Summary code in the final Update and output is not right anymore!

    In order to add some "utility" to the added Inner Join, I've computed the relative sales percentual!

    I put my added parts in bold.

    Here go the code:

    --===== If the "sales" test table already exists, drop it and rebuild it...

    -- This "loopless" method is very fast and will create a table with a million

    -- rows in it in about 12 seconds. It does 1,000 rows almost instantaneously

    IF OBJECT_ID('TempDB..#Sales','U') IS NOT NULL

    DROP TABLE #Sales

    SELECT TOP 1000

    Dt = CAST('20071001' as SmallDateTime)

    + CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS INT),

    Sales = CAST(FLOOR(RAND(CAST(NEWID() AS VARBINARY))*200) AS NUMERIC(10,2)),

    Ord = CAST(0 AS INT),

    Share = CAST(0E1 As Float)

    INTO #Sales

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add the required index as before

    CREATE CLUSTERED INDEX IDX_#Sales_Dt_Sales ON #Sales (Dt, Sales)

    ----====================================================================================

    -- Solve the problem...

    --====================================================================================

    --===== Create a couple of necessary local variables and initialize them for this run

    DECLARE @PrevDt SMALLDATETIME --Remembers value of Dt from previous row

    DECLARE @PrevOrd INT --Remembers value of Ord from previous row

    SET @PrevDt = '19000101'

    IF OBJECT_ID('TempDB..#Summary','U') IS NOT NULL

    Drop Table #Summary

    Select Dt As Dat,

    Sum(Sales) As TSales

    Into #Summary

    From #Sales

    Group By Dt

    Create Clustered Index #SumVen On #Summary(Dat)

    --===== Solve the problem... this only takes 6 seconds on a million rows

    UPDATE #Sales

    SET @PrevOrd = Ord = CASE WHEN @PrevDt = Dt

    THEN @PrevOrd + 1 ELSE 1 END,

    @PrevDt = Dt,

    Share = Sales/TSales

    FROM #Sales AS V WITH (INDEX(IDX_#Sales_Dt_Sales),TABLOCKX)

    Inner Join #Summary

    On Dat = Dt

    --===== Display the result (first 1000 rows) from the #Sales table

    SELECT TOP 1000 * FROM #sales ORDER BY Dt,Sales

    Now the result is not right anymore. Why???????:alien:

  • p.buchsbaum (10/9/2007)


    Really, when I change 1 line in your code, it's all over.

    Create Clustered Index sales_ix On #Sales(Dt, Sales)

    ________In Place Of

    Create Clustered Index sales_ix On #Sales(Dt, Salestext)

    38 is considered greater than 41, 90, etc.

    It looks to me that it was doing some type of concatenation, so I noticed a few that did:

    date sales ord

    xxxx 15 1

    xxxx 159 2

    xxx 16 3

    Thus the padding (to make SURE that it didn't do that).

    For what it's worth - I understand what you're getting at with the join, but posting something where the join is actually used (I suspect you have a reason for asking that question - you just haven't told anyone what that is) helps. As you can see - efficiency experts like Jeff will "throw out" anything not used (which is what you'd want if you're tryimg to make something efficient).

    Jeff - take a look at a scenario where the join actually IS used or left in for better or worse, and you might find some oddities like what he mentioned.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Hi, Matt.

    Matt, in the previous post, I already have placed a pratical application for Inner Join in that case, changing directly the Jeff's code.

    This one simple alteration twists completely the output!

    You can notice that using your code (Salestext in place of Sales), the output seems better, but keeps on with trouble in tie key index case (same index value)

    Oct,9 - 07 - Sales 187 has two ORD vales (74 and 978)

  • That's a function of a clustered index. "In the case of a TIE, a unique number is associated to the key to keep the rows distinct" - that's the reason for using a clustered index. That's what's causing that issue. the fact that the "unique number" is disrupting the sequence is unfortunate. You'd have to ensure your values in the running sum are in fact unique somehow.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt, Thank you again

    You are absolutely right, but it doesn't explain the cause.

    I get your code and change the bold part.

    First I've added a Identity field in the table.

    After I've used this field in the index fields

    Finally I've changed @ord assignment in order to get easier to notice the

    order.

    Now, there is no duplicate and it's working perfectly.

    But it's very strange, because Inner Join is a disturbing clause

    that causes everything!

    Mistery .... 😉

    Set NoCount ON

    Declare @Dt char(10)

    Declare @Count Int

    Declare @NDays Int

    Declare @sales Int

    Declare @Inic Float

    Select @Inic=rand(1000)

    Drop Table #Sales

    Create Table #Sales (Iden Int Identity, Dt char(8), Sales Numeric(10,2),salestext char(10), Ord Int)

    Set @Count = 0

    While @Count< 1000

    begin

    Set @NDays = Floor( rand()*100 )

    Set @sales= Floor(rand()*200)

    Insert Into #Sales

    Select

    Convert(char,Cast('20071001' as SmallDateTime) + @NDays,112),@Sales, right('0000000000'+cast(@Sales as varchar(10)),10), 0

    Set @Count = @Count + 1

    end

    Create Clustered Index sales_ix On #Sales(Dt, Salestext, Iden)

    Drop Table #Summary

    Select Dt,

    Sum(Sales) As Sales

    Into #Summary

    From #Sales

    Group By Dt

    Create Clustered Index #SumVen On #Summary(Dt)

    Declare @Ord Int

    Set @Ord = 0

    set @dt=''

    Update T

    Set

    @Ord = @Ord+1, -- case when @dt=t.dt then @Ord else 0 end +1,Ord = @Ord,

    @dt=t.dt

    From #Sales As T with (index(sales_ix),tablock)

    , #Summary As S with (index(#sumven),tablock)

    where T.Dt=s.dt

    Select * From #Sales

    Order By dt,sales

  • The real problem is that you're trying to solve two different problems at once... divide and conquer... the code will not only be faster performance wise, it will also be easier to read, troubleshoot, and modify.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    The trouble is that the examples that I post in this forum is

    a very rough example just based on real problem, that includes

    many databases from my server.

    I would intend building in this post a query that was completely independent and playable in any SQL Server 2000 in the same way.

    In my work, I was starting to use Update based on clustered index order of my updated table. Suddenly, I've discovered that it doen't works OK in all situations!

    This feature would seems like a fast way to do complicated stuff without using cursors or subqueries in select part of update.

    I already had tested the efficiency and, no doubt, it's a very fast approach in my SQL Server 2000

    Now, sadly, I'm changing my updates, place subqueries in SET part, like

    Update PD

    Set

    ...

    @CountaAnt = ( Select Counta From PT.Arq Where

    PT.Store=PD.Store and PT.Iden=PD.Iden-1 ),

    ...

    From Arq AS PD

    ...

    In this case, I need get the dates where the equipment count (COUNTA) changes in the store.

    Iden is a Identity Field. It would be more simple if the following code

    works well!

    Update PD

    Set

    ...

    @ContaAnt = @Conta,

    @Conta = Counta ...

    ....

    From Arq AS PD

    ...

  • The update method works well, in all cases... and it does NOT depend on a Clustered Index... just SOME index.

    You just need to settle down, maybe have a beer, and decide how you're going to divide this problem up... you keep posting snippets of code that you decided to change and, and, and... we can't hit a rolling donut for ya... at least not in the center. 😛 And, Yes, I understand that you're posting things a bit differently than what you really have to try to simplify things... and it's ok to do that... but then you need to take those simple lessons and apply them to your real situation... one at a time. Divide and conquer... peel one potatoe at a time...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm not sure I'm getting what the real problem is, but... it seems to be about ordering...?

    That is, the problem is that a given order is expected, and that order is supposed to rely on a clustered index... Is that correct?

    If this is it, then you're out of luck.

    It feels like beating a dead horse, but if you depend on a specific order, you must use ORDER BY.

    If there is no ORDER BY, then rows will be 'processed' in the order the server physically finds them, and that may change from time to time, from plan to plan.

    If there is no ORDER BY you can *never* guarantee anything relying on ordering to actually happen as expected, in that particular order.

    If this wasn't it, and I'm way off, just forget I said anything 😉

    /Kenneth

  • Please provide the code and data that demonstrates when the correctly specified Update and Index fails. Then provide your correct Order By method that solves the problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi, Kenneth

    I've discovered that even if the records are created in the same cluster index order, the UPDATE statement processing order is unpredictable:

    The UPDATE Order is different in each situation. In the post example, with no INNER JOIN statement, it works well. If put a useless INNER JOIN it's the end of the right universe! But if I use string fields in clustered index, it works again (no data, no number). Almost! because records with duplicate index key got crazy. But if I add a identity field in the index, it's OK again.

    How can you see, this erratic behavior cannot be assigned to phisical order!

    My SQL SERVER is 2000 SP4

  • Kenneth Wilhelmsson (10/10/2007)


    I'm not sure I'm getting what the real problem is, but... it seems to be about ordering...?

    That is, the problem is that a given order is expected, and that order is supposed to rely on a clustered index... Is that correct?

    /Kenneth

    It IS about order, but it's one of those where ORDER BY doesn't work (it's an update statement), so we're "backdooring" the order in using the WITH (INDEX(....)) syntax. With an understanding of the pitfalls involved with duplicate handlings in this case - it's workable.

    The most interesting part about this one was what seemed to be different handling of this syntax by 2000 vs 2005. It's not unworkable - just one of those "oh - I will have to mess with this piece of code when I move it over" things. Of course - I don't think I would introduce a join in most cases (I'd denormalize during the insert into temp table phase).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Ok.

    But then the case is that we're exploiting a physical implementation (that is, we think we know how our data is stored on the indexpages) and we cross our fingers and hope to die that this will serve as a predictable sorting-sequence...?

    It's not foolproof and can't be relied on. Simple as that.

    It is as I stated. If we require an order to be obeyed, we *must* say ORDER BY.

    All other tricks are bound by chance and pure luck.

    Exactly how to solve this particular orderingproblem I can't say right away..

    (haven't read that closely, so I don't really know what is to begin with or what the goal is) 😉

    /Kenneth

  • Jeff, Matt and Kenneth

    😀

    I think that I've found a solution for my problem,

    Inspired by Kenneth (he've said that I should use ORDER BY), even

    knowing that UPDATE there is no ORDER BY clause, I've decided search ORDER word in UPDATE Statement help and I've met a FORCE ORDER option.

    The helps tells about FORCE ORDER "Specifies that the join order indicated by the query syntax should be preserved during query optimization"

    Since I've noticed that "INNER JOIN" is the devil in this question, I saw a light in the end of the tunnel.

    So, no problems with duplicate key index. No problems with data and number fields in the index.

    It will be this solution realiable in the SQL SERVER 2000?????

    The code: (Thanks Jeff and Matt)

    Set NoCount ON

    Drop Table #Summary

    Drop Table #Sales

    Declare @Dt SmallDateTime

    Declare @Count Int

    Declare @NDays Int

    Declare @sales Int

    Declare @Inic Float

    Select @Inic=rand(1000)

    SELECT TOP 1000

    Dt = CAST('20071001' as SmallDateTime)+ Floor( rand(CAST(NEWID() AS VARBINARY))*100 ),

    Sales = CAST( Floor(rand(CAST(NEWID() AS VARBINARY))*200) AS NUMERIC(10,2)),

    Ord = CAST(0 AS INT)

    INTO #Sales

    FROM Master.dbo.SysColumns sc1, Master.dbo.SysColumns sc2

    Create Clustered Index #Sales On #Sales(Dt, Sales)

    Select Dt,

    Sum(Sales) As Sales

    Into #Summary

    From #Sales

    Group By Dt

    Create Clustered Index #SumVen On #Summary(Dt)

    Declare @Ord Int

    Set @Ord = 0

    Update T

    Set

    @Ord = case when @dt=t.dt then @Ord else 0 end +1,

    @dt = t.dt,

    Ord = @Ord

    From #Sales As T

    Inner Join #Summary As S

    On S.Dt=T.Dt

    Option (MAXDOP 1, Force Order)

    Select * From #Sales

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply