column value balancing (copy previous value <> NULL)

  • Hi Wayne,

    WOW! I have tested it and added an extra column, and it works great. I'm very impressed with your skills!

    Thank you soooo much.

    How do you think performance will be on a table with 26 million records? I will test it anyway and see for myself, but I'd like to hear your comments on this.

    Regards

    Dick Herberts

  • Great job, Wayne!

    That's the solution I had in mind... 😉



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • @Dick: VERY IMPORTANT!!!

    Based on your previous note ("4. No clustered index yet"), do NOT run the code until you have added a clustered index that supports the order in which the data need to be updated!

    I'd probably go with

    CREATE UNIQUE CLUSTERED INDEX CX_testtable_ImportantForQuirkyUpdate
    ON testtable (Category, Class, Location, RowNo);
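
    For anyone who lands on this page without the previous one: the technique under discussion is the "quirky update", which carries values down the table in clustered-index order. The sketch below only illustrates the idea and is NOT Wayne's actual code (his version includes a safety check); the Val column, the data types, and the hints are my assumptions:

    -- Carry variables hold the values from the previously visited row.
    DECLARE @PrevVal      VARCHAR(50),   -- assumed type of the column to fill
            @PrevCategory INT,
            @PrevClass    INT,
            @PrevLocation INT;

    UPDATE t
       SET @PrevVal = Val = CASE
               WHEN t.Val IS NOT NULL THEN t.Val   -- keep an existing value
               WHEN t.Category = @PrevCategory
                AND t.Class    = @PrevClass
                AND t.Location = @PrevLocation
                   THEN @PrevVal                   -- copy the previous value down
               ELSE NULL                           -- first row of a new group
           END,
           @PrevCategory = t.Category,             -- remember the current group
           @PrevClass    = t.Class,
           @PrevLocation = t.Location
      FROM dbo.testtable t WITH (TABLOCKX)         -- exclusive lock for safety
    OPTION (MAXDOP 1);                             -- force a serial plan

    That is exactly why the index above matters: the rows must be visited in (Category, Class, Location, RowNo) order for the carry variables to be valid.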

    Regarding performance:

    I duplicated your data to build a million-row table (increasing the Category value; actually 1.4 million rows). Wayne's code ran in 3 seconds (excluding the final SELECT *, which took another 17 seconds). My system: a home PC with WinXP and SQL2K5.

    Right in the expected range 😀



    Lutz

  • LutzM (11/17/2010)


    @Dick: VERY IMPORTANT!!!

    Based on your previous note ("4. No clustered index yet"), do NOT run the code until you have added a clustered index that supports the order in which the data need to be updated!

    I'd probably go with

    CREATE UNIQUE CLUSTERED INDEX CX_testtable_ImportantForQuirkyUpdate
    ON testtable (Category, Class, Location, RowNo);

    I knew there was something I was going to add to my post - THANKS LUTZ!

    Since you said that the PK is these same four columns, I just made the PK clustered. Since you have a multi-million-row table, you will have to consider whether that is what you want to do.
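
    For the record, making the existing four-column key the clustered index could look something like this (a sketch only; the constraint name is invented, and building it on a multi-million-row table will take time and log space):

    ALTER TABLE dbo.testtable
      ADD CONSTRAINT PK_testtable
      PRIMARY KEY CLUSTERED (Category, Class, Location, RowNo);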

    Regarding performance:

    I duplicated your data to build a million-row table (increasing the Category value; actually 1.4 million rows). Wayne's code ran in 3 seconds (excluding the final SELECT *, which took another 17 seconds). My system: a home PC with WinXP and SQL2K5.

    Right in the expected range 😀

    It took 3 whole seconds on your system? What'd you do, kick off a full system virus scan at the same time? :w00t:

    Seriously, that's about the expected time for this to run.

    Sorry I forgot to mention the clustered index - and again, thank you Lutz for picking up on that. Dick, that clustered index is supremely important for this to work correctly - otherwise, you'll end up with garbage.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Hi guys,

    I'm testing the routine right now on a table with 26,000,000 records; tomorrow I will post the performance results.

    Thanks very much for all your input.

    Have a nice evening, and see you tomorrow.

  • WayneS (11/17/2010)


    ...

    I knew there was something I was going to add to my post - THANKS LUTZ!

    ...

    It took 3 whole seconds on your system? What'd you do, kick off a full system virus scan at the same time? :w00t:

    Seriously, that's about the expected time for this to run.

    ...

    My pleasure, mate!

    Regarding performance: I double-checked it using SET STATISTICS and Profiler. Both confirmed 2.6 seconds.

    I have to correct myself: the code didn't take a whole 3 seconds. Just slightly more than 2, which is almost 1 second. And one second doesn't count. So the code finished instantly. :-D :-P
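
    For anyone repeating the measurement, the usual wrapper is:

    SET STATISTICS TIME ON;    -- report CPU and elapsed time per statement
    -- ... run the update here ...
    SET STATISTICS TIME OFF;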



    Lutz

  • Dick Herberts-320552 (11/17/2010)


    Hi guys,

    I'm testing the routine right now on a table with 26,000,000 records; tomorrow I will post the performance results.

    Thanks very much for all your input.

    Have a nice evening, and see you tomorrow.

    Did you add the clustered index before running it? (fingers crossed)

    And you seriously don't have a minute to wait for the result? 😉



    Lutz

  • LutzM (11/17/2010)


    And you seriously don't have a minute to wait for the result? 😉

    He doesn't think it will finish that fast... won't he be in for a surprise in the morning when he comes in.

    If he didn't add that clustered index... thanks to the safety check, it won't complete if anything is processed out of order. So all he'll have to do is build the index, then run it again. (And if that new-fangled safety check weren't there, he'd have to do a database restore to get the correct data back!)
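
    As an aside for later readers, the kind of safety check I mean can be sketched like this (not the actual code from the earlier page; the SeqNo column, assumed to number every row in clustered-index order, is invented for the example):

    DECLARE @Expected INT;
    SET @Expected = 0;

    UPDATE t
       SET @Expected = CASE
               WHEN t.SeqNo = @Expected + 1
                   THEN @Expected + 1   -- row arrived in the expected order
               ELSE 1/0                 -- out of order: the divide-by-zero
                                        -- aborts the UPDATE and rolls back
           END
           -- (the real code carries the value-copying assignments in this
           --  same SET list)
      FROM dbo.testtable t WITH (TABLOCKX)
    OPTION (MAXDOP 1);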

    Wayne

  • Don't be surprised if you find the method works on heaps, as well.

    Nice job, though, guys. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/17/2010)


    Don't be surprised if you find the method works on heaps, as well.

    Nice job, though, guys. 🙂

    Heh... we learned from the master! :w00t:

    Wayne

  • Hi Wayne and other techies,

    Today I tested Wayne's routine on a table with 20,000,000 records. In total, 6 columns are calculated and updated by Wayne's routine, and it took 6 minutes. Before I ran the routine, I created the clustered index. I still have to add 8 more columns to calculate, and then I will test it again. I have spot-checked some results and they looked fine.

    I have to say this again: I'm very impressed by this nasty little program that does the job so well. I never thought it was possible, but thanks to you clever guys, and especially Wayne, I'm in heaven.

    Thanks Wayne and all others for the input.

    Regards,

    Dick Herberts

  • Dick Herberts-320552 (11/18/2010)


    Hi Wayne and other techies,

    Today I tested Wayne's routine on a table with 20,000,000 records. In total, 6 columns are calculated and updated by Wayne's routine, and it took 6 minutes. Before I ran the routine, I created the clustered index. I still have to add 8 more columns to calculate, and then I will test it again. I have spot-checked some results and they looked fine.

    I have to say this again: I'm very impressed by this nasty little program that does the job so well. I never thought it was possible, but thanks to you clever guys, and especially Wayne, I'm in heaven.

    Thanks Wayne and all others for the input.

    Regards,

    Dick Herberts

    :blush:

    Dick, I really do appreciate your kind words. And truth be told, it's things like this that keep many of us going.

    However, to be truthful, I learned this from the guy I call "the master"... Jeff Moden. It would be worth your time to read all of the articles he has written (and there are many other authors on this site with articles just as excellent; just search them out). You can find Jeff's articles here.

    Edit: But seriously... 6 minutes? It took that long? That must be one large table you're dealing with there!

    Edit2: I would expect only a modest increase in the time to complete when adding the additional 8 columns... I would anticipate no more than 15 seconds total. It would be interesting to see how it ends up! Please keep us informed!

    Wayne

  • WayneS (11/18/2010)


    Dick Herberts-320552 (11/18/2010)

    ...it took 6 minutes...

    ...

    However, to be truthful, I learned this from the guy I call "the master"... Jeff Moden. ...

    Edit: But seriously... 6 minutes? It took that long? That must be one large table you're dealing with there!

    ...

    Heh... no... not a master. If I were, I'd have been the one to figure out the on-the-fly safety check. :blush: Gotta give Paul White the credit for figuring it out and Tom Thompson the credit for making it a bit more bulletproof. I believe I've figured out how to add a similar safety check to 2k, so maybe I'm a brown belt. 😛

    The reason the update is taking so long is that the tipping point (every computer has one) has probably been reached. It would probably take only about 3 seconds per million rows if the UPDATE were divided into logical sections of 2 million rows each.
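
    A rough sketch of that sectioning idea, reusing the assumptions from the earlier sketch (Category as an INT leading the clustered index, Val as the column being filled; tune @Step so each slice covers roughly 2 million rows):

    DECLARE @PrevVal      VARCHAR(50),
            @PrevCategory INT, @PrevClass INT, @PrevLocation INT,
            @Lo INT, @Hi INT, @Step INT;

    SELECT @Lo = MIN(Category), @Hi = MAX(Category) FROM dbo.testtable;
    SET @Step = 100;   -- assumed: pick so each slice holds ~2 million rows

    WHILE @Lo <= @Hi
    BEGIN
        -- Reset the carry variables at each section boundary. This is safe
        -- because Category leads the clustered index, so no group spans
        -- two slices.
        SELECT @PrevVal = NULL, @PrevCategory = NULL,
               @PrevClass = NULL, @PrevLocation = NULL;

        UPDATE t
           SET @PrevVal = Val = CASE
                   WHEN t.Val IS NOT NULL THEN t.Val
                   WHEN t.Category = @PrevCategory
                    AND t.Class    = @PrevClass
                    AND t.Location = @PrevLocation THEN @PrevVal
                   ELSE NULL
               END,
               @PrevCategory = t.Category,
               @PrevClass    = t.Class,
               @PrevLocation = t.Location
          FROM dbo.testtable t WITH (TABLOCKX)
         WHERE t.Category >= @Lo AND t.Category < @Lo + @Step
        OPTION (MAXDOP 1);

        SET @Lo = @Lo + @Step;
    END;

    The same ordering caveat applies within each slice, so the safety check belongs in each UPDATE as well.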

    --Jeff Moden



  • Hi guys,

    Today I added 8 extra columns (in total there are now 14 calculated columns) to Wayne's routine. The test I did today was on a table with 21,000,000 records. It took a couple of minutes more (11 minutes in total), which for me is fantastic.

    Regards,

    Dick Herberts

    Thanks for the feedback, Dick. Glad you're pleased with it!

    Wayne
