Loop to update table

  • Hello

    Let's say I have two table variables or temp tables. Table1 contains a single datetime column, and Table2 has two columns, both datetime. Column1 in both tables matches, but I want to loop through and update Table2.Column2 with the previous datetime from Column1.

    So I would want something like this

    Table1.Column1      Table2.Column1      Table2.Column2

    3/12/2009 07:00     3/12/2009 07:00     NULL
    3/12/2009 07:30     3/12/2009 07:30     3/12/2009 07:00
    3/12/2009 08:00     3/12/2009 08:00     3/12/2009 07:30
    3/12/2009 08:30     3/12/2009 08:30     3/12/2009 08:00
    3/12/2009 09:00     3/12/2009 09:00     3/12/2009 08:30

  • There are a number of solutions for something like this. Which one is best depends a lot on what the data means and what it's for.

    For example, if it's non-sequential in nature, you might need to update the second column in table 2 if a new row is entered with a date earlier than the current entry. In that case, it might be better to not store it at all, and just include the calculation in a query.

    Can you give any data on what the dates are and how they are generated and used?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (3/12/2009)


    There are a number of solutions for something like this. Which one is best depends a lot on what the data means and what it's for.

    For example, if it's non-sequential in nature, you might need to update the second column in table 2 if a new row is entered with a date earlier than the current entry. In that case, it might be better to not store it at all, and just include the calculation in a query.

    Can you give any data on what the dates are and how they are generated and used?

    It's sequential. Let's say you have a piece of manufacturing equipment that makes something, and the datetime in Column1 is when each product was made. I want to get the difference between the datetime of the new product and that of the previous product, to determine how long it took to make that product. Let me know if this doesn't make sense.

    Thanks for your help.
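    Once Column2 holds the previous row's datetime, the build time per product is a single DATEDIFF away. A minimal sketch, assuming a #Temp table shaped like the one described above (minutes as the unit is an assumption):

    ```sql
    -- Assumes #Temp(Column1 datetime, Column2 datetime null) with Column2
    -- already holding the previous row's Column1; minutes is an assumed unit.
    select
        Column1,
        Column2,
        datediff(mi, Column2, Column1) as MinutesToMake  -- NULL on the first row
    from #Temp;
    ```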

  • Here is some test code for you to work with:

    create table #Temp (
        Column1 datetime,
        Column2 datetime null
    );

    insert into #Temp (Column1)
    select '2009-03-12 07:00' union all
    select '2009-03-12 07:30' union all
    select '2009-03-12 08:00' union all
    select '2009-03-12 08:30' union all
    select '2009-03-12 09:00';

    select * from #Temp;

    with TempUpdate (RowNum, Column1) as (
        select
            row_number() over (order by Column1),
            Column1
        from #Temp
    )
    update #Temp set
        Column2 = tu2.Column1
    from TempUpdate tu1
        left outer join TempUpdate tu2
            on (tu1.RowNum = tu2.RowNum + 1)
        inner join #Temp t
            on (t.Column1 = tu1.Column1);

    select * from #Temp;

    drop table #Temp;
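    As an aside for later readers: on SQL Server 2012 and up (which postdate this thread), the LAG window function expresses "previous row's value" directly, with no self-join. A sketch against the same #Temp table:

    ```sql
    -- Requires SQL Server 2012 or later; LAG was not available when this
    -- thread was written. Updates through an updatable CTE.
    with Ordered as (
        select
            Column2,
            lag(Column1) over (order by Column1) as PrevColumn1  -- previous datetime
        from #Temp
    )
    update Ordered set Column2 = PrevColumn1;
    ```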

  • Thank you Lynn. That works great.

  • Just tried the following for academic reasons -

    With the sample dataset, the execution plan seemed far simpler - you may want to give this a try too.

    update t set
        Column2 = (select max(t2.Column1) from #temp t2 where t2.Column1 < t.Column1)
    from #temp t;

    HTH.

  • Manju (3/13/2009)


    Just tried the following for academic reasons -

    With the sample dataset, the execution plan seemed far simpler - you may want to give this a try too.

    update t set
        Column2 = (select max(t2.Column1) from #temp t2 where t2.Column1 < t.Column1)
    from #temp t;

    HTH.

    Here is why your solution isn't really a good one: it involves a triangular join to determine what value to store in Column2. For each row N in #Temp it has to query N - 1 rows to determine what value to select and store. I am currently running your code against a 1,000,000-row table, and as I write this it has already run 15 minutes on a dual-processor quad-core 2.80 GHz x64 Intel Xeon system with 8 GB RAM. Not a very scalable solution. I can actually show you another way to accomplish this same update that is even faster than mine, but I can't run it until your query finishes (and I don't want to kill it, as I want to capture the STATISTICS IO and STATISTICS TIME output for comparison against my code).

    Your solution is not scalable to large numbers of records.

    Stay tuned. Once I have the results of both runs, I'll post them here.

    And here is the code I am currently running:

    create table #Temp (
        Column1 datetime,
        Column2 datetime null
    );

    insert into #Temp (Column1)
    select dateadd(mi, (N - 1) * 30, '2009-03-12')
    from Tally;

    select * from #Temp;

    set statistics time on;
    set statistics io on;

    with TempUpdate (RowNum, Column1) as (
        select
            row_number() over (order by Column1),
            Column1
        from #Temp
    )
    update #Temp set
        Column2 = tu2.Column1
    from TempUpdate tu1
        left outer join TempUpdate tu2
            on (tu1.RowNum = tu2.RowNum + 1)
        inner join #Temp t
            on (t.Column1 = tu1.Column1);

    set statistics io off;
    set statistics time off;

    truncate table #Temp;

    insert into #Temp (Column1)
    select dateadd(mi, (N - 1) * 30, '2009-03-12')
    from Tally;

    set statistics time on;
    set statistics io on;

    update t set
        Column2 = (select max(t2.Column1) from #Temp t2 where t2.Column1 < t.Column1)
    from #Temp t;

    set statistics io off;
    set statistics time off;

    select * from #Temp;

    drop table #Temp;

    You can find more info on SSC regarding the Tally table. Mine on this system has the values 1 to 1,000,000.
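    The Tally table referenced here is just a persisted sequence of integers. One common way to build one on SQL Server is sketched below; the cross join of sys.all_columns as a row source is an assumption, and any row source wide enough to yield 1,000,000 rows works just as well:

    ```sql
    -- Build a Tally table of 1 to 1,000,000; the row source is an assumption.
    select top (1000000)
        N = row_number() over (order by (select null))
    into Tally
    from sys.all_columns a
    cross join sys.all_columns b;

    -- The unique clustered index on N is what makes Tally joins fast.
    create unique clustered index IX_Tally_N on Tally (N);
    ```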

    I completely agree with your assessment of my query. I thought I had made that somewhat clear myself when I said "with the sample dataset", but I should have been more explicit.

    Having said that (I know I am not helping my case here), can you add an index on Column1 and test the queries?

    I am very curious to see the results.

  • Manju (3/13/2009)


    I completely agree with your assessment of my query. I thought I had made that somewhat clear myself when I said "with the sample dataset", but I should have been more explicit.

    Having said that (I know I am not helping my case here), can you add an index on Column1 and test the queries?

    I am very curious to see the results.

    Would you like me to stop my current test? It has been running for over 45 minutes already. While waiting, I did rerun my version along with the version I alluded to earlier. Here is the code and the results:

    set nocount on;

    create table #Temp (
        Column1 datetime,
        Column2 datetime null
    );

    create clustered index IX_Column1 on #Temp (
        Column1 asc
    );

    insert into #Temp (Column1)
    select dateadd(mi, (N - 1) * 30, '2009-03-12')
    from Tally;

    set statistics time on;
    set statistics io on;

    with TempUpdate (RowNum, Column1) as (
        select
            row_number() over (order by Column1),
            Column1
        from #Temp
    )
    update #Temp set
        Column2 = tu2.Column1
    from TempUpdate tu1
        left outer join TempUpdate tu2
            on (tu1.RowNum = tu2.RowNum + 1)
        inner join #Temp t
            on (t.Column1 = tu1.Column1);

    set statistics io off;
    set statistics time off;

    truncate table #Temp;

    insert into #Temp (Column1)
    select dateadd(mi, (N - 1) * 30, '2009-03-12')
    from Tally;

    declare @Column1 datetime,
            @Column2 datetime;

    set statistics time on;
    set statistics io on;

    update t set
        @Column2 = @Column1,
        Column2 = @Column2,
        @Column1 = Column1
    from #Temp t with (index=1, TABLOCKX);

    set statistics io off;
    set statistics time off;

    select * from #Temp;

    drop table #Temp;

    set nocount off;

    My code (CPU time > elapsed time due to parallelism):

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table '#Temp_______________________________________________________________________________________________________________000000000AD7'. Scan count 11, logical reads 3009404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 8, logical reads 2197376, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 46327 ms, elapsed time = 22131 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    "quirky update" code:

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table '#Temp_______________________________________________________________________________________________________________000000000AD7'. Scan count 1, logical reads 3114, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 3016 ms, elapsed time = 3084 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    And yes, these ran while your code was still active.

    Now, you may be asking "Why?" Check the second article (Running Totals and its variations) linked below in my signature block. If it has been republished by Jeff Moden, it will answer all your questions. If it is still unavailable, perhaps Jeff will drop by this thread and provide an excellent description of the how and why. The first thing I will say, however, is that this only works with the UPDATE statement and when you are using the clustered index. It basically comes down to an ordered clustered index update.

  • After stopping your code (still hadn't finished after 55 minutes), I reran the code I showed above and here are the results.

    My code:

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table '#Temp_______________________________________________________________________________________________________________000000000EDE'. Scan count 11, logical reads 3009404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 8, logical reads 2197376, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 46454 ms, elapsed time = 19634 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    "quirky update" code:

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table '#Temp_______________________________________________________________________________________________________________000000000EDE'. Scan count 1, logical reads 3114, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2891 ms, elapsed time = 2971 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    If you have not done so already, I think you should kill my query without the index; we have established that it is not the best way to do things with a larger dataset. Maybe simpler, more straightforward code, but not the best.

    If you end up re-running my update with the Index created, can you please post the statistics? (I am only curious to see how badly it does).

    Thanks in advance.

  • I killed it back before my last post. I'm not sure I will try it against a 1,000,000 row table again. I think it will take too long to run. Might consider it against a slightly smaller table though.

  • Fair enough; perhaps with 500,000?

    BTW, I think I can visualize why the quirky update is the best, although I would like to dig deeper into this one...

    Thanks for measuring the statistics.
