Update Records Based on Two Concatenated PK Columns

  • I have a query tuning question. I have a table with some 1.7 mil records and the 2 columns (SalesOrder = col1, SalesOrderType = col2) being searched on are set as Primary Keys. We are doing updates at month end using the concatenation of these 2 columns to identify the records that need to be updated and the query is very inefficient. The essence of the query is:

    UPDATE table1,

    SET col3 = 'value',

    col4 = 'value'

    WHERE col1 + col2

    IN ('col1col2', 'col1col2', . . . 'n') <usually a random assortment of some 60 Sales Orders out of 1.7 million records in this list input via an in-house Cold Fusion app.>

    The question is, is there a more efficient way to search on these 2 columns than to concat them together?

  • Ken Sturgeon (11/30/2010)


    I have a query tuning question. I have a table with some 1.7 mil records and the 2 columns (SalesOrder = col1, SalesOrderType = col2) being searched on are set as Primary Keys. We are doing updates at month end using the concatenation of these 2 columns to identify the records that need to be updated and the query is very inefficient. The essence of the query is:

    UPDATE table1,

    SET col3 = 'value',

    col4 = 'value'

    WHERE col1 + col2

    IN ('col1col2', 'col1col2', . . . 'n') <usually a random assortment of some 60 Sales Orders out of 1.7 million records in this list input via an in-house Cold Fusion app.>

    The question is, is there a more efficient way to search on these 2 columns than to concat them together?

    Hi ken, i had typed out an answer for you and lost the whole thing when my session timed out. Anyway, here we go again.

    I assume, there is a clustered index, as part of the primary key. A clustered index helps if your query is like -

    ---

    -----

    where col1 = 'value1' and col2 = 'value2'

    In your case, your search predicate is based on col1 + col2 and there is no index on it. So, the query would cause a clustered index scan, which is not efficient. Please check the execution plan to confirm this. To speed up the query, you have to add a covering index. You can do it in two ways.

    1. Create a computed column -

    create a computed column coln = col1 + col2.

    create a non-clustered index on it.

    query becomes faster.

    2. If you have constraints that prevent you from adding a computed column, go for this approach. For this to work, the table should have a unique column - UC, preferable integer.

    -----------------------------------------------------------------

    if (object_id('tempdb..#test') is not null) drop table #test

    go

    create table #test(UC int, coln varchar(50))

    go

    insert into #test

    select UC, col1 + col2

    from table1

    create nonclustered index IX_Test_Coln on #test(coln) include (UC)

    -----------------------------------------------------------------

    update t1

    set col3 = 'value', col4 = 'value'

    from

    #test t

    join table1 t1

    on t1.UC = t.UC

    where t.coln in ('col1col2', 'col1col2', . . . 'n')

    This will use the index IX_Test_Coln in #test table. So, your query will work faster.

    Please let me know if this does not work.

    - arjun

    https://sqlroadie.com/

  • Thx for your response! Learned a couple of things from you today and thx for that as well. One of the things I did not mention in my first post was that I am dealing with a non-normalized database and with no unique fields in this table1. Also, I don't have the capability of adding the UC column you recommended. I set the query up basically as you recommended but used a composite of 4 columns to create unique querying criteria in place of the UC and the query plan did not improve much.

    I created:

    CREATE TABLE #tmpOrd (col5 varchar(25), col6 float, salesorder varchar(20))

    go

    INSERT INTO #tmpOrd

    SELECT col5, col6, ltrim(rtrim(str(col1))) + rtrim(col2) as salesorder

    FROM table1

    CREATE NONCLUSTERED INDEX IX_tmpOrd on #tmpOrd(salesorder) include (col5, col6)

    UPDATE table1

    SET col5 = '40401A ',

    col6 = 110301

    FROM #tmpOrd t

    JOIN table1 t1

    ON ltrim(rtrim(str(t1.col1))) + rtrim(t1.col2) = t.salesorder

    WHERE t.salesorder

    IN ('value1', 'value2', . . . 'n')

    Should setting up the 4-column composite work the same as using your UC column?

  • Hi ken, i would need more info to give you a good explanation. I am replying from my phone now, so will keep it short.

    It's not surprising that you didn't get an improvement when you used 4 columns to uniquely identify the rows. More columns in the index would slow down queries as the number of levels of the b tree increases.

    1. Can't you use computed column approach?

    2. How about creating a identity column and using that as the unique column?

    3. In the query you have posted, you do the join on col1 + col2, which does not use the index and is not correct.

    4. If you cannot add a computed column and don't have a unique column, you may not be able to improve the query's performance by a sizeable margin as you can't use the benefit of indexes.

    I suggest that you read Gail Shaw's article on indexes in SSC. I don't have the URL right now. Will post that later.

    https://sqlroadie.com/

  • Could I set up the computed column in the temp table? Otherwise, no, I wouldn't be able to create a computed column on the actual table unless it can be done on the fly somehow. We are using a JD Edwards preconfigured prod database and it is being replicated so mgmt is not keen on adding columns or even indexes on the prod copy. I did give you the wrong syntax from what I am using. It should be:

    CREATE TABLE #tmpOrd (col5 varchar(25), col6 float, salesorder varchar(20))

    GO

    INSERT INTO #tmpOrd

    SELECT col5, col6, ltrim(rtrim(str(col1))) + rtrim(col2) as salesorder

    FROM table1

    CREATE NONCLUSTERED INDEX IX_tmpOrd on #tmpOrd(salesorder) include (col5, col6)

    UPDATE table1

    SET col7 = '40401A ',

    col8 = 110301

    FROM #tmpOrd t

    JOIN table1 t1

    ON t1.salesorder = t.salesorder

    WHERE t.salesorder

    IN ('value1', 'value2', . . . 'n')

    You may be correct that the query is doomed due to business constraints but let me approach it from a different angle. I tried the following using a CTE as well and did see some improvement if I could get your input on how to identify the min and max salesorder numbers on the fly. The salesorders are input using the full Sales Order + Sales Order Type concatenated so the input is arbitrary. When I limited the range of SalesOrders, of course, the query ran much faster. The query is:

    SELECT ltrim(rtrim(str(col1))) + rtrim(col2) salesorders

    INTO #tmpOrd

    FROM table1

    WHERE ltrim(rtrim(str(col1))) + rtrim(col2)

    IN ('value','value', . . . 'n')

    AND salesorders between (lower salesorder number) AND (upper salesorder number in the mix entered by the user)

    --This is the point I wanted to try to add maybe a min and max that could accept any arbitrary values.

    --CTE used here to cycle through records needing the update held in #tmpOrd table

    WITH CAccts (col3, col4) AS

    (

    SELECTsh.col3,

    sh.col4

    FROM table1 sh

    JOIN #tmpOrd t

    ON sh.salesorders=t.salesorders

    )

    UPDATE CAccts

    SET sh.col3 = '40404A ',

    sh.col4 = 110301

    GO

    Does this look feasible?

  • What about the obvious answer of changing the query to use the clustered index by removing the IN clause? The IN clause is treated just like a series of OR clauses; but using the contatenated values prevents the index from being used. Why not change the code to build out the query as OR statements instead?

    Like this:

    WHERE (col1 = 'value1' and col2 = 'value2')

    OR (col1 = 'value3' and col2 = 'value4')

    OR (col1 = 'value5' and col2 = 'value6')

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Ken Sturgeon (11/30/2010)


    I have a query tuning question. I have a table with some 1.7 mil records and the 2 columns (SalesOrder = col1, SalesOrderType = col2) being searched on are set as Primary Keys. We are doing updates at month end using the concatenation of these 2 columns to identify the records that need to be updated and the query is very inefficient. The essence of the query is:

    UPDATE table1,

    SET col3 = 'value',

    col4 = 'value'

    WHERE col1 + col2

    IN ('col1col2', 'col1col2', . . . 'n') <usually a random assortment of some 60 Sales Orders out of 1.7 million records in this list input via an in-house Cold Fusion app.>

    The question is, is there a more efficient way to search on these 2 columns than to concat them together?

    Ken,

    Is it safe to assume that those 60 Sales Orders have the two separate columns also?

    You might want to try:

    UPDATE t1,

    SET col3 = 'value',

    col4 = 'value'

    FROM table1 t1

    JOIN OtherTable t2 -- the random assortment of some 60 Sales Orders

    ON t1.col1 = t2.col1

    AND t1.col2 = t2.col2

    WHERE t2.somecolumn = criteria to select from (if needed)

    Obviously, indexes on col1 and col2 on both tables will help

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Ken Sturgeon (12/1/2010)


    Could I set up the computed column in the temp table?

    I'm afraid that won't be of much help, as you want to do the update on the parent table and you do not have a simple unique key.

    UPDATE table1

    SET col7 = '40401A ',

    col8 = 110301

    FROM #tmpOrd t

    JOIN table1 t1

    ON t1.salesorder = t.salesorder

    WHERE t.salesorder

    IN ('value1', 'value2', . . . 'n')

    JOIN table1 t1

    ON t1.salesorder = t.salesorder

    This part right here, how do you do this? If I am not wrong, there is no column called salesorder in table1 and you get that value by concatenating col1 and col2. If you are doing that, you are not using indexes again, in spite of adding one. You can try the 'AND - OR' approach suggested by fellow SSCians. That should help a lot.

    --This is the point I wanted to try to add maybe a min and max that could accept any arbitrary values.

    --CTE used here to cycle through records needing the update held in #tmpOrd table

    WITH CAccts (col3, col4) AS

    (

    SELECTsh.col3,

    sh.col4

    FROM table1 sh

    JOIN #tmpOrd t

    ON sh.salesorders=t.salesorders

    )

    UPDATE CAccts

    SET sh.col3 = '40404A ',

    sh.col4 = 110301

    GO

    Does this look feasible?

    Yes, mentioning a range helps. But at this point the best bet is to AND-OR the search criteria.

    Other suggestions:

    1. If table1 is not frequently updated, how about creating a materialized view? If you do not want to modify the table structure, this could be one way. You can add a computed column here; and you can go for an identity column as well.

    2. It will help if you can post the table definition and your current query. People here can work on that. That's always an easier way.

    Check out this article by Gail Shaw. It will help you demystify a lot of things.

    http://www.sqlservercentral.com/articles/Indexing/68439/

    - arjun

    https://sqlroadie.com/

  • Thanks everyone for your feedback. I am looking into your suggestions and will get back to you on my results.

  • Sorry to take so long getting back but had some things come up. Finally decided, along with our app. developer, to go with the suggestion of breaking out the two key columns using the OR statments. I found out after talking with the developer that he could build a loop into his code (wasn't sure what his options were originally) that would separate out the Sales Order number from the Order Type concat. and plug them into the update statement. Problem solved. Thanks for the great assistance you all provided!

  • Cool. Keep coming back to SSC. 🙂

    https://sqlroadie.com/

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply