UPDATE Duplicates Except for One Record in a Table

  • Hi All,
    I need to set a field in a product table for a bunch of products. We have some duplicated products based on SKU and I would like to set the duplicates to Deleted = 1, but keep one record obviously for that product to display on the website. We also have a condition based on another field where this field is LIKE 'stainless%'. So, I only want to update those records.

    Could someone help please?!

    Thank you in advance!

    ~D

  • Here is a sample of how to DELETE or update all but the first row

    -- delete all duplicates but first in each set
    WITH cte AS
    (
    SELECT a.*,
    ROW_NUMBER() OVER(PARTITION BY col1, col2 ORDER BY col1, col2) dupenbr
    FROM dbo.mytable a
    )
    DELETE FROM cte
    WHERE dupenbr > 1
    ;

  • Bill Talada - Tuesday, May 30, 2017 1:21 PM

    Here is a sample of how to DELETE or update all but the first row

    -- delete all duplicates but first in each set
    WITH cte AS
    (
    SELECT a.*,
    ROW_NUMBER() OVER(PARTITION BY col1, col2 ORDER BY col1, col2) dupenbr
    FROM dbo.mytable a
    )
    DELETE FROM cte
    WHERE dupenbr > 1
    ;

    Thank you, Bill!

    What if I wanted to group by SKU and keep the latest created one based on a CreatedOn field and mark the other ones as deleted?

  • donato1026 - Tuesday, May 30, 2017 2:00 PM

    Bill Talada - Tuesday, May 30, 2017 1:21 PM

    Here is a sample of how to DELETE or update all but the first row

    -- delete all duplicates but first in each set
    WITH cte AS
    (
    SELECT a.*,
    ROW_NUMBER() OVER(PARTITION BY col1, col2 ORDER BY col1, col2) dupenbr
    FROM dbo.mytable a
    )
    DELETE FROM cte
    WHERE dupenbr > 1
    ;

    Thank you, Bill!

    What if I wanted to group by SKU and keep the latest created one based on a CreatedOn field and mark the other ones as deleted?

    First, the columns used in PARTITION BY should not be the same as the columns used in your ORDER BY.  Bill's pseudo-code makes it appear that you use the same columns for both.

    You would use the SKU as the PARTITION BY column and the CreatedOn field as the ORDER BY column (probably with a DESC order).

    ALWAYS, ALWAYS, ALWAYS do a SELECT before performing a DELETE to ensure that you are selecting the correct records.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • donato1026 - Tuesday, May 30, 2017 2:00 PM

    Bill Talada - Tuesday, May 30, 2017 1:21 PM

    Here is a sample of how to DELETE or update all but the first row

    -- delete all duplicates but first in each set
    WITH cte AS
    (
    SELECT a.*,
    ROW_NUMBER() OVER(PARTITION BY col1, col2 ORDER BY col1, col2) dupenbr
    FROM dbo.mytable a
    )
    DELETE FROM cte
    WHERE dupenbr > 1
    ;

    Thank you, Bill!

    What if I wanted to group by SKU and keep the latest created one based on a CreatedOn field and mark the other ones as deleted?

    Here's the approach with guesses for tables and column names.  Substitute your table and column names as needed.

    WITH cte AS (
      SELECT SKU, CreatedOn, Active, RN = ROW_NUMBER() OVER(PARTITION BY SKU ORDER BY CreatedOn DESC)
        FROM dbo.Parts
    )
    UPDATE cte
      SET Active = 0
      WHERE RN > 1;

  • drew.allen - Tuesday, May 30, 2017 2:51 PM

    ALWAYS, ALWAYS, ALWAYS do a SELECT before performing a DELETE to ensure that you are selecting the correct records.

    Good point, Drew.  +1,000,000 to this.

    Donato, just replace the UPDATE with SELECT FROM.  Do this before updating updating your data.

    WITH cte AS (
    SELECT SKU, CreatedOn, Active, RN = ROW_NUMBER() OVER(PARTITION BY SKU ORDER BY CreatedOn DESC)
      FROM dbo.Parts
    )
    SELECT *
      FROM cte
      WHERE RN > 1;

  • Okay. And this will keep the most recent record? I want to keep the most recent record and set the others as Published = 0

  • Ed Wagner - Tuesday, May 30, 2017 2:58 PM

    donato1026 - Tuesday, May 30, 2017 2:00 PM

    Bill Talada - Tuesday, May 30, 2017 1:21 PM

    Here is a sample of how to DELETE or update all but the first row

    -- delete all duplicates but first in each set
    WITH cte AS
    (
    SELECT a.*,
    ROW_NUMBER() OVER(PARTITION BY col1, col2 ORDER BY col1, col2) dupenbr
    FROM dbo.mytable a
    )
    DELETE FROM cte
    WHERE dupenbr > 1
    ;

    Thank you, Bill!

    What if I wanted to group by SKU and keep the latest created one based on a CreatedOn field and mark the other ones as deleted?

    Here's the approach with guesses for tables and column names.  Substitute your table and column names as needed.

    WITH cte AS (
      SELECT SKU, CreatedOn, Active, RN = ROW_NUMBER() OVER(PARTITION BY SKU ORDER BY CreatedOn DESC)
        FROM dbo.Parts
    )
    UPDATE cte
      SET Active = 0
      WHERE RN > 1;

    Okay. And this will keep the most recent record? I want to keep the most recent record and set the others as Published = 0

  • donato1026 - Tuesday, May 30, 2017 6:27 PM

    Ed Wagner - Tuesday, May 30, 2017 2:58 PM

    donato1026 - Tuesday, May 30, 2017 2:00 PM

    Bill Talada - Tuesday, May 30, 2017 1:21 PM

    Here is a sample of how to DELETE or update all but the first row

    -- delete all duplicates but first in each set
    WITH cte AS
    (
    SELECT a.*,
    ROW_NUMBER() OVER(PARTITION BY col1, col2 ORDER BY col1, col2) dupenbr
    FROM dbo.mytable a
    )
    DELETE FROM cte
    WHERE dupenbr > 1
    ;

    Thank you, Bill!

    What if I wanted to group by SKU and keep the latest created one based on a CreatedOn field and mark the other ones as deleted?

    Here's the approach with guesses for tables and column names.  Substitute your table and column names as needed.

    WITH cte AS (
      SELECT SKU, CreatedOn, Active, RN = ROW_NUMBER() OVER(PARTITION BY SKU ORDER BY CreatedOn DESC)
        FROM dbo.Parts
    )
    UPDATE cte
      SET Active = 0
      WHERE RN > 1;

    Okay. And this will keep the most recent record? I want to keep the most recent record and set the others as Published = 0

    The most recent as defined by the ORDER BY clause to assign the row numbers.  The most recent is first.

    Please, check for yourself by using a SELECT instead of an UPDATE.  My goal isn't to just tell you what to do, but to help you understand how it works so you can use the technique again when you need it.  You also want to be able to support this on your system, so understanding it is important.

  • donato1026 - Tuesday, May 30, 2017 1:11 PM

    Hi All,
    I need to set a field in a product table for a bunch of products. We have some duplicated products based on SKU and I would like to set the duplicates to Deleted = 1, but keep one record obviously for that product to display on the website. We also have a condition based on another field where this field is LIKE 'stainless%'. So, I only want to update those records.

    Could someone help please?!

    Thank you in advance!

    ~D

    The request seems odd.

    does your system allow 2 or more different products with the same SKU to be available at the same time?
    As far as I know, the whole idea behind SKU is to have acode which identifies a product.
    In a warehousing it's not allowed to have 2 or more different products with the same SKU.

    A stock keeping unit (SKU) is a product and service identification code for a store or product

    _____________
    Code for TallyGenerator

  • Sergiy - Thursday, June 1, 2017 11:22 PM

    donato1026 - Tuesday, May 30, 2017 1:11 PM

    Hi All,
    I need to set a field in a product table for a bunch of products. We have some duplicated products based on SKU and I would like to set the duplicates to Deleted = 1, but keep one record obviously for that product to display on the website. We also have a condition based on another field where this field is LIKE 'stainless%'. So, I only want to update those records.

    Could someone help please?!

    Thank you in advance!

    ~D

    The request seems odd.

    does your system allow 2 or more different products with the same SKU to be available at the same time?
    As far as I know, the whole idea behind SKU is to have acode which identifies a product.
    In a warehousing it's not allowed to have 2 or more different products with the same SKU.

    A stock keeping unit (SKU) is a product and service identification code for a store or product

    If they're exercising something like a Type 2 SCD and something like the price has changed or a batch number has been fulfilled and expired, then I can see a point to this request.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Type 2 SCD would involve some kind of time stamp indicating the the date of change.

    So, marking "expired" product versions as deleted would not be a problem at all.

    It seems like they've got product description and stock availability merged together in a single table (for convenience, of course), and now they are discovering the dark side of denormalisation.

    _____________
    Code for TallyGenerator

  • donato1026 - Tuesday, May 30, 2017 3:14 PM

    Okay. And this will keep the most recent record? I want to keep the most recent record and set the others as Published = 0

    It seriously depends.  Do you actually have a temporal column in any of this that identifies what the lasted rendition of the SKY is?  I might have missed it but, so far, I've not seen a lick of information about such a thing from you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply