SQL not working as expected - Deleting duplicates from dataset

  • Hello -

    I cannot figure out what is wrong with the following code. Can anyone see what I am missing? The DELETE statement is not working as I would expect. I have included the pertinent code below:

    DECLARE @MMOrders TABLE (

    order_no char(8),

    oe_po_no char(25),

    sf_order char(8),

    item_no char(15))

    -- Data inserted here

    -- The following SQL statement returns 1711 rows

    SELECT COUNT(*)

    FROM @MMOrders

    -- The following SQL statement returns 8 rows (all groups have a count of 2)

    SELECT oe_po_no,

    item_no

    FROM @MMOrders

    GROUP BY oe_po_no, item_no

    HAVING COUNT(*) > 1

    -- The following is supposed to remove the 8 rows (8 * 2 = 16 total) from the 1711 rows

    DELETE FROM @MMOrders

    WHERE EXISTS

    (SELECT m.oe_po_no,

    m.item_no

    FROM @MMOrders m

    WHERE m.oe_po_no = oe_po_no

    AND m.item_no = item_no

    GROUP BY m.oe_po_no, m.item_no

    HAVING COUNT(*) > 1)

    -- Yet the following returns 0 rows after the delete statement runs.

    -- What is wrong with my delete statement?

    SELECT COUNT(*)

    FROM @MMOrders

  • Not sure what is wrong, but since you are using SQL Server 2008 (based on the forum you posted in) try this:

    DECLARE @MMOrders TABLE (

    order_no char(8),

    oe_po_no char(25),

    sf_order char(8),

    item_no char(15))

    -- Data inserted here

    -- The following SQL statement returns 1711 rows

    SELECT COUNT(*)

    FROM @MMOrders

    -- The following SQL statement returns 8 rows (all groups have a count of 2)

    SELECT oe_po_no,

    item_no

    FROM @MMOrders

    GROUP BY oe_po_no, item_no

    HAVING COUNT(*) > 1;

    -- The following is supposed to remove the 8 rows (8 * 2 = 16 total) from the 1711 rows

    WITH BaseData AS (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY m.oe_po_no, m.item_no ORDER BY order_no) RowNum,

    order_no,

    oe_po_no,

    sf_order,

    item_no

    FROM

    @MMOrders

    )

    DELETE FROM BaseData

    WHERE

    RowNum > 1;

    -- Yet the following returns 0 rows after the delete statement runs.

    -- What is wrong with my delete statement?

    SELECT COUNT(*)

    FROM @MMOrders

  • The references to oe_po_no and item_no do not have an alias to the outer table, so they are bound to the columns of the table inside the EXISTS clause (alias 'm'). Be very careful using this syntax - it is very easy to go wrong as you have shown. The following script illustrates the problem on a simpler version of your table, and shows two ways to write the DELETE correctly. I prefer the IN syntax:

    USE Sandpit

    GO

    DECLARE @MMOrders AS TABLE

    (

    order_no integer NULL

    );

    INSERT @MMOrders

    (order_no)

    VALUES

    (1), (1), (2), (3);

    -- Shows order #1

    SELECT

    mo.order_no

    FROM @MMOrders AS mo

    GROUP BY

    mo.order_no

    HAVING COUNT_BIG(*) > 1;

    -- Deletes all rows!

    DELETE FROM @MMOrders

    WHERE EXISTS

    (

    SELECT

    mo.order_no

    FROM @MMOrders AS mo

    WHERE

    -- order_no reference is not qualified

    -- mo.order_no and order_no both refer to alias 'mo'

    mo.order_no = order_no

    GROUP BY

    mo.order_no

    HAVING COUNT_BIG(*) > 1

    );

    -- Works correctly

    DELETE mo2

    FROM @MMOrders AS mo2

    WHERE EXISTS

    (

    SELECT

    mo.order_no

    FROM @MMOrders AS mo

    WHERE

    -- order_no references are qualified correctly

    mo.order_no = mo2.order_no

    GROUP BY

    mo.order_no

    HAVING COUNT_BIG(*) > 1

    );

    -- Also correct (and preferred)

    DELETE @MMOrders

    WHERE order_no IN

    (

    SELECT

    mo.order_no

    FROM @MMOrders AS mo

    GROUP BY

    mo.order_no

    HAVING COUNT_BIG(*) > 1

    );

  • Lynn -

    Thank you for the response. However, that didn't work.

    First I get an error: Msg 319. I am not sure why on that.

    But, when I went to look it up I see that what it would do is remove only one of the duplicate records. Actually, I need to remove both of them.

    Basically, I have two records in one database with the same values in two fields that are used to create two records in another database. The unique identifier from the first database is not used in the second database to differentiate the records. So, I cannot associate the records from one databse to antoher one for one. Therefore, I need to exclude both records from my query. We are working on a solution to this, but in the meantime I want to produce my reports with a little bit of information missing on those few records.

    If you have any other suggestions, they would be greatly appreciated.

    Thank you.

    Tammy

  • SQL Kiwi -

    Thank you. That resolved my question.

    I had tried aliases, but I had tried it wrong. I am fairly new to SQL Server. I have been working with Oracle for about 10 years. So, I have to get used to the new syntax.

    Thank you.

    Tammy

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply