Removing duplicates from a subset of fields

  • Hi All,

    I'm trying to eliminate duplicate pricing from a price export file we get. I upload it into a staging table and then select new values into the main data table. Both tables have the same schema, the relevant parts are:

    Export Table:

Transmission_ID = varchar(20)
Supplier = varchar(40)
Terminal = varchar(40)
Branded = varchar(1)
Product = varchar(40)
Eff_Date = varchar(8)
Eff_Time = varchar(5)
Price = decimal(10,6)
Move = decimal(10,6)

    The data looks like this:

    '5B6Y9460AH0F','Supplier1','B','Vstl C','ULS #2','20110518','1800','3.225600','0.033000'

    '5B5LJ560AH03','Supplier1','B','Vstl C','ULS 2DY','20110517','1800','3.196600','-0.010000'

'5B6Y9460AH0F','Supplier1','B','Vstl C','ULS 2DY','20110518','1800','3.229600','0.033000'

    '5B5LDQ60AH03','Supplier2','U','Albn M','M89 E10%','20110517','1800','2.972999','-0.042500'

    '5B6K1260AH07','Supplier2','U','Albn M','M89 E10%','20110518','1100','3.003000','0.030000'

    '5B6YGWP0AH05','Supplier2','U','Albn M','M89 E10%','20110518','1800','2.990000','-0.013000'

    '5B6XWXP0AH05','Supplier2','U','Albn M','M89 E10%','20110518','1800','2.980000','-0.023000'

    '5B5LDQ60AH03','Supplier2','U','Albn M','P93 E10%','20110517','1800','3.123000','-0.042500'

    '5B6K1260AH07','Supplier2','U','Albn M','P93 E10%','20110518','1100','3.153000','0.030000'

    '5B6XWXP0AH05','Supplier2','U','Albn M','P93 E10%','20110518','1800','3.129999','-0.023000'

    '5B6YGWP0AH05','Supplier2','U','Albn M','P93 E10%','20110518','1800','3.140000','-0.013000'

    The Primary key for the main table is (Supplier, Terminal, Branded, Product, Eff_Date, Eff_Time)

You can see in the data that Supplier2 sent a second price change for the same date and time, which breaks the primary key. When this happens, I'm trying to select only the latest price (MAX(Transmission_ID)), but the final query does not behave as I expect. I think it's because Transmission_ID is only unique within a batch of prices that were appended to the file (50 or so at a time).

    This code gives me the correct subset of prices (about 2983 of 3166):

SELECT MAX(Transmission_ID) AS Transmission_ID
    ,[Supplier]
    ,[Terminal]
    ,[Product]
    ,[Branded]
    ,[Eff_Date]
    ,[Eff_Time]
FROM [DTN_EXPORT]
GROUP BY [Supplier]
    ,[Terminal]
    ,[Product]
    ,[Branded]
    ,[Eff_Date]
    ,[Eff_Time]

    Then I try to map in the missing fields.

In this version I tried selecting into a temp table instead of using a subquery, but the result was the same: it returns 25411 records because of the "batch" nature of the Transmission_ID.

SELECT MAX(Transmission_ID) AS Transmission_ID
    ,Supplier
    ,Terminal
    ,Product
    ,Branded
    ,Eff_Date
    ,Eff_Time
INTO #EXPORT_TEMP
FROM dbo.DTN_EXPORT
GROUP BY Supplier
    ,Terminal
    ,Product
    ,Branded
    ,Eff_Date
    ,Eff_Time

SELECT E.*
FROM [DTN_EXPORT] E
INNER JOIN #EXPORT_TEMP E1
    ON E1.Transmission_ID = E.Transmission_ID
    AND E1.Supplier = E.Supplier
    AND E1.Terminal = E.Terminal
    AND E1.Branded = E.Branded
    AND E1.Eff_Date = E.Eff_Date
    AND E1.Eff_Time = E.Eff_Time

If I put a DISTINCT in, it removes the duplicates caused by the Transmission_ID field but leaves in the duplicate prices from Supplier2 (they have distinct Transmission_IDs).

Not sure how to get the results I'm after.
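[Editor's note: one pattern often used for this "keep only the latest row per key" requirement, assuming SQL Server 2005 or later and assuming the highest Transmission_ID within a key group really is the most recent transmission, is ROW_NUMBER(). Unlike the MAX-then-join approach, it returns exactly one row per key group even when Transmission_ID values tie across batches. A sketch against the schema as posted:]

```sql
-- Rank rows within each primary-key group, newest transmission first,
-- then keep only the top-ranked row per group.
WITH Ranked AS (
    SELECT Transmission_ID, Supplier, Terminal, Branded, Product,
           Eff_Date, Eff_Time, Price, Move,
           ROW_NUMBER() OVER (
               PARTITION BY Supplier, Terminal, Branded, Product, Eff_Date, Eff_Time
               ORDER BY Transmission_ID DESC   -- assumption: higher ID = later batch
           ) AS rn
    FROM dbo.DTN_EXPORT
)
SELECT Transmission_ID, Supplier, Terminal, Branded, Product,
       Eff_Date, Eff_Time, Price, Move
FROM Ranked
WHERE rn = 1;
```

If Transmission_ID cannot reliably order the batches, a different ORDER BY tiebreaker would be needed; the query still guarantees one row per key group, but which duplicate survives would then be arbitrary.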

    Thanks

    Brent

brent 82845 (5/19/2011): The Primary key for the main table is (Supplier, Terminal, Branded, Product, Eff_Date, Eff_Time)

    You can see in the data that Supplier2 sent a second price change for the same date and time that breaks the primary key.

I think your real problem is that you are fighting a bad design. It makes sense that a supplier might send a second entry for a product, either because it was entered incorrectly or because somebody changed their mind about the price on xxx date. Also, you really should use a datetime field instead of two separate varchar fields.
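[Editor's note: a sketch of combining the two varchar columns into a single datetime, using the column names as posted. Eff_Date is 'yyyymmdd' (CONVERT style 112) and Eff_Time is 'hhmm', so a colon is inserted before converting it as a time string (style 108); adding two legacy datetime values sums their offsets from the 1900-01-01 base date.]

```sql
-- '20110518' + '1800'  ->  2011-05-18 18:00:00.000
SELECT CONVERT(datetime, Eff_Date, 112)
     + CONVERT(datetime, STUFF(Eff_Time, 3, 0, ':'), 108) AS Eff_DateTime
FROM dbo.DTN_EXPORT;
```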

If you can provide some DDL and some sample data (in the form of INSERT statements), I am certain that many people around here are willing and able to help.
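[Editor's note: a sketch of what that DDL and sample data could look like, reconstructed from the schema and rows posted above. The column order follows the sample data (Branded before Terminal); the multi-row VALUES syntax assumes SQL Server 2008 or later.]

```sql
-- Staging table inferred from the posted schema
CREATE TABLE dbo.DTN_EXPORT (
    Transmission_ID varchar(20),
    Supplier        varchar(40),
    Branded         varchar(1),
    Terminal        varchar(40),
    Product         varchar(40),
    Eff_Date        varchar(8),
    Eff_Time        varchar(5),
    Price           decimal(10,6),
    Move            decimal(10,6)
);

-- Two of the posted Supplier2 rows that collide on the main table's key
INSERT INTO dbo.DTN_EXPORT VALUES
('5B6YGWP0AH05','Supplier2','U','Albn M','M89 E10%','20110518','1800',2.990000,-0.013000),
('5B6XWXP0AH05','Supplier2','U','Albn M','M89 E10%','20110518','1800',2.980000,-0.023000);
```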

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
