Joining to tables with bad data

  • I have a table that holds product prices. It was originally designed to hold historical price values of products but this was abandoned after a few years. So now NEW products have only one entry in this table but some older products can have multiple entries. the ones with multiple entries can have a mixture of nulls or dates set in either or both or none of the StartDate and EndDate its totally random. only thing i can seem to rely on is the LastUpdated value.

    Hows best to join to this table to get the LATEST product price for old and new products. At some point ill be going through a data cleansing exercise but this could be some time away.

    ProductPrice table

    --------------------

    [/URL]

  • ps_vbdev (3/5/2015)


    I have a table that holds product prices. It was originally designed to hold historical price values of products but this was abandoned after a few years. So now NEW products have only one entry in this table but some older products can have multiple entries. the ones with multiple entries can have a mixture of nulls or dates set in either or both or none of the StartDate and EndDate its totally random. only thing i can seem to rely on is the LastUpdated value.

    Hows best to join to this table to get the LATEST product price for old and new products. At some point ill be going through a data cleansing exercise but this could be some time away.

    ProductPrice table

    --------------------

    [/URL]

    I would argue that cleansing the data now makes this exercise pointless. If you are going to get rid of all the garbage data in the future it would be much simpler to do that now instead of writing code against this that is going to change anyway. If you are deadset and not fixing the problem and coding to deal with the symptom you might be able to leverage ROW_NUMBER. Without more details I can't offer much in the way of code.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks

    yes totally agree with you, but i inherited this issue and the time scales. we are a seasonal business so data fix has to wait until our low season, frustrating but it what ive to work with. i was just going to do a TOP 1 * order by Lastupdated desc but just thought there may be a better way.

  • ps_vbdev (3/5/2015)


    Thanks

    yes totally agree with you, but i inherited this issue and the time scales. we are a seasonal business so data fix has to wait until our low season, frustrating but it what ive to work with. i was just going to do a TOP 1 * order by Lastupdated desc but just thought there may be a better way.

    Either that or use ROW_NUMBER is about all you can do. I understand what you mean about slow seasons but in this case you have spent as much or more time working around the symptom as it would take to fix the problem. It also is going to make your queries slower. Hope you get it sorted out and hope you get the chance to clean up the mess.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • If you know that there will be no more dups added then a quick way to temporarily fix this is to create a single column called [old_data] or something. Write a query to identify the old dups and then any query can using where [old_data] is null will filter out this data. Means you are not removing the data, just tagging it which is less risk.

  • ps_vbdev (3/5/2015)


    I have a table that holds product prices. It was originally designed to hold historical price values of products but this was abandoned after a few years. So now NEW products have only one entry in this table but some older products can have multiple entries. the ones with multiple entries can have a mixture of nulls or dates set in either or both or none of the StartDate and EndDate its totally random. only thing i can seem to rely on is the LastUpdated value.

    Hows best to join to this table to get the LATEST product price for old and new products. At some point ill be going through a data cleansing exercise but this could be some time away.

    ProductPrice table

    --------------------

    [/URL]

    Not knowing which column uniquely identifies each product, try this. (Be advised you may have to change up the subquery if Unid is a wrong guess on my part).

    SELECT pp1.*

    FROM ProductPrice pp1

    INNER JOIN (SELECT Unid, MAX(LastUpdate) AS LastUpdate

    FROM ProductPrice

    GROUP BY Unid) pp2

    ON pp1.Unid = pp2.Unid

    AND pp1.LastUpdate = pp2.LastUpdate

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply