How to avoid duplicating rows in query results when joining 2 tables

  • Thank you for providing tables and data - this takes out a lot of the guesswork.

    Here is a relatively new and efficient way to solve this kind of problem.

    To your query I added

    selectitm.Item, itm.OrderNo ,itm.OrderLine... as rownum

    that adds a row, called rownum, that contains a sequence that re-starts with 1 whenever the Item, OrderNo and InvoiceNo grouping changes.

    Then I wrapped it in an outer query that selects only rows that have rownum = 1.

    Put all that together and here's the query that solves your problem:

    select Item, OrderNo, OrderLine, TxDate, QtyChg, InvoiceNo, InvoiceDate, InvoiceQty, rownum

    from (

    selectitm.Item, itm.OrderNo ,itm.OrderLine

    ,CONVERT(varchar, itm.TxDate, 101) as TxDate

    ,itm.QtyChg, inv.InvoiceNo

    ,CONVERT(varchar, inv.InvoiceDate, 101) as InvoiceDate

    ,inv.InvoiceQty

    ,ROW_NUMBER() OVER (PARTITION BY itm.Item, itm.OrderNo, inv.InvoiceNo

    ORDER BY itm.Item, itm.OrderNo, inv.InvoiceNo ) as rownum

    from dbo.ItemHistory itm join dbo.InvoiceHistory inv on

    (inv.Item = itm.Item

    and inv.OrderNo = itm.OrderNo

    and inv.OrderLine = itm.OrderLine

    and inv.InvoiceDate = itm.TxDate

    and inv.InvoiceQty = itm.QtyChg) ) X

    WHERE rownum = 1

    This solution uses windows functions (because the ROW_NUMBER, OVER, and PARTITION BY are applied the the result set, or "window", after the data has been retrieved).

    This and other ways of solving this problem can be found here:

    http://www.simple-talk.com/content/print.aspx?article=646

    The solution used here is near the bottom of the article in the section titled, "New Techniques for Removing Duplicate Rows in SQL Server 2005".

    Google "sql server windows functions" for more info on this solution.

    - victor di leo

  • Oops.... where I wrote

    To your query I added

    select itm.Item, itm.OrderNo ,itm.OrderLine... as rownum

    I actually meant to write

    To your query I added

    ,ROW_NUMBER() OVER (PARTITION BY ... as rownum

  • Victor,

    I believe this will absolutely resolve my issue. I will test it out shortly.

    I have been working with sql for a few months now and have learned one thing so far. There is ALWAYS more to learn.

    Thank you very much for the assistance.

    Brian Miller

  • If you omit the rownum column from the result, the query plan can use the ANY aggregate:

    SELECT

    X.Item,

    X.OrderNo,

    X.OrderLine,

    X.TxDate,

    X.QtyChg,

    X.InvoiceNo,

    X.InvoiceDate,

    X.InvoiceQty

    FROM

    (

    SELECT

    itm.Item,

    itm.OrderNo,

    itm.OrderLine,

    TxDate = CONVERT(char(10), itm.TxDate, 101),

    itm.QtyChg,

    inv.InvoiceNo,

    InvoiceDate = CONVERT(char(10), inv.InvoiceDate, 101),

    inv.InvoiceQty,

    rownum =

    ROW_NUMBER() OVER (

    PARTITION BY itm.Item, itm.OrderNo, inv.InvoiceNo

    ORDER BY itm.Item, itm.OrderNo, inv.InvoiceNo)

    FROM dbo.ItemHistory itm

    JOIN dbo.InvoiceHistory inv ON

    inv.Item = itm.Item

    AND inv.OrderNo = itm.OrderNo

    AND inv.OrderLine = itm.OrderLine

    AND inv.InvoiceDate = itm.TxDate

    AND inv.InvoiceQty = itm.QtyChg

    ) AS X

    WHERE

    X.rownum = 1;

    You could also just add a DISTINCT to the original query. Both of these approaches are imperfect though; the real solution is to fix the schema. Using a properly relational design will help avoid these issues arising in the first place.

Viewing 4 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply