eliminating duplicates from stored procedure joins

  • Hello guys,

    I need to get rid of duplicates from my query in stored procedure. I hope if somebody can help me out. Here is the stored procedure :-

    CREATE PROCEDURE proc_test99

    @param1 nvarchar(20) = null,

    @param2 nvarchar(20) = null,

    @param3 nvarchar(30) = null,

    @param4 nvarchar(20) = null,

    @param5 nvarchar(20) = null,

    @param6 nvarchar(20) = null,

    @param7 SMALLDATETIME = null,

    @param8 SMALLDATETIME = null,

    @param9 SMALLDATETIME = null,

    @param10 SMALLDATETIME = null,

    @param11 SMALLDATETIME = null,

    @param12 nvarchar(20) = null

    AS

    BEGIN

    DECLARE @where_clause varchar(8000),

    @sql varchar(8000),

    @return varchar(2)

    SELECT @where_clause = 'WHERE 1=1',

    @return = char(13)+char(10)

    SELECT @where_clause = @where_clause +

    CASE WHEN @param1 IS NOT NULL THEN @return+' AND Case_no# = '''+@param1+'''' ELSE '' END+

    CASE WHEN @param2 IS NOT NULL THEN @return+' AND last_name = '''+@param2+'''' ELSE '' END+

    CASE WHEN @param3 IS NOT NULL THEN @return+' AND [First name] = '''+@param3+'''' ELSE '' END+

    CASE WHEN @param4 IS NOT NULL THEN @return+' AND Determination = '''+@param4+'''' ELSE '' END+

    CASE WHEN @param5 IS NOT NULL THEN @return+' AND sender = '''+@param5+'''' ELSE '' END+

    CASE WHEN @param6 IS NOT NULL THEN @return+' AND [Case Type] = '''+@param6+'''' ELSE '' END+

    CASE WHEN @param7 IS NOT NULL THEN @return+' AND Date = '''+CAST(@param7 AS nvarchar(20))+'''' ELSE '' END+

    CASE WHEN @param8 IS NOT NULL THEN @return+' AND [Qwik Due] = '''+CAST(@param8 AS nvarchar(20))+'''' ELSE '' END+

    CASE WHEN @param9 IS NOT NULL THEN @return+' AND [Qwik sent date] = '''+CAST(@param9 AS nvarchar(20))+'''' ELSE '' END+

    CASE WHEN @param10 IS NOT NULL THEN @return+' AND [Final due] = '''+CAST(@param10 AS nvarchar(20))+'''' ELSE '' END+

    CASE WHEN @param11 IS NOT NULL THEN @return+' AND [Final sent date] = '''+CAST(@param11 AS nvarchar(20))+'''' ELSE '' END+

    CASE WHEN @param12 IS NOT NULL THEN @return+' AND [on hold code] = '''+@param12+'''' ELSE '' END

    SELECT @sql = 'SELECT DISTINCT(Case30.Case_no#),Case30.Last_name,Case30.[First Name],Case30.Determination,Case30.Sender,Case30.[Case Type],Case30.Date,Case30.[Qwik Due],Case30.[Qwik sent date],Case30.[Final due],Case30.[Final sent date],Case30.[on hold code],tblFileNames.reportid,tblFileNames.type,userviewed.viewid FROM tblFileNames Left Outer Join userviewed ON tblFileNames.reportid = userviewed.reportid Right Outer Join Case30 ON tblFileNames.case_id = Case30.Case_no# '+@return+@where_clause+@return+

    'UNION'+@return+

    'SELECT DISTINCT(CASEOLDR.Case_no#),CASEOLDR.Last_name,CASEOLDR.[First Name],CASEOLDR.Determination,CASEOLDR.Sender,CASEOLDR.[Case Type],CASEOLDR.Date,CASEOLDR.[Qwik Due],CASEOLDR.[Qwik sent date],CASEOLDR.[Final due],CASEOLDR.[Final sent date],CASEOLDR.[on hold code],tblFileNames.reportid,tblFileNames.type,userviewed.viewid FROM tblFileNames Left Outer Join userviewed ON tblFileNames.reportid = userviewed.reportid Right Outer Join CASEOLDR ON CASEOLDR.Case_no# = tblFileNames.case_id '+@return+@where_clause

    EXEC (@sql)

    --SELECT @sql

    END

    Problem lies in the Select statement above which i think is the reason of getting duplicates.

  • What duplicates are you getting, from one of the selects or when they are combined. If when combined then you could try

    select distinct a.* from (select... union select...) a

    Far away is close at hand in the images of elsewhere.
    Anon.

  • In addition, if you want to get rid of case statements you could use

    SELECT @where_clause = @where_clause + 
    
    ISNULL(@return+' AND Case_no# = '''+@param1+'''','')+
    ISNULL(@return+' AND last_name = '''+@param2+'''','')+
    ISNULL(@return+' AND [First name] = '''+@param3+'''','')+
    ISNULL(@return+' AND Determination = '''+@param4+'''','')+
    ISNULL(@return+' AND sender = '''+@param5+'''','')+
    ISNULL(@return+' AND [Case Type] = '''+@param6+'''','')+
    ISNULL(@return+' AND Date = '''+CAST(@param7 AS nvarchar(20))+'''','')+
    ISNULL(@return+' AND [Qwik Due] = '''+CAST(@param8 AS nvarchar(20))+'''','')+
    ISNULL(@return+' AND [Qwik sent date] = '''+CAST(@param9 AS nvarchar(20))+'''','')+
    ISNULL(@return+' AND [Final due] = '''+CAST(@param10 AS nvarchar(20))+'''','')+
    ISNULL(@return+' AND [Final sent date] = '''+CAST(@param11 AS nvarchar(20))+'''','')+
    ISNULL(@return+' AND [on hold code] = '''+@param12+'''','')

    doubt if better performance

    Far away is close at hand in the images of elsewhere.
    Anon.

  • There's more... Try the performance if this (remember the execution plan will be saved for this query in proc).

    SELECTDISTINCT a.*
    
    FROM(
    SELECTCase30.Case_no#,
    Case30.Last_name,
    Case30.[First Name],
    Case30.Determination,
    Case30.Sender,
    Case30.[Case Type],
    Case30.Date,
    Case30.[Qwik Due],
    Case30.[Qwik sent date],
    Case30.[Final due],
    Case30.[Final sent date],
    Case30.[on hold code],
    tblFileNames.reportid,
    tblFileNames.type,
    userviewed.viewid
    FROMtblFileNames
    Left Outer Join userviewed ON tblFileNames.reportid = userviewed.reportid
    Right Outer Join Case30 ON tblFileNames.case_id = Case30.Case_no#
    WHERE (@param1 IS NULL OR (@param1 IS NOT NULL AND Case_no# = @param1))
    AND(@param2 IS NULL OR (@param2 IS NOT NULL AND last_name = @param2))
    AND(@param3 IS NULL OR (@param3 IS NOT NULL AND [First name] = @param3))
    AND(@param4 IS NULL OR (@param4 IS NOT NULL AND Determination = @param4))
    AND(@param5 IS NULL OR (@param5 IS NOT NULL AND sender = @param5))
    AND(@param6 IS NULL OR (@param6 IS NOT NULL AND [Case Type] = @param6))
    AND(@param7 IS NULL OR (@param7 IS NOT NULL AND Date = @param7))
    AND(@param8 IS NULL OR (@param8 IS NOT NULL AND [Qwik Due] = @param8))
    AND(@param9 IS NULL OR (@param9 IS NOT NULL AND [Qwik sent date] = @param9))
    AND(@param10 IS NULL OR (@param10 IS NOT NULL AND [Final due] = @param10))
    AND(@param11 IS NULL OR (@param11 IS NOT NULL AND [Final sent date] = @param11))
    AND(@param12 IS NULL OR (@param12 IS NOT NULL AND [on hold code] = @param12))

    UNION

    SELECTDISTINCT CASEOLDR.Case_no#,
    CASEOLDR.Last_name,
    CASEOLDR.[First Name],
    CASEOLDR.Determination,
    CASEOLDR.Sender,
    CASEOLDR.[Case Type],
    CASEOLDR.Date,
    CASEOLDR.[Qwik Due],
    CASEOLDR.[Qwik sent date],
    CASEOLDR.[Final due],
    CASEOLDR.[Final sent date],
    CASEOLDR.[on hold code],
    tblFileNames.reportid,
    tblFileNames.type,
    userviewed.viewid
    FROMtblFileNames
    Left Outer Join userviewed ON tblFileNames.reportid = userviewed.reportid
    Right Outer Join CASEOLDR ON CASEOLDR.Case_no# = tblFileNames.case_id
    WHERE (@param1 IS NULL OR (@param1 IS NOT NULL AND Case_no# = @param1))
    AND(@param2 IS NULL OR (@param2 IS NOT NULL AND last_name = @param2))
    AND(@param3 IS NULL OR (@param3 IS NOT NULL AND [First name] = @param3))
    AND(@param4 IS NULL OR (@param4 IS NOT NULL AND Determination = @param4))
    AND(@param5 IS NULL OR (@param5 IS NOT NULL AND sender = @param5))
    AND(@param6 IS NULL OR (@param6 IS NOT NULL AND [Case Type] = @param6))
    AND(@param7 IS NULL OR (@param7 IS NOT NULL AND Date = @param7))
    AND(@param8 IS NULL OR (@param8 IS NOT NULL AND [Qwik Due] = @param8))
    AND(@param9 IS NULL OR (@param9 IS NOT NULL AND [Qwik sent date] = @param9))
    AND(@param10 IS NULL OR (@param10 IS NOT NULL AND [Final due] = @param10))
    AND(@param11 IS NULL OR (@param11 IS NOT NULL AND [Final sent date] = @param11))
    AND(@param12 IS NULL OR (@param12 IS NOT NULL AND [on hold code] = @param12))
    ) a

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Hi burrows,

    I tried ur way but it still give me duplicates. It gives 304 rows instead of 80 rows. Can u help some other way in my old proc itself.

  • Ah! Thought that might be the case. Are u getting full row duplication or only a few columns? Can u be more precise as to what is duplicated? Which columns? Can u post some data (non sensitive) that shows the duplication and what u want it to look like without duplication. If data too large, email me and I will look at it for you.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Sorry to deviate, but a quick question for DaveB:

    "AND(@param12 IS NULL OR (@param12 IS NOT NULL AND [on hold code] = @param12))"

    -Is there any particular reason for including the redundant "@param12 IS NOT NULL AND ", or is it just for readability?

    Tim

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • stax68,

    Only 'belts & braces' and the way I always do it (bad habit maybe ). In the past I have had bad experiences with matching with nulls and I suspect I just like to make sure. Your right it is redundant.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • OK - sorry, wasn't trying to nitpick, but I use a similar method - I literally and figuratively see red when I open a stored procedure to find it is composed almost entirely of dynamic SQL - and it occurred to me that there might be some performace advantage to your 'belt and braces' approach.

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • No need to apologise, it's nice to see my code challenged, keeps me on my toes and helps me write better code. I never write dynamic sql in procs unless there is absolutely no alternative.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • hi david,

    The columns which are duplicated are tblfilenames.case_id and because of that case30 and caseoldr.caseNo# also gets duplicated. Other colums which gets duplicated are userviewed.reportid because of which tblfilenames.reportid also gets duplicated. So instead of 80 rows i am getting a display of 304 rows. I have just gone mad figuring out the problem for duplicates but am not successful because in one table there are duplicate case no.s and in other it is as unique. Now how can u join the two different tables and have distinct output.?

  • So you haven't actually got duplicate rows.

    I suspect what you've got is a so-called 'fan trap'.

    This happens when you have two many-to one relationships. Lokking at your query, this may be the reason for the excess rows:

    tblFileNames.reportid = userviewed.reportid tblFileNames.case_id = Case30.Case_no#

    Do the other two tables both have multiple records matching a given record in tblFileNames? If so, you may have a fan trap.

    Example: three tables with their data:

    TAB_MAIN:

    id

    --

    1

    2

    3

    ...

    TAB_REF1:

    main_id ref1

    ------- ----

    1 a

    1 b

    1 c

    TAB_REF2:

    main_id ref2

    ------- ----

    1 z

    1 y

    1 x

    Now consider the following query: how many rows should it return? Clue: it's not 3.

    select m.id, r1.ref1, r2.ref2

    from TAB_MAIN m

    join TAB_REF1 r1

    on r1.main_id = m.id

    join TAB_REF1 r2

    on r2.main_id = m.id

    query results:

    id ref1 ref2

    -- ---- ----

    1 a z

    1 a y

    1 a x

    1 b z

    1 b y

    1 b x

    1 c z

    1 c y

    1 c x

    (9 records returned)

    Because you have two sets of 'child' records, which aren't related to each other, you get all the possible pairs of 'child' values returned for each main record.

    The only way to avoid this is not to have two unrelated child records in your query.

    If the child records are related to each other, you will need to add a join to relate them (i.e. limit the number of combinations returned.)

    This may not be a fan trap problem, but if it is, you need to rethink what you are trying to do. Effectively you are trying to present unnormalised data in 1st normal form.

    To see whether this is happening, try removing each of the 'child' tables from the query in turn. If each of these return a sensible number of rows, but the full query multiplies rows, then it's probably a fan trap.

    Tim

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • Yes I also noticed what stax68 found and the outer joins could be your problem. However if you have true duplicates in both Case30 and CASEOLDR then this should do it.

    SELECTc.Case30.Case_no#,
    
    c.Last_name,
    c.[First Name],
    c.Determination,
    c.Sender,
    c.[Case Type],
    c.Date,
    c.[Qwik Due],
    c.[Qwik sent date],
    c.[Final due],
    c.[Final sent date],
    c.[on hold code],
    f.reportid,
    f.type,
    u.viewid
    FROM(SELECTDISTINCT c.*
    FROM(SELECTCase30.Case_no#,
    Case30.Last_name,
    Case30.[First Name],
    Case30.Determination,
    Case30.Sender,
    Case30.[Case Type],
    Case30.Date,
    Case30.[Qwik Due],
    Case30.[Qwik sent date],
    Case30.[Final due],
    Case30.[Final sent date],
    Case30.[on hold code]
    FROMCase30
    WHERE (@param1 IS NULL OR (@param1 IS NOT NULL AND Case_no# = @param1))
    AND(@param2 IS NULL OR (@param2 IS NOT NULL AND last_name = @param2))
    AND(@param3 IS NULL OR (@param3 IS NOT NULL AND [First name] = @param3))
    AND(@param4 IS NULL OR (@param4 IS NOT NULL AND Determination = @param4))
    AND(@param5 IS NULL OR (@param5 IS NOT NULL AND sender = @param5))
    AND(@param6 IS NULL OR (@param6 IS NOT NULL AND [Case Type] = @param6))
    AND(@param7 IS NULL OR (@param7 IS NOT NULL AND Date = @param7))
    AND(@param8 IS NULL OR (@param8 IS NOT NULL AND [Qwik Due] = @param8))
    AND(@param9 IS NULL OR (@param9 IS NOT NULL AND [Qwik sent date] = @param9))
    AND(@param10 IS NULL OR (@param10 IS NOT NULL AND [Final due] = @param10))
    AND(@param11 IS NULL OR (@param11 IS NOT NULL AND [Final sent date] = @param11))
    AND(@param12 IS NULL OR (@param12 IS NOT NULL AND [on hold code] = @param12))
    UNION
    SELECTDISTINCT CASEOLDR.Case_no#,
    CASEOLDR.Last_name,
    CASEOLDR.[First Name],
    CASEOLDR.Determination,
    CASEOLDR.Sender,
    CASEOLDR.[Case Type],
    CASEOLDR.Date,
    CASEOLDR.[Qwik Due],
    CASEOLDR.[Qwik sent date],
    CASEOLDR.[Final due],
    CASEOLDR.[Final sent date],
    CASEOLDR.[on hold code]
    FROMCASEOLDR
    WHERE (@param1 IS NULL OR (@param1 IS NOT NULL AND Case_no# = @param1))
    AND(@param2 IS NULL OR (@param2 IS NOT NULL AND last_name = @param2))
    AND(@param3 IS NULL OR (@param3 IS NOT NULL AND [First name] = @param3))
    AND(@param4 IS NULL OR (@param4 IS NOT NULL AND Determination = @param4))
    AND(@param5 IS NULL OR (@param5 IS NOT NULL AND sender = @param5))
    AND(@param6 IS NULL OR (@param6 IS NOT NULL AND [Case Type] = @param6))
    AND(@param7 IS NULL OR (@param7 IS NOT NULL AND Date = @param7))
    AND(@param8 IS NULL OR (@param8 IS NOT NULL AND [Qwik Due] = @param8))
    AND(@param9 IS NULL OR (@param9 IS NOT NULL AND [Qwik sent date] = @param9))
    AND(@param10 IS NULL OR (@param10 IS NOT NULL AND [Final due] = @param10))
    AND(@param11 IS NULL OR (@param11 IS NOT NULL AND [Final sent date] = @param11))
    AND(@param12 IS NULL OR (@param12 IS NOT NULL AND [on hold code] = @param12))
    ) c
    INNER JOIN tblFileNames f ON f.case_id = c.Case_no#
    INNER JOIN userviewed u ON u.reportid = f.reportid

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Just reread your last post, and the answer to your question

    quote:


    in one table there are duplicate case no.s and in other it is as unique. Now how can u join the two different tables and have distinct output.?


    is: basically, you can't, unless you use 'group by' and aggregate functions (see BOL, Transact_SQL help, etc).

    Tim

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • Tim,

    You can by doing the distinct in a sub query (as in my last post). However this relies on all the columns being duplicated, eg

    Case_no#,Last_name,First Name,Determination,Sender etc...

    1,Bloggs,Fred,Child,Master etc...

    1,Bloggs,Fred,Child,Master etc...

    however if only Case_no# is duplicated then this is harder as it depends on which record is required, eg

    Case_no#,Last_name,First Name,Determination,Sender etc...

    1,Bloggs,Fred,Child,Master etc...

    1,Smith,Jim,xxx,yyyy etc...

    Far away is close at hand in the images of elsewhere.
    Anon.

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply