Get distinct records for paging when joining tables

  • I have a problem with the following (very simplified) SQL Server code. I have a webpage that displays (distinct) cases. The page has paging implemented,

    so each time a user clicks next or previous, sequence start and end for the next records is submitted to the store procedure.

    I have the below tables (again simplified). A Case can contain multiple orderlines and multiple orderlines in same case can be assigned to same user.

    tblCase:

    CaseID, CreateDate

    150, 01/01/2016

    151, 02/02/2016

    152, 03/03/2016

    tblCaseOrderLine

    CaseOrderLine, CaseId, UserID

    1,150,1255

    2,150,1255

    3,151,1432

    4,152,1255

    5,152,1918

    6,152,1918

    I have this SQL for retrieving and filtering the cases shown:

    WITH cte AS

    (

    SELECT C.CaseId,

    CONVERT(INT,ROW_NUMBER() OVER(ORDER BY C.createDate ASC)) AS sequenceId

    FROM tblCase C

    INNER JOIN tblCaseOrderline CO ON C.caseId = CO.CaseId

    WHERE

    (CO.UserId = @userid OR @userid IS NULL)

    )

    SELECT * FROM cte WHERE

    (sequenceId BETWEEN @sequenceStart AND @sequenceEnd)

    Order BY sequenceId

    If I pass these parameters: @sequenceStart = 1, @sequenceEnd = 10, @userid = NULL

    Then I get this result:

    CaseId, SequenceId

    150, 1

    150, 2

    151, 3

    152, 4

    152, 5

    152, 6

    The wanted result is:

    CaseId, SequenceId

    150, 1

    151, 2

    152, 3

    Problem is that CaseID: 150 is twice in the resultset and CaseID: 152 is 3 times in the resultset because they have multiple caseorderlines - but duplicate records destroys paging as only unique caseid's is displayed on the webpage.

    IS there anyway I can join tblCase AND tblCaseOrderline WITHOUT making it produce duplicate records? There's a lot of examples on how to implement paging and sorting using CTE - but it's alwasy on one table only and not on tables that join with other tables.

  • Looks like you're just using the CaseOrderline for filtering. If so, try this

    SELECT C.CaseId,

    CONVERT(INT,ROW_NUMBER() OVER(ORDER BY C.createDate ASC)) AS sequenceId

    FROM tblCase C

    WHERE EXISTS (SELECT 1 FROM tblCaseOrderline CO WHERE C.caseId = CO.CaseId AND (CO.UserId = @userid OR @userid IS NULL)

    Oh, and that where clause is problematic in terms of performance.

    https://www.simple-talk.com/content/article.aspx?article=2280

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Yes, you're right: CaseOrderLine (and other tables) are only used for filtering since loading of the entire case (including orderlines) is handled in code in the middle layer.

    I'll try your solution - hopefully it dosn't affect performance too much.

  • John Jensen-439690 (4/4/2016)


    I'll try your solution - hopefully it dosn't affect performance too much.

    If you mean improve it, not much. It won't be a noticeable improvement in most cases. Couple percent lower CPU for a semi-join over a full join usually.

    If you're worried about performance problems, please read through the article I referenced, as the WHERE clause form you used (Column = @Parameter or @Parameter IS NULL) can result in horrible performance.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the suggestions. I've now implemented it, the way you described and it works very well.

  • Because of the Join with tblCaseOrderLines has been removed, I had to rewrite below AND-clause (was left out of the example for simplicity). DelimitedSplitN4K is a Table Valued function that takes a comma seperated list of users (@userIds) and returns a table to perform a join on:

    Original before change

    AND (

    (C.companyId = @companyId OR @companyId IS NULL)

    OR

    (NOT @userIds IS NULL AND CO.UserId IN (SELECT Item FROM [DelimitedSplitN4K](@userIds,',')))

    )

    I have tried to rewrite the above to the following - but all of them gives bad performance - are there any alternatives:

    After change 1:

    AND

    (

    (C.companyId = @companyId OR @companyId IS NULL)

    OR

    (NOT @userIds IS NULL AND EXISTS(SELECT 1 FROM tblCaseOrderline CO

    INNER JOIN [DelimitedSplitN4K](@userIds,',') ON Item = CO.UserId

    WHERE C.caseId = CO.CaseId)

    )

    )

    After change 2:

    AND

    (

    (C.companyId = @companyId OR @companyId IS NULL)

    OR

    (NOT @userIds IS NULL AND EXISTS(SELECT 1 FROM tblCaseOrderline CO

    CROSS APPLY [DelimitedSplitN4K](@userIds,',') U

    WHERE C.caseId = CO.CaseId AND U.Item = CO.UserId)

    )

    )

  • John Jensen-439690 (4/7/2016)


    but all of them gives bad performance

    Yes, they will, for reasons explained in the article I referenced earlier

    https://www.simple-talk.com/content/article.aspx?article=2280

    It's not the split that's the problem, it's the form of your WHERE clause, the '<parameter> is NULL or <column> <expression> <parameter>'

    Please take a read through that article, it gives you multiple options for fixing the problem.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I actually read the article, and have also added the OPTION recompile to the procedure - but I don't think it has any effect on the performance. The entire procedure is like this:

    WITH cte AS

    (

    SELECT DISTINCT C.CaseId,

    CASE @sortField

    WHEN 'CreateDateDesc' THEN CONVERT(INT,ROW_NUMBER() OVER(ORDER BY C.createDate DESC))

    WHEN 'CreateDateAsc' THEN CONVERT(INT,ROW_NUMBER() OVER(ORDER BY C.createDate ASC))

    WHEN 'CaseIdDesc' THEN CONVERT(INT,ROW_NUMBER() OVER(ORDER BY C.CaseId DESC))

    WHEN 'CaseIdAsc' THEN CONVERT(INT,ROW_NUMBER() OVER(ORDER BY C.CaseId ASC))

    WHEN 'CompanyNameDesc' THEN CONVERT(INT,ROW_NUMBER() OVER(ORDER BY F.firma DESC))

    WHEN 'CompanyNameAsc' THEN CONVERT(INT,ROW_NUMBER() OVER(ORDER BY F.firma ASC))

    END AS sequenceId,

    COUNT(*) OVER () AS TotalRows

    FROM tblCase C

    INNER JOIN tblCaseDeadline CD ON C.CaseId = CD.CaseId AND CD.CaseDeadLineTypeId = 3

    INNER JOIN tblFirmaer F ON F.firma_id = C.CompanyId

    WHERE

    (C.createDate >= @orderFromDate OR @orderFromDate IS NULL)

    AND

    (C.createDate <= @orderToDate OR @orderToDate IS NULL)

    AND

    (

    (C.companyId = @companyId OR @companyId IS NULL)

    OR

    (NOT @userIds IS NULL AND EXISTS(SELECT 1 FROM tblCaseOrderline CO

    CROSS APPLY [DelimitedSplitN4K](@userIds,',') U

    WHERE C.caseId = CO.CaseId AND U.Item = CO.UserId)

    )

    )

    AND

    EXISTS(SELECT 1 FROM tblCaseOrderline CO WHERE C.caseId = CO.CaseId AND (CO.UserId = @userid OR @userid IS NULL))

    AND

    (isnull(C.deleted,0) = @deleted OR @deleted IS NULL)

    AND

    EXISTS(SELECT 1 FROM tblCaseOrderline CO WHERE C.caseId = CO.CaseId AND (CO.inspectionDate >= @inspectionFromDate OR @inspectionFromDate IS NULL) AND (CO.inspectionDate <= @inspectionToDate OR @inspectionToDate IS NULL))

    AND

    (C.modifyDate >= @modifyFromDate OR @modifyFromDate IS NULL)

    AND

    (C.modifyDate <= @modifyToDate OR @modifyToDate IS NULL)

    AND

    (C.caseTypeId = @CaseTypeId OR @CaseTypeId IS NULL)

    AND

    (

    ((CD.completion IS NULL OR CD.isCompleted = 0 ) AND @completed = 0 )

    OR

    ((NOT CD.completion IS NULL OR CD.isCompleted = 1) AND @completed = 1 )

    OR

    (@completed IS NULL)

    )

    AND

    (C.claimantId = @claimantId OR @claimantId IS NULL)

    )

    SELECT CaseId, TotalRows FROM cte WHERE

    (sequenceId BETWEEN @sequenceStart AND @sequenceEnd)

    OR (@sequenceStart IS NULL AND @sequenceEnd IS NULL)

    Order BY sequenceId

    OPTION(RECOMPILE)

  • May I ask why you are converting Row_Number() to an INT everywhere?


  • I've tried all suggested solutions in the mentioned article - but they dosn't seem to have any huge effect on my problem.

    tblCase contains 6043 rows and tblCaseOrderLine contains 12589 - so currently it's still very little data.

    I made the following tests:

    1. Run the original query, Without passing a list of @UserId's. Execution time: 40ms

    2. Run the original query, passing a list of @UserId's containing 23 userid. Execution time: 820ms

    3. Run the query with Option(recompile) implemented, Without passing a list of @UserId's. Execution time: 323ms

    4. Run the query with Option(recompile) implemented, passing a list of @UserId's containing 23 userid. Execution time: 1093ms

    5. Run the query - changed to dynamic SQL, Without passing a list of @UserId's. Execution time: 7ms

    6. Run the query - changed to dynamic SQL, passing a list of @UserId's containing 23 userid. Execution time: 1004ms

    It seems like original query is fastest when passing a list of userids and dynamic is only faster when not passing a list of userids.

    So it must be the join / cross apply with DelimitedSplitN4K that slows it down. Any other ways of doing it?

  • Can you post execution plans for all of those tests?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Seems like I found a solution and I'm pretty sure that the [DelimitedSplitN4K] was the cause of the problem (the catch all query might be a problem later when number of rows increases).

    I ran the query, now with 60 userids in the @userID variable, which increased execution time to over 2 seconds.

    Then I remembered, that all users for a given company is stored in a table: tblCompanyUsers - so instead of this AND-clause:

    AND

    (

    (C.companyId = @companyId OR @companyId IS NULL)

    OR

    (NOT @userIds IS NULL AND EXISTS(SELECT 1 FROM tblCaseOrderline CO

    CROSS APPLY [DelimitedSplitN4K](@userIds,',') U

    WHERE C.caseId = CO.CaseId AND U.Item = CO.UserId)

    )

    )

    I can use this AND-clause:

    AND

    (

    (C.companyId = @companyId OR @companyId IS NULL)

    OR

    (NOT @companyId IS NULL AND EXISTS(SELECT 1 FROM tblCaseOrderline CO

    INNER JOIN tblCompanyUser CU ON Cu.UserID = CO.UserId AND CU.CompanyId = @companyId

    WHERE C.caseId = CO.CaseId)

    )

    )

    This has changed execution time from 2 seconds to 300ms with OPTION(recompile) and only 40ms without OPTION(recompile). With dynamic SQL I can get it down to 30ms.

    Gail, if you still want to investigate why [DelimitedSplitN4K] performed that bad in my case, I will post the execution plans.

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply