Get distinct records for paging when joining tables

Question

Post reply

Get distinct records for paging when joining tables

John Jensen-439690

Mr or Mrs. 500

Points: 545
More actions
April 4, 2016 at 3:13 am

#326242

I have a problem with the following (very simplified) SQL Server code. I have a webpage that displays (distinct) cases. The page has paging implemented,
so each time a user clicks next or previous, sequence start and end for the next records is submitted to the store procedure.
I have the below tables (again simplified). A Case can contain multiple orderlines and multiple orderlines in same case can be assigned to same user.
tblCase:
CaseID, CreateDate
150, 01/01/2016
151, 02/02/2016
152, 03/03/2016
tblCaseOrderLine
CaseOrderLine, CaseId, UserID
1,150,1255
2,150,1255
3,151,1432
4,152,1255
5,152,1918
6,152,1918
I have this SQL for retrieving and filtering the cases shown:
WITH cte AS
(
SELECT C.CaseId,
CONVERT(INT,ROW_NUMBER() OVER(ORDER BY C.createDate ASC)) AS sequenceId
FROM tblCase C
INNER JOIN tblCaseOrderline CO ON C.caseId = CO.CaseId
WHERE
(CO.UserId = @userid OR @userid IS NULL)
)
SELECT * FROM cte WHERE
(sequenceId BETWEEN @sequenceStart AND @sequenceEnd)
Order BY sequenceId
If I pass these parameters: @sequenceStart = 1, @sequenceEnd = 10, @userid = NULL
Then I get this result:
CaseId, SequenceId
150, 1
150, 2
151, 3
152, 4
152, 5
152, 6
The wanted result is:
CaseId, SequenceId
150, 1
151, 2
152, 3
Problem is that CaseID: 150 is twice in the resultset and CaseID: 152 is 3 times in the resultset because they have multiple caseorderlines - but duplicate records destroys paging as only unique caseid's is displayed on the webpage.
IS there anyway I can join tblCase AND tblCaseOrderline WITHOUT making it produce duplicate records? There's a lot of examples on how to implement paging and sorting using CTE - but it's alwasy on one table only and not on tables that join with other tables.

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 1

Looks like you're just using the CaseOrderline for filtering. If so, try this

SELECT C.CaseId,

CONVERT(INT,ROW_NUMBER() OVER(ORDER BY C.createDate ASC)) AS sequenceId

FROM tblCase C

WHERE EXISTS (SELECT 1 FROM tblCaseOrderline CO WHERE C.caseId = CO.CaseId AND (CO.UserId = @userid OR @userid IS NULL)

Oh, and that where clause is problematic in terms of performance.

https://www.simple-talk.com/content/article.aspx?article=2280

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

John Jensen-439690 Mr or Mrs. 500 Points: 545 More actions · Answer 2

Yes, you're right: CaseOrderLine (and other tables) are only used for filtering since loading of the entire case (including orderlines) is handled in code in the middle layer.

I'll try your solution - hopefully it dosn't affect performance too much.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 3

John Jensen-439690 (4/4/2016)
I'll try your solution - hopefully it dosn't affect performance too much.

If you mean improve it, not much. It won't be a noticeable improvement in most cases. Couple percent lower CPU for a semi-join over a full join usually.

If you're worried about performance problems, please read through the article I referenced, as the WHERE clause form you used (Column = @Parameter or @Parameter IS NULL) can result in horrible performance.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

John Jensen-439690 Mr or Mrs. 500 Points: 545 More actions · Answer 4

Thanks for the suggestions. I've now implemented it, the way you described and it works very well.

John Jensen-439690 Mr or Mrs. 500 Points: 545 More actions · Answer 5

Because of the Join with tblCaseOrderLines has been removed, I had to rewrite below AND-clause (was left out of the example for simplicity). DelimitedSplitN4K is a Table Valued function that takes a comma seperated list of users (@userIds) and returns a table to perform a join on:

Original before change

AND (

(C.companyId = @companyId OR @companyId IS NULL)

OR

(NOT @userIds IS NULL AND CO.UserId IN (SELECT Item FROM [DelimitedSplitN4K](@userIds,',')))

)

I have tried to rewrite the above to the following - but all of them gives bad performance - are there any alternatives:

After change 1:

AND

(

(C.companyId = @companyId OR @companyId IS NULL)

OR

(NOT @userIds IS NULL AND EXISTS(SELECT 1 FROM tblCaseOrderline CO

INNER JOIN [DelimitedSplitN4K](@userIds,',') ON Item = CO.UserId

WHERE C.caseId = CO.CaseId)

)

After change 2:

AND

(

(C.companyId = @companyId OR @companyId IS NULL)

OR

(NOT @userIds IS NULL AND EXISTS(SELECT 1 FROM tblCaseOrderline CO

CROSS APPLY [DelimitedSplitN4K](@userIds,',') U

WHERE C.caseId = CO.CaseId AND U.Item = CO.UserId)

)

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 6

John Jensen-439690 (4/7/2016)
but all of them gives bad performance

Yes, they will, for reasons explained in the article I referenced earlier

https://www.simple-talk.com/content/article.aspx?article=2280

It's not the split that's the problem, it's the form of your WHERE clause, the '<parameter> is NULL or <column> <expression> <parameter>'

Please take a read through that article, it gives you multiple options for fixing the problem.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

John Jensen-439690 Mr or Mrs. 500 Points: 545 More actions · Answer 7

I actually read the article, and have also added the OPTION recompile to the procedure - but I don't think it has any effect on the performance. The entire procedure is like this:

WITH cte AS

(

SELECT DISTINCT C.CaseId,

CASE @sortField

WHEN 'CreateDateDesc' THEN CONVERT(INT,ROW_NUMBER() OVER(ORDER BY C.createDate DESC))

WHEN 'CreateDateAsc' THEN CONVERT(INT,ROW_NUMBER() OVER(ORDER BY C.createDate ASC))

WHEN 'CaseIdDesc' THEN CONVERT(INT,ROW_NUMBER() OVER(ORDER BY C.CaseId DESC))

WHEN 'CaseIdAsc' THEN CONVERT(INT,ROW_NUMBER() OVER(ORDER BY C.CaseId ASC))

WHEN 'CompanyNameDesc' THEN CONVERT(INT,ROW_NUMBER() OVER(ORDER BY F.firma DESC))

WHEN 'CompanyNameAsc' THEN CONVERT(INT,ROW_NUMBER() OVER(ORDER BY F.firma ASC))

END AS sequenceId,

COUNT(*) OVER () AS TotalRows

FROM tblCase C

INNER JOIN tblCaseDeadline CD ON C.CaseId = CD.CaseId AND CD.CaseDeadLineTypeId = 3

INNER JOIN tblFirmaer F ON F.firma_id = C.CompanyId

WHERE

(C.createDate >= @orderFromDate OR @orderFromDate IS NULL)

AND

(C.createDate <= @orderToDate OR @orderToDate IS NULL)

AND

(

(C.companyId = @companyId OR @companyId IS NULL)

OR

(NOT @userIds IS NULL AND EXISTS(SELECT 1 FROM tblCaseOrderline CO

CROSS APPLY [DelimitedSplitN4K](@userIds,',') U

WHERE C.caseId = CO.CaseId AND U.Item = CO.UserId)

)

AND

EXISTS(SELECT 1 FROM tblCaseOrderline CO WHERE C.caseId = CO.CaseId AND (CO.UserId = @userid OR @userid IS NULL))

AND

(isnull(C.deleted,0) = @deleted OR @deleted IS NULL)

AND

EXISTS(SELECT 1 FROM tblCaseOrderline CO WHERE C.caseId = CO.CaseId AND (CO.inspectionDate >= @inspectionFromDate OR @inspectionFromDate IS NULL) AND (CO.inspectionDate <= @inspectionToDate OR @inspectionToDate IS NULL))

AND

(C.modifyDate >= @modifyFromDate OR @modifyFromDate IS NULL)

AND

(C.modifyDate <= @modifyToDate OR @modifyToDate IS NULL)

AND

(C.caseTypeId = @CaseTypeId OR @CaseTypeId IS NULL)

AND

(

((CD.completion IS NULL OR CD.isCompleted = 0 ) AND @completed = 0 )

OR

((NOT CD.completion IS NULL OR CD.isCompleted = 1) AND @completed = 1 )

OR

(@completed IS NULL)

)

AND

(C.claimantId = @claimantId OR @claimantId IS NULL)

)

SELECT CaseId, TotalRows FROM cte WHERE

(sequenceId BETWEEN @sequenceStart AND @sequenceEnd)

OR (@sequenceStart IS NULL AND @sequenceEnd IS NULL)

Order BY sequenceId

OPTION(RECOMPILE)

Phil Parkin SSC Guru Points: 247180 More actions · Answer 8

May I ask why you are converting Row_Number() to an INT everywhere?

John Jensen-439690 Mr or Mrs. 500 Points: 545 More actions · Answer 9

I've tried all suggested solutions in the mentioned article - but they dosn't seem to have any huge effect on my problem.

tblCase contains 6043 rows and tblCaseOrderLine contains 12589 - so currently it's still very little data.

I made the following tests:

1. Run the original query, Without passing a list of @UserId's. Execution time: 40ms

2. Run the original query, passing a list of @UserId's containing 23 userid. Execution time: 820ms

3. Run the query with Option(recompile) implemented, Without passing a list of @UserId's. Execution time: 323ms

4. Run the query with Option(recompile) implemented, passing a list of @UserId's containing 23 userid. Execution time: 1093ms

5. Run the query - changed to dynamic SQL, Without passing a list of @UserId's. Execution time: 7ms

6. Run the query - changed to dynamic SQL, passing a list of @UserId's containing 23 userid. Execution time: 1004ms

It seems like original query is fastest when passing a list of userids and dynamic is only faster when not passing a list of userids.

So it must be the join / cross apply with DelimitedSplitN4K that slows it down. Any other ways of doing it?

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 10

Can you post execution plans for all of those tests?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

John Jensen-439690 Mr or Mrs. 500 Points: 545 More actions · Answer 11

Seems like I found a solution and I'm pretty sure that the [DelimitedSplitN4K] was the cause of the problem (the catch all query might be a problem later when number of rows increases).

I ran the query, now with 60 userids in the @userID variable, which increased execution time to over 2 seconds.

Then I remembered, that all users for a given company is stored in a table: tblCompanyUsers - so instead of this AND-clause:

AND

(

(C.companyId = @companyId OR @companyId IS NULL)

OR

(NOT @userIds IS NULL AND EXISTS(SELECT 1 FROM tblCaseOrderline CO

CROSS APPLY [DelimitedSplitN4K](@userIds,',') U

WHERE C.caseId = CO.CaseId AND U.Item = CO.UserId)

)

I can use this AND-clause:

AND

(

(C.companyId = @companyId OR @companyId IS NULL)

OR

(NOT @companyId IS NULL AND EXISTS(SELECT 1 FROM tblCaseOrderline CO

INNER JOIN tblCompanyUser CU ON Cu.UserID = CO.UserId AND CU.CompanyId = @companyId

WHERE C.caseId = CO.CaseId)

)

This has changed execution time from 2 seconds to 300ms with OPTION(recompile) and only 40ms without OPTION(recompile). With dynamic SQL I can get it down to 30ms.

Gail, if you still want to investigate why [DelimitedSplitN4K] performed that bad in my case, I will post the execution plans.