Custom Pagination with Complex Queries

  • Hi all,

    This is a repost from the T-SQL (SS2K5) forum; I got no reply there, so perhaps I posted in the wrong section. 🙂

    http://www.sqlservercentral.com/Forums/Topic736192-338-1.aspx

    My application is currently suffering from slow performance due to the growth of data. After reading some articles, I have a basic understanding of how pagination works.

    However, I would like to ask for advice on implementing pagination for complex SQL with many JOINs and UNIONs.

    For simple SQL, I use the following approach.

    -- * Pagination with Simple Query --
    SELECT * FROM (
        SELECT field1, field2, field3, ROW_NUMBER() OVER (ORDER BY field1) AS RowNum
        FROM table1
    ) AS Result
    WHERE Result.RowNum BETWEEN 1 AND 10

    For complex SQL, I use the following:

    -- * Pagination with Complex Query --
    SELECT * FROM (
        SELECT *, ROW_NUMBER() OVER (ORDER BY field1) AS RowNum FROM (
            SELECT field1, field2, field3 FROM table1
            UNION
            SELECT field1, field2, field3 FROM table2
            UNION
            SELECT field1, field2, field3 FROM table3
        ) AS UnionData
    ) AS Result
    WHERE Result.RowNum BETWEEN 1 AND 10

    I would appreciate it if someone could offer some advice. Thanks.

  • I saw this before.

    It didn't meet my personal 'bar' for time and effort required. Maybe if you had provided real table definitions, indexes, sample data creation, and so on...? Without knowing anything about the data, I don't know what you are looking to optimize for.

    If your complex SQL example runs in 5ms and produces 150 rows, there is probably no need to do any more than you already have.

    Paul

    Basically the query works as expected when the page index is small. I found that performance breaks down when the user selects >> (Last Page) on the pager. I guess this is because those records reside at the very bottom of the selected result set, which hurts performance, while the first page loads quite quickly.

    Anyway, I'm still thinking about how to create a sample based on my scenario, as the query involves about 3 UNIONs and 7 JOINs.

    The method we use is to store the results of the initial search (limited to 500 rows) in a generic search results table, normally just the p.k. We then have a second proc for getting just a range of results by joining to that table, and a nightly routine for clearing it out.
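    Roughly, that key-table approach looks like the sketch below. Every object name here (the cache table, the procs, and the view standing in for the complex search) is invented purely for illustration.

    -- Sketch only: a generic "search results" key table (hypothetical names).
    CREATE TABLE dbo.SearchResultCache
    (
        SessionID UNIQUEIDENTIFIER NOT NULL,  -- identifies one user's search
        RowNum    INT              NOT NULL,  -- position within that search's results
        RecordID  INT              NOT NULL,  -- the p.k. of the real row
        CreatedOn DATETIME         NOT NULL DEFAULT GETDATE(),
        PRIMARY KEY (SessionID, RowNum)
    )
    GO

    -- Proc 1: run the expensive search once and keep only the first 500 keys.
    CREATE PROCEDURE dbo.Search_Store @SessionID UNIQUEIDENTIFIER
    AS
        INSERT INTO dbo.SearchResultCache (SessionID, RowNum, RecordID)
        SELECT TOP (500) @SessionID, ROW_NUMBER() OVER (ORDER BY RecordID), RecordID
        FROM dbo.ComplexSearchView            -- stands in for the big JOIN/UNION query
        ORDER BY RecordID
    GO

    -- Proc 2: return just one page by joining the cached keys back to the detail data.
    CREATE PROCEDURE dbo.Search_GetPage
        @SessionID UNIQUEIDENTIFIER, @StartRow INT, @EndRow INT
    AS
        SELECT d.*
        FROM dbo.SearchResultCache c
        JOIN dbo.ComplexSearchView d ON d.RecordID = c.RecordID
        WHERE c.SessionID = @SessionID
          AND c.RowNum BETWEEN @StartRow AND @EndRow
        ORDER BY c.RowNum
    GO

    -- The nightly clean-up is then just:
    -- DELETE FROM dbo.SearchResultCache WHERE CreatedOn < DATEADD(DAY, -1, GETDATE())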

    Another approach is to return all the results to the client and cache/filter them there.

    David

  • Firstly, you have not given enough information about your problem.

    If you give some sample DDL and data, people would be more than willing to try to help you out.

    Some questions to think about:

    1. Can you use UNION ALL instead of UNION?

    2. Will using a CTE improve performance? It would certainly make your query easier to read.

    3. Can you do the pagination inside your application? Database pagination is usually more performance-intensive.

  • Thanks for your advice, David.

    Hi Goldie,

    Sorry, I was trying to create some sample data with similar conditions but had no luck; the structure is too complicated.

    1. I have to use UNION to avoid duplicate records.

    2. I tried a CTE also, but I'm still facing the performance issue on the last page.

    3. I'm using a .NET GridView with pagination enabled, passing startRowIndex, maximumRows, and sortExpression as parameters. Is this considered application pagination?
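    Roughly, the call on the database side looks like the simplified sketch below (object names are placeholders, and the sortExpression handling is left out):

    -- Simplified sketch: the GridView passes a zero-based startRowIndex,
    -- so the page maps to rows (startRowIndex + 1) .. (startRowIndex + maximumRows).
    CREATE PROCEDURE dbo.GetClaimsPage
        @startRowIndex INT,
        @maximumRows   INT
    AS
    BEGIN
        SELECT *
        FROM (
            SELECT field1, field2, field3,
                   ROW_NUMBER() OVER (ORDER BY field1) AS RowNum
            FROM dbo.SomeComplexView      -- stands in for the JOIN/UNION query
        ) AS Result
        WHERE Result.RowNum BETWEEN @startRowIndex + 1
                                AND @startRowIndex + @maximumRows
    END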

    Thanks.

  • yaya1899 (6/29/2009)


    Thanks for your advice, David.

    Hi Goldie,

    Sorry, I was trying to create some sample data with similar conditions but had no luck; the structure is too complicated.

    1. I have to use UNION to avoid duplicate records.

    2. I tried a CTE also, but I'm still facing the performance issue on the last page.

    3. I'm using a .NET GridView with pagination enabled, passing startRowIndex, maximumRows, and sortExpression as parameters. Is this considered application pagination?

    Thanks.

    You don't need to create complicated data structures.

    Just create some table variables and load them with some data.

    1. UNION is quite expensive. This may be your bottleneck. I don't know your data, but there should be some way to get distinct rows without using UNION. Also, try using DISTINCT or GROUP BY instead of UNION and see what you get (there is a sketch of this at the end of this post).

    2. I had a feeling that wouldn't help performance, but it is still easier from a readability standpoint.

    3. Application pagination means that the application is the one that does the paging. Basically, you would return all of the data and the application would page through it. One way I've seen that done is to return all of the primary keys in the dataset; the application then pages through them and calls the database for the detailed information about those specific IDs.

    How are you getting the maximum number of rows? Are you setting it as a static number, or are you getting a COUNT from the database? If you are getting an actual COUNT, then that is probably causing you some issues. A way to get around that is to cap your total count. (Not sure how much sense that made; let me know if you need more details.)
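    To illustrate point 1, here is a rough sketch of the same paging pattern with the de-duplication done by a single DISTINCT over a UNION ALL (the table and column names are just the generic ones from the first post); compare the execution plans to see whether it actually helps in your case:

    -- Sketch only: de-duplicate once with DISTINCT (or GROUP BY) instead of UNION.
    SELECT * FROM (
        SELECT field1, field2, field3,
               ROW_NUMBER() OVER (ORDER BY field1) AS RowNum
        FROM (
            SELECT DISTINCT field1, field2, field3
            FROM (
                SELECT field1, field2, field3 FROM table1
                UNION ALL
                SELECT field1, field2, field3 FROM table2
                UNION ALL
                SELECT field1, field2, field3 FROM table3
            ) AS AllRows
        ) AS DistinctRows
    ) AS Result
    WHERE Result.RowNum BETWEEN 1 AND 10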

  • Hi Goldie,

    If I just create some table variables with some data, how can I simulate the situation without the same structure? Sorry, I just couldn't find a way to create the sample data and need your advice on this.

    Yes, I have another function which returns the record count. I couldn't use a static count, as the user may select a different date range of records.

    Thanks.

  • yaya1899 (7/2/2009)


    Hi Goldie,

    If I just create some table variables with some data, how can I simulate the situation without the same structure? Sorry, I just couldn't find a way to create the sample data and need your advice on this.

    Yes, I have another function which returns the record count. I couldn't use a static count, as the user may select a different date range of records.

    Thanks.

    The basic concept is to create a few table variables with a few rows of data.

    Only include the keys and one or two other fields in the table.

    For example: if I have a table called "Students" which stores a lot of information and links to courses, I would do something like this:

    DECLARE @Students TABLE
    (
        StudentID INT IDENTITY(1,1)
        ,LastName VARCHAR(50)
    )

    INSERT INTO @Students
    SELECT 'Green'
    UNION ALL SELECT 'Smith'
    UNION ALL SELECT 'White'

    DECLARE @Courses TABLE
    (
        CourseID INT IDENTITY(1,1)
        ,CourseCode VARCHAR(10)
    )

    INSERT INTO @Courses
    SELECT 'CSC 101'
    UNION ALL SELECT 'CSC 201'
    UNION ALL SELECT 'BIO 101'

    DECLARE @StudentCourses TABLE
    (
        StudentID INT
        ,CourseID INT
    )

    INSERT INTO @StudentCourses
    SELECT 1,1
    UNION ALL SELECT 1,2

    This allows people to play with your table structure without overloading them with information.

    Can you post the function which gets the count? I bet that's where your hang-up is.

    To confirm that, try just using a hard-coded value and see if performance improves.

    Then we can work on optimizing your count function. Maybe even use a trick or two 😉

  • Hi Goldie,

    My count function is almost the same as the SELECT query, just without the ROW_NUMBER().

    I finally came up with some samples; kindly advise.

    -- * Employee Table --
    DECLARE @Employee TABLE
    (
        EmpID INT IDENTITY(1,1), Name VARCHAR(50)
    )

    INSERT INTO @Employee VALUES('John')
    INSERT INTO @Employee VALUES('Peter')
    INSERT INTO @Employee VALUES('Sam')

    -- * Medical Claims --
    DECLARE @MedicalClaim TABLE
    (
        ClaimRef CHAR(12), EmpID_fk INT, ClaimDate DATETIME, Amount DECIMAL(5,0)
    )

    INSERT INTO @MedicalClaim VALUES('MC01', '1', GETDATE(), 50)
    INSERT INTO @MedicalClaim VALUES('MC02', '2', GETDATE(), 60)
    INSERT INTO @MedicalClaim VALUES('MC03', '3', GETDATE(), 40)

    -- * OT Claims --
    DECLARE @OTClaim TABLE
    (
        ClaimRef CHAR(12), EmpID_fk INT, ClaimDate DATETIME, OTRate DECIMAL(2,0)
    )

    INSERT INTO @MedicalClaim VALUES('OT01', '1', GETDATE(), 1.5)
    INSERT INTO @MedicalClaim VALUES('OT02', '2', GETDATE(), 1.5)
    INSERT INTO @MedicalClaim VALUES('OT03', '3', GETDATE(), 2.0)

    -- * Get All Claims for All Employees with Pagination --
    SELECT * FROM
    (
        SELECT *, ROW_NUMBER() OVER (ORDER BY EmpID) AS RowNum FROM
        (
            SELECT EmpID, Name, ClaimRef, ClaimDate, Amount AS ClaimAmount FROM @MedicalClaim
            INNER JOIN @Employee ON EmpID_fk = EmpID
            UNION
            SELECT EmpID, Name, ClaimRef, ClaimDate, OTRate * 2 AS ClaimAmount FROM @OTClaim
            INNER JOIN @Employee ON EmpID_fk = EmpID
            -- * UNION Other Claims --
        ) AS UnionData
    ) AS Result
    WHERE Result.RowNum BETWEEN 1 AND 10

    -- * Get COUNT of All Claims for All Employees --
    SELECT COUNT(*) FROM
    (
        SELECT EmpID, Name, ClaimRef, ClaimDate, Amount AS ClaimAmount FROM @MedicalClaim
        INNER JOIN @Employee ON EmpID_fk = EmpID
        UNION
        SELECT EmpID, Name, ClaimRef, ClaimDate, OTRate * 2 AS ClaimAmount FROM @OTClaim
        INNER JOIN @Employee ON EmpID_fk = EmpID
    ) AS RecCount

    Thanks again. 🙂

  • Good post. I'm looking into it.

    Question: Is there any chance of normalizing these tables? Your problem would be a lot easier to solve if you had a Claims table with a ClaimType field.
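    For what it's worth, a minimal sketch of that normalised shape might look like this; the column choices are assumptions based on the sample tables in the previous post:

    -- Hypothetical single Claims table with a ClaimType column.
    DECLARE @Claim TABLE
    (
        ClaimRef  CHAR(12),
        EmpID_fk  INT,
        ClaimType VARCHAR(10),    -- e.g. 'MEDICAL' or 'OT'
        ClaimDate DATETIME,
        Amount    DECIMAL(9,2)    -- OT rows would store the already-computed amount
    )

    INSERT INTO @Claim VALUES('MC01', 1, 'MEDICAL', GETDATE(), 50)
    INSERT INTO @Claim VALUES('OT01', 1, 'OT', GETDATE(), 3.0)

    -- Pagination then needs no UNION at all:
    SELECT * FROM (
        SELECT ClaimRef, EmpID_fk, ClaimType, ClaimDate, Amount,
               ROW_NUMBER() OVER (ORDER BY EmpID_fk) AS RowNum
        FROM @Claim
    ) AS Result
    WHERE Result.RowNum BETWEEN 1 AND 10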

    There was an error in the INSERT scripts; I am including the corrected code here in case someone wants to look into it.

    -- * Employee Table --
    DECLARE @Employee TABLE
    (
        EmpID INT IDENTITY(1,1), Name VARCHAR(50)
    )

    INSERT INTO @Employee VALUES('John')
    INSERT INTO @Employee VALUES('Peter')
    INSERT INTO @Employee VALUES('Sam')

    -- * Medical Claims --
    DECLARE @MedicalClaim TABLE
    (
        ClaimRef CHAR(12), EmpID_fk INT, ClaimDate DATETIME, Amount DECIMAL(5,0)
    )

    INSERT INTO @MedicalClaim VALUES('MC01', 1, GETDATE(), 50)
    INSERT INTO @MedicalClaim VALUES('MC02', 2, GETDATE(), 60)
    INSERT INTO @MedicalClaim VALUES('MC03', 3, GETDATE(), 40)

    -- * OT Claims --
    DECLARE @OTClaim TABLE
    (
        ClaimRef CHAR(12), EmpID_fk INT, ClaimDate DATETIME, OTRate DECIMAL(2,0)
    )

    INSERT INTO @OTClaim VALUES('OT01', 1, GETDATE(), 1.5)
    INSERT INTO @OTClaim VALUES('OT02', 2, GETDATE(), 1.5)
    INSERT INTO @OTClaim VALUES('OT03', 3, GETDATE(), 2.0)

  • Also, you definitely do not need to use UNION here.

    Because you are selecting from multiple tables, your data will always be DISTINCT.

    Switch to UNION ALL and you will see improved performance immediately.

  • OK. Here's what I have so far.

    Try it out and let me know how it goes.

    Explanations are in the comments.

    ;WITH AllClaims AS
    -- first get all of the claims for all employees
    -- use UNION ALL because results will be unique
    (
        SELECT EmpID_fk, ClaimRef, ClaimDate, Amount AS ClaimAmount
        FROM @MedicalClaim
        UNION ALL
        SELECT EmpID_fk, ClaimRef, ClaimDate, OTRate * 2 AS ClaimAmount
        FROM @OTClaim
    ), OrderedClaims AS
    -- now join to the employees table to get the employee name
    -- this eliminates multiple joins to that table
    (
        SELECT AC.*
            ,E.Name
            ,ROW_NUMBER() OVER (ORDER BY EmpID) AS RowNum
        FROM AllClaims AC
        JOIN @Employee E ON AC.EmpID_fk = E.EmpID
    )
    -- finally, select the rows we want
    SELECT *
    FROM OrderedClaims
    WHERE RowNum BETWEEN 1 AND 10

    ;WITH AllClaims AS
    -- same concept as above
    -- we only need to return one column because of the count
    -- returning extra data slows us down
    -- we don't care that EmpID_fk is not unique; we know the existing data is unique
    (
        SELECT EmpID_fk
        FROM @MedicalClaim
        UNION ALL
        SELECT EmpID_fk
        FROM @OTClaim
    ), TopClaims AS
    -- counts are expensive
    -- pick a "threshold" number; you should choose it based on your data
    -- say your threshold is 1,000: you will only ever show the top 1,000 results on your web page
    -- this is usually not a problem because it is not practical for a user to page through that much data
    (
        SELECT TOP 3 *
        FROM AllClaims
    )
    -- now return the count and the threshold number
    -- remember, the count will not exceed the threshold
    -- return the threshold for display purposes
    -- if the count equals the threshold, display "Showing 1 - 10 of OVER 1000"
    SELECT COUNT(*) AS TotalNum
        ,3 AS Threshold
    FROM TopClaims

    I'm just taking some fields from each table; the actual tables have very different structures.

    I couldn't use UNION ALL, as it may retrieve duplicate records and I want to avoid this (in some cases).

    Do you mean that by setting a threshold value, the users will always see only the top 100 records?

    Thanks again for your advice. 🙂

