Using OFFSET for Paging

  • rick-1071254 wrote:

    It's all a matter of how likely a possibility.

    Recipe for disaster? A better recipe would be trying to retrieve all 10 million records to your client in a "static" result set, which won't be "static" anyway because behind the scenes, in the database, the set is changing...so if I want to see what's new/changed, I have to retrieve, again, all 10 million+ records, every time I want to see what changed? No, I don't think so.

    You can stop right after "retrieve all 10 million records to your client". 

    Why, so you can not read the rest of what I said? I think you'll find that I'm not disagreeing with you. 🙂

  • rick-1071254 wrote:

    Jeff -

    There are other options out there.  Databases that use multi-version read consistency do exactly what you are asking - present an accurate picture of the state of data at a particular point in time, be it a simple count or a complete result set, even through paging.  Your DBMS of choice may not offer this, but others do.

    Yep.  SQL Server has many ways of doing just that.  It's also pretty easy to pull off if there's no built in method.  The point of my post wasn't aimed at you or DMBaker... it was aimed at the folks reading the article or this thread that might not understand the problems with paging, row counts. whatever and the fact that paging itself is not the fix (and, yeah... the two of you are covering that but I just wanted to simplify it for folks).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    rick-1071254 wrote:

    Jeff -

    There are other options out there.  Databases that use multi-version read consistency do exactly what you are asking - present an accurate picture of the state of data at a particular point in time, be it a simple count or a complete result set, even through paging.  Your DBMS of choice may not offer this, but others do.

    Yep.  SQL Server has many ways of doing just that.  It's also pretty easy to pull off if there's no built in method.  The point of my post wasn't aimed at you or DMBaker... it was aimed at the folks reading the article or this thread that might not understand the problems with paging, row counts. whatever and the fact that paging itself is not the fix.

    You're spot-on I think. Paging (or other remote options, sorting, filtering. etc.), however it's accomplished, should not be used without consideration for the issues it brings (especially potential inconsistency, concurrency issues). You really have to design your system for it from the beginning, whatever method you use.

  • dmbaker wrote:

    You're spot-on I think. Paging (or other remote options, sorting, filtering. etc.), however it's accomplished, should not be used without consideration for the issues it brings (especially potential inconsistency, concurrency issues). You really have to design your system for it from the beginning, whatever method you use.

    Exactly.  Or, if it's an existing system, understand that it's not going to be just a bolt on.  You have to plan for it and possibly make some significant changes.

    I believe that's also what you and Rick are both ultimately trying to get at and, I agree, that it seems like you both might be talking about the same thing but in a slightly different manner.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • "Design your system for it"

    Whoa, slow down, let's not get all crazy here.  <g>  Violent agreement.

  • rick-1071254 wrote:

    Oh, what a recipe for disaster!  Since you are requerying the data every time, you have NO WAY to know if the offset is still valid.  You are going to the 10th row, for instance, but what if another row has been inserted between row 1 and 10, or deleted?  At that point, your data integrity is completely gone.  Please, never do this for data unless the data set is guaranteed to be completely static.

     

    Right but in most of the use cases I've seen for using this technique it's actually desirable that the results reflect the current data and it's entirely expected it might change run after run.

  • This article mentions the "cte trick" to pagination.  I wonder how how fast it would be locating the page and then using a tally table on rownums and then without the recompile option.

    https://sqlperformance.com/2015/01/t-sql-queries/pagination-with-offset-fetch

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • ZZartin -

    I think this is another example of a basic misunderstanding.  If the result set used to build the pages is different from one set to another, it most definitely does NOT reflect the current data.  It reflects data from multiple different times.  It is not "current" - or, if it is, the previous result was not.  It's just wrong, and does not reflect a unified state of data, possibly not at any time.

    If you retrieved the first 10 rows, had someone delete one of those rows, and then retrieved the next 10 rows with this method, you would never see the 11th row, as it would be skipped since it was now the 10th row.  That is not current - that is wrong.

    There are use cases where this does not matter, but you better make sure you have one by thinking it through.

  • Right they won't see the 11th row because it's no longer the 11th row and displaying it as such is no longer accurate.  They will see that row if they go back to the first 10 rows.

    I just think most of the real world use cases I see for this type of pagination in sql desire exactly that behavior, that the results reflect the data at the time of query, not the data as it was originally.  And that's a general issue with re querying volatile data not a specific issue with this technique.

  • Or maybe not even locate the page.  Just a tally function and table cardinality from dm view.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • ZZartin wrote:

    Right they won't see the 11th row because it's no longer the 11th row and displaying it as such is no longer accurate.  They will see that row if they go back to the first 10 rows.

    I just think most of the real world use cases I see for this type of pagination in sql desire exactly that behavior, that the results reflect the data at the time of query, not the data as it was originally.  And that's a general issue with re querying volatile data not a specific issue with this technique.

    Exactly.  "multi-version read consistency" introduces phantom rows.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Nope, just wrong.  Multi-version read consistency does not introduce phantom rows - it gives a consistent view of data at a single point in time.  Databases are state machines - they track changes in state over time.  Mixing states is just wrong data.

    And if the user goes back to the previous 10 rows, they will see that row, but not the row that has been deleted.  So the previous page of data no longer matches the same page on its last retrieval.  And this can go on and on and on.

    That is not seeing the data at the time of the query.  That is seeing the data at the time of the retrieval of the second page, which is different from the time of the query.  If you are saying the the retrieval of the second page is a second query, then why even start with the 10th row?  Or the 11th, 9th or whatever?

    Seriously, this is what databases are for - to provide a consistent view of the state of data at a point in time.  What you are describing does not.

    • This reply was modified 4 years, 5 months ago by  rick-1071254.
    • This reply was modified 4 years, 5 months ago by  rick-1071254.
  • Showing a user the results of a query based on the data in the database is the point of a database.  Re running any query against a volatile data set will potentially give different results on back to back runs.  That is 100% expected behavior of a database.  Maybe in some cases a user would want a static and potentially out of date data set but in almost all cases I've run across where people want to use pagination they want what they're seeing represent the actual data present not invalid data.

  • ZZartin -

    Getting a second page of data is not rerunning a query.  It's getting more data from a single query.  That is what is under discussion in this article and these posts.

    If you contend that data volatility means that getting a consistent view of data is not possible, you are, unfortunately, just incorrect.

  • rick-1071254 wrote:

    Nope, just wrong.  Multi-version read consistency does not introduce phantom rows - it gives a consistent view of data at a single point in time.  Databases are state machines - they track changes in state over time.  Mixing states is just wrong data.

    And if the user goes back to the previous 10 rows, they will see that row, but not the row that has been deleted.  So the previous page of data no longer matches the same page on its last retrieval.  And this can go on and on and on.

    That is not seeing the data at the time of the query.  That is seeing the data at the time of the retrieval of the second page, which is different from the time of the query.  If you are saying the the retrieval of the second page is a second query, then why even start with the 10th row?  Or the 11th, 9th or whatever?

    Seriously, this is what databases are for - to provide a consistent view of the state of data at a point in time.  What you are describing does not.

    Do you expect your bank account to be locked while you page through your transaction history?

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 15 posts - 16 through 30 (of 43 total)

You must be logged in to reply to this topic. Login to reply