Returning a Subset of a Recordset

  • Ok, here is another attempt, see what you think. Sean & I were discussing this today, other ideas we had were to use the paging ability of ADO, use a keyset cursor which would basically do what Im doing here, hold the keys in a temp table. Anyone have thoughts on those?

    If you didnt drop the temp table each time (I did, just to be clean) you could leverage the work you did on the first query on subsequent 'pages'. We've implemented an internal process that does that and have been very pleased with the results.

    Andy

    create proc usp_GetSubset @NameOfPKeyColumn varchar(50), @Select varchar(500), @From varchar(500), @BatchSize int = 10, @BatchesToSkip as int = 0 as

    --9/17/01 Andy Warren

    --Second try at offering an alternative to a cursor based solution, this is designed

    --to be more generic, if somewhat clunky to use.

    --NameofPkeyColumnObviously, your primarykey on the table you're selecting from

    --SelectColumns you want to return, you MUST include the primarykey you

    --referenced in NameofPkeyColumn

    --FromThe rest of your select including your where and order by

    --BatchSizeNumber of rows to return

    --BatchesToSkipNumber of 'pages' to skip

    declare @StartRow int

    declare @EndRow int

    declare @Sql varchar(1000)

    --get all the rows into a temp table in the order we plan to use them, could add an

    --index here if you had a LOT of rows

    Create Table ##Temp (SortOrderID int identity(1,1), PKey sql_variant)

    set @SQL = 'insert into ##temp (Pkey) select ' + @NameOfPKeyColumn + ' ' + @From

    exec(@SQL)

    --find the starting point for the 'page' we want, remembering that we'll be doing

    --a 'between'

    set @StartRow = (@BatchSize * @BatchesToSkip) + 1

    --then the end point

    set @EndRow = (@StartRow + @BatchSize) - 1

    --get the data

    set @sql='Select * from (' + @Select + ' ' + @From + ') O inner join ##Temp T on o.' + @NameOfPkeyColumn + ' = t.pkey where t.SortOrderID between ' + convert(varchar(10),@StartRow) + ' and ' + convert(varchar(10), @EndRow) + ' '

    exec(@sql)

    --clean up

    drop table ##temp

  • Before I start, let me state that I'm not criticizing your code. I'm just pointing out how it doesn't fulfill the requirements of my applications. I had intended to post those requirements over the weekend, but didn't get to it. So let me start by listing (most) of those:

    - Page up and down through the records.

    - Go to the first or last page of records.

    - Handle null values.

    - All of the above while ordered by any of the columns (including multiple column sorts)

    - Locate a row using a search value for the primary column of the sort order, and return the page of records starting with that row.

    - Adjust to changing rows. Rows could be inserted or deleted, or a column in the current sort order modified which causes the row to "move" to a different position. These changes could be from other users/processes or even your own application. So, every page of records returned should consist of the most current data.

    Your latest example could satisfy (with some additions) all of these requirements except the last one - accounting for row changes. In all of the examples so far, it's assumed that the recordset is static, and therefore can be traversed by row number reliably. And this procedure would work great with a static recordset.

    But for a dynamic recordset, you basically have to specify your current position by column values, not by row number. And you have to refresh the recordset on every "fetch" of records, which rules out the possibility of saving your "work" between calls.

    When I starting trying to program a solution to all of this, I thought Dynamic Cursors were the answer. But you can't use a dynamic cursor if the Order By is not based solely on unique indexes, nor if any of the tables have triggers. So dynamic cursors were out. Then I worked on a set-based solution, which I achieved, but it was so specific to that browse that reusing it for other applications or just changing the columns being used meant a lot of recoding. So I settled on a set and cursor combination, which I felt provided a reasonable level of performance with a lot less coding, and it is easier to implement in the applications. I also told myself that I would go back and build either a self-configuring set-based procedure, or write a code generator that would do all of the coding (I've done this with other procedures). But I just haven't had the time to do it.

    So, for the purpose of this discussion, here is an example of one of my set-based only procedures, followed by a couple of example calls. Note that it also does not use a temp table. And to illustrate my struggle with using this method, there would be five procedures (Page up, Page down, First page, Last page, Locate page) for EACH sortable column in the application browse. And most of my applications have 8 to 12 columns. That's 40 to 60 procedures for one browse!

    Jay Madren

    P.S. You mentioned using ADO. ADO on SQL Server is actually just using a global cursor.

    CREATE PROC usp_OrdersByShipRegionNextPage

    @PageSize int,

    @CurrentShipRegion varchar(15),

    @CurrentOrderID int

    AS

    /* --- 9/18/01 Jay Madren -------------------------------------------------------------------

    An example of my set-based method of returning a subset of records. This procedure returns

    the page of rows that come after the row identified by the values for all columns in the

    ORDER BY clause, in this case ShipRegion & OrderID.

    This procedure is designed for a browse on the Orders table in the Northwind sample database,

    ordered by ShipRegion and OrderID.

    -------------------------------------------------------------------------------------------*/

    DECLARE @sql varchar(1000)

    IF @CurrentShipRegion IS Null

    SET @sql='SELECT TOP ' + CONVERT(varchar(10), @PageSize) + ' OrderID, OrderDate, ShipName, ShipCity, ShipRegion

    FROM Orders

    WHERE (ShipRegion IS Null AND OrderID >= ' + CONVERT(varchar(10),@CurrentOrderID) + ')

    OR ShipRegion IS NOT Null

    ORDER BY ShipRegion, OrderID'

    ELSE

    SET @sql='SELECT TOP ' + CONVERT(varchar(10), @PageSize) + ' OrderID, OrderDate, ShipName, ShipCity, ShipRegion

    FROM Orders

    WHERE ShipRegion >= ''' + @CurrentShipRegion + '''

    AND (ShipRegion > ''' + @CurrentShipRegion + ''' OR OrderID >= ' + CONVERT(varchar(10),@CurrentOrderID) + ')

    ORDER BY ShipRegion, OrderID'

    SELECT @sql --For testing purposes to verify the generated statement

    EXEC (@sql)

    GO

    EXEC usp_OrdersByShipRegionNextPage 20, Null, 11075

    EXEC usp_OrdersByShipRegionNextPage 20, 'BC', 10742 -- This should be the next page

     


    Jay Madren

  • Oops. I just realized that I forgot to remove the equal signs in the OrderID comparisons. With the equal signs, the page returned includes the "current" row. I use this sometimes to display the last row from the previous page as the first row on the next page. Just remove the equal sign in the OrderID comparison of each Select statement to produce the behavior as stated in the procedure.

    Jay


    Jay Madren

  • Two problems here, one significant, one not so. The significant problem seems to be that the proc works based on the assumption that the state of the Order table doesn't change (i.e., no inserts, deletes, updates to cluster key values). Obviously if the state of the table changes, working with relative rowcounts is invalid (although the order by option MIGHT save you here). Second, the lesser of the two, is simply the performance, resource and locking implications of using a cursor, an approach which might be entirely valid under several scenarios.

  • I have had to deal with this need repeatedly for websites where we do not want to return the entire result set to the user for both preservation of network load, response time (sending that many rows to a browser) and the simple fact that a user

    cannot reasonably work with that much data.

    To make matters worse, most requirements ask that the result set be sortable by clicking on columns of the resulting grid to dynamically sort by them.

    In addition, users expect links at the bottom and top of the result set to easily page through their data ... thanks to Google and other similar browsing models.

    On my projects we have implemented several approaches to deal with this problem.

    Later versions of Oracle kindly provided a ROWNUM or similar variable that could be included in an inner query and limited by the desired range (i.e. SELECT .... WHERE ROWNUM >= 25 and ROWNUM <=50).

    We have implemented two approaches for SQL Server depending on the size of the dataset we are dealing with and the normal patterns of use.

    Server Side:

    We wrote a utility function in SQL Server

    ALTER FUNCTION dbo.fGetPagedSqlText

    (

    @KeyField varchar(100),

    @PageNum int,

    @RecsPerPage int,

    @Select varchar(4000),

    @FromForFilter varchar(4000),

    @WhereForFilter varchar(4000),

    @FromForSelect varchar(4000),

    @WhereForSelect varchar(4000),

    @Order varchar(4000)

    )

    RETURNS varchar(8000) AS

    BEGIN

    declare @sql varchar(8000);

    set @sql = 'set nocount on;'

    set @sql = @sql + ' DECLARE @FirstRec int, @LastRec int;'

    set @sql = @sql + ' SET @FirstRec = (' + convert(varchar, @PageNum) + '- 1) * ' + convert(varchar, @RecsPerPage) + ' + 1;'

    set @sql = @sql + ' SET @LastRec = (' + convert(varchar, @PageNum) + '* ' + convert(varchar,@RecsPerPage) + ');'

    set @sql = @sql + ' declare @TempPaged table'

    set @sql = @sql + ' (ID int identity,'

    set @sql = @sql + ' KeyId uniqueidentifier '

    set @sql = @sql + ' );'

    --set @sql = @sql + ' create index idx_tmp on #TempPaged (ID, KeyId); '

    set @sql = @sql + ' insert into @TempPaged'

    set @sql = @sql + ' (KeyId)'

    set @sql = @sql + ' select ' + @KeyField + ' from ' + @FromForFilter

    if (LEN(@WhereForFilter) > 0)

    set @sql = @sql + ' where ' + @WhereForFilter

    if (LEN(@Order) > 0)

    set @sql = @sql + ' order by ' + @order + ';'

    else

    set @sql = @sql + ';'

    --Changed rowcount to use @@rowcount variable instead of tablescan of temp table

    --set @sql = @sql + ' select @@rowcount; set nocount off;'

    set @sql = @sql + ' select count(1) from @TempPaged;'

    set @sql = @sql + ' select ' + @select

    set @sql = @sql + ' from ' + @FromForSelect + ' join @TempPaged tempPaged on (' + @keyField + ' = tempPaged.KeyId) '

    set @sql = @sql + ' where '

    if LEN(@WhereForSelect) > 0

    set @sql = @sql + @WhereForSelect + ' and '

    set @sql = @sql + ' ID between @FirstRec and @LastRec

    ORDER BY ID'

    return @sql;

    END

    when called:

    select dbo.fGetPagedSQLText('ItemId', 3, 50, 'ItemId, ItemName, ItemDesc', 'Items', 'ItemName like ''Widget%''', 'Items', '', 'ItemName')

    It returns (formatted manually for readability):

    set nocount on;

    DECLARE @FirstRec int,

    @LastRec int;

    SET @FirstRec = (3- 1) * 50 + 1;

    SET @LastRec = (3* 50);

    declare @TempPaged table

    (ID int identity,

    KeyId uniqueidentifier );

    insert into @TempPaged (KeyId)

    select ItemId

    from Items

    where ItemName like 'Widget%'

    order by ItemName;

    select count(1) from @TempPaged;

    select ItemId,

    ItemName,

    ItemDesc

    from Items

    join @TempPaged tempPaged on (ItemId = tempPaged.KeyId)

    where ID between @FirstRec and @LastRec

    ORDER BY ID

    This is then dynamically executed.

    Pros:

    It works

    It is easy for developers to use

    Any column is orderable

    It returns the number of total rows to allow for "x of y" display to the user and

    other navigational hints

    Cons:

    Dynamic SQL

    Temporary Table use

    Single column primary key only for this version

    Under heavy load this approach can put a large load on the database due to temp tables... additional considerations should be taken to try to limit the size of the data put in the temporary table

    Alternative:

    For some of our problematic queries we are attempting a version of paging referenced in a the Microsoft Data Application Block V2 reference architecture.

    In their example a client side implementation is used.

    .Net provides two primary methods for pulling data out of SQL Server. One is a DataReader that is a forward only stream that only holds the current record in memory at any time. The other is a DataSet that contains an Xml representation of the entire result set.

    In our data layer, we loop through the rows of the data reader and populate a dataset. The DataSet is then sent to the Business or Presentation layer.

    Here is a small fragment of the code:

    ...

    DataRow fillRow;

    string fieldName;

    int recNumber = 0;

    int totalRecords = from + count;

    while( dataReader.Read() )

    {

    if( recNumber++ >= from )

    {

    fillRow = dataTable.NewRow();

    for( int fieldIdx = 0; fieldIdx < dataReader.FieldCount; fieldIdx++ )

    {

    fieldName = dataReader.GetName( fieldIdx );

    if( dataTable.Columns.IndexOf( fieldName ) == -1 )

    dataTable.Columns.Add( fieldName, dataReader.GetValue(fieldIdx ).GetType() );

    fillRow[ fieldName ] = dataReader.GetValue( fieldIdx );

    }

    dataTable.Rows.Add( fillRow );

    }

    if( count != 0 && totalRecords <= recNumber )

    break;

    }

    dataSet.AcceptChanges();

    dataReader.Close();

    ...

    Pros:

    Does not depend on temporary tables

    Any number of columns is allowable for primary key

    Cons:

    Sorting is responsibility of the developer (is this really a con??)

    At this point, we are not positive exactly how many records SQL Server truly returns to the client. We have heard but not found in documentation that .NET buffers some of the returned rows that the DataReader loops through. This could mean that a larger number of rows than desired are being sent from the database server to the web server.

    Also, usage patterns show that our users do not often go to the end of result sets. Usually they will narrow the results and search again. This means that if .NET is closing the result set after reading towards the top, we are not taking the hit of

    pulling the entire set across to only read the last 'page'.

    Our performance testing should reveal more information and I will be glad to share the results with anyone interested.

    Sorry for the extremely long message. This has been a challenging dilema for us on several projects and I hope that this information will help. If you have improvements or alternatives, please let me know as well.

  • The first time this article posted, I got quite a response. Due to those responses, I wrote a follow up article addressing several other methods for solving the same problem. These in addition to the ones you mentioned above should provide users with a wide variety of potential solutions. Thanks for the input.

    Additional Articel Link: http://www.sqlservercentral.com/columnists/jwiner/moreonreturningasubsetofarecordset_1.asp

  • I implemented a solution with the same goals in mind in the past although using a slightly different method. The issue that this method was addressing is not to re-run the query every time.

    I do not have to code available, but I will describe how it works. Instead of using a cursor, the full select for the query is used to populate a scratch (not temp) table that is created on the fly. You can either do this with Select...Into syntax or creating a table and then populating it.

    Select into is nice because you don't have to know the schema ahead of time, though the downside is the potential locking of system schema tables.

    In the creation of the table, you add an identity field and make it the primary key.

    An additional component to this method is a master catalog table say Reports, that knows about each of the 'snapshot' tables that is populated and has an expiration date associated with each. The key on catalog table is an id combined with a hash of the where clause and order by.

    The stored procedure takes an SQL statement as a parameter (I  believe it was as ntext), along with the report id, and the hashcode. It then checks against the master catalog to see if the snapshot exists, and if it has expired.

    If it has expired, then the sp will rerun the query to populate the table (The tables are named with guid) and return the table name. If the table exists and has not expired, then it will just return the table name.

    Once the table name has been returned, an ad-hoc query can be issued against the table, using the identity field to filter out just the intended records, without repeatedly re-running the query.

    One last component of this model, is a background SQL Server job that every 8 hours or so, looks at the master catalog, and drops tables that have not been accessed within the last hour. This is to prevent the db from bloating, as alot of data is constantly pouring into the tables.

    The downside, is that the data is a static snapshot at a point in time. But the expiration times can be tweaked for more time-critical data. Also the architecture allows forcing a refresh if necessary. Another downside is that for a large query there is an initial hit as data is populated. We planned to implement a multi-threaded mechanism to handle this, but never got around to it.

    The upside, is that once the table is populated successive calls happen almost instantaneously. You can go through millions of recrods instantly via the primary key to get to the results you want for the current page. The data is also stored according to the requested sort order which also speeds data delivery.

    Glenn Block

    Prinicipal Architect

    Streamline Solutions.

     

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply