    I have had to deal with this need repeatedly for websites where we do not want to return the entire result set to the user, both to reduce network load and response time (sending that many rows to a browser) and because a user cannot reasonably work with that much data.

    To make matters worse, most requirements ask that the result set be sortable by clicking on columns of the resulting grid to dynamically sort by them.

    In addition, users expect links at the bottom and top of the result set to easily page through their data ... thanks to Google and other similar browsing models.

    On my projects we have implemented several approaches to deal with this problem.

    Later versions of Oracle kindly provide a ROWNUM pseudocolumn that can be used for this. Because ROWNUM is assigned as rows are returned, the upper bound belongs on an inner query and the lower bound on an outer query; a flat WHERE ROWNUM >= 25 AND ROWNUM <= 50 returns no rows.
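
    For example, a sketch of the classic nested-query form for rows 26 through 50 (the items table and its columns here are made up purely for illustration):

        select *
          from (select a.*, ROWNUM rnum
                  from (select item_id, item_name
                          from items
                         order by item_name) a
                 where ROWNUM <= 50)
         where rnum >= 26;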

    We have implemented two approaches for SQL Server depending on the size of the dataset we are dealing with and the normal patterns of use.

    Server Side:

    We wrote a utility function in SQL Server that builds the paged query as dynamic SQL:

    ALTER FUNCTION dbo.fGetPagedSqlText
    (
        @KeyField varchar(100),
        @PageNum int,
        @RecsPerPage int,
        @Select varchar(4000),
        @FromForFilter varchar(4000),
        @WhereForFilter varchar(4000),
        @FromForSelect varchar(4000),
        @WhereForSelect varchar(4000),
        @Order varchar(4000)
    )
    RETURNS varchar(8000) AS
    BEGIN
        declare @sql varchar(8000);

        set @sql = 'set nocount on;'
        set @sql = @sql + ' DECLARE @FirstRec int, @LastRec int;'
        set @sql = @sql + ' SET @FirstRec = (' + convert(varchar, @PageNum) + '- 1) * ' + convert(varchar, @RecsPerPage) + ' + 1;'
        set @sql = @sql + ' SET @LastRec = (' + convert(varchar, @PageNum) + '* ' + convert(varchar, @RecsPerPage) + ');'

        -- table variable whose identity column numbers the filtered, ordered keys
        set @sql = @sql + ' declare @TempPaged table'
        set @sql = @sql + ' (ID int identity,'
        set @sql = @sql + ' KeyId uniqueidentifier '
        set @sql = @sql + ' );'
        --set @sql = @sql + ' create index idx_tmp on #TempPaged (ID, KeyId); '

        set @sql = @sql + ' insert into @TempPaged'
        set @sql = @sql + ' (KeyId)'
        set @sql = @sql + ' select ' + @KeyField + ' from ' + @FromForFilter
        if (LEN(@WhereForFilter) > 0)
            set @sql = @sql + ' where ' + @WhereForFilter
        if (LEN(@Order) > 0)
            set @sql = @sql + ' order by ' + @Order + ';'
        else
            set @sql = @sql + ';'

        -- return the total row count (the commented-out line used @@rowcount to avoid re-scanning the table variable)
        --set @sql = @sql + ' select @@rowcount; set nocount off;'
        set @sql = @sql + ' select count(1) from @TempPaged;'

        set @sql = @sql + ' select ' + @Select
        set @sql = @sql + ' from ' + @FromForSelect + ' join @TempPaged tempPaged on (' + @KeyField + ' = tempPaged.KeyId) '
        set @sql = @sql + ' where '
        if LEN(@WhereForSelect) > 0
            set @sql = @sql + @WhereForSelect + ' and '
        set @sql = @sql + ' ID between @FirstRec and @LastRec ORDER BY ID'

        return @sql;
    END

    when called:

    select dbo.fGetPagedSQLText('ItemId', 3, 50, 'ItemId, ItemName, ItemDesc', 'Items', 'ItemName like ''Widget%''', 'Items', '', 'ItemName')

    It returns (formatted manually for readability):

    set nocount on;
    DECLARE @FirstRec int, @LastRec int;
    SET @FirstRec = (3- 1) * 50 + 1;
    SET @LastRec = (3* 50);

    declare @TempPaged table
        (ID int identity,
         KeyId uniqueidentifier);

    insert into @TempPaged (KeyId)
    select ItemId
    from Items
    where ItemName like 'Widget%'
    order by ItemName;

    select count(1) from @TempPaged;

    select ItemId, ItemName, ItemDesc
    from Items
    join @TempPaged tempPaged on (ItemId = tempPaged.KeyId)
    where ID between @FirstRec and @LastRec
    ORDER BY ID

    This is then dynamically executed.
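
    A minimal sketch of that step (the parameter values are the same illustrative ones used above):

        declare @sql varchar(8000);
        select @sql = dbo.fGetPagedSqlText('ItemId', 3, 50, 'ItemId, ItemName, ItemDesc',
                                           'Items', 'ItemName like ''Widget%''', 'Items', '', 'ItemName');
        exec (@sql);   -- returns two result sets: the total row count, then the requested page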

    Pros:

    It works

    It is easy for developers to use

    Any column is orderable

    It returns the total number of rows, to allow for "x of y" display to the user and other navigational hints

    Cons:

    Dynamic SQL

    Temporary Table use

    Single column primary key only for this version

    Under heavy load this approach can strain the database because of the temporary table usage; extra care should be taken to limit how many rows are put into the temporary table

    Alternative:

    For some of our problematic queries we are attempting a version of paging referenced in the Microsoft Data Application Block V2 reference architecture.

    In their example a client-side implementation is used.

    .NET provides two primary methods for pulling data out of SQL Server. One is the DataReader, a forward-only stream that holds only the current record in memory at any time. The other is the DataSet, an in-memory (XML-serializable) representation of the entire result set.

    In our data layer, we loop through the rows of the DataReader and populate a DataSet. The DataSet is then sent to the Business or Presentation layer.

    Here is a small fragment of the code:

    ...
    DataRow fillRow;
    string fieldName;
    int recNumber = 0;

    // 'from' is the zero-based index of the first row to keep; 'count' is the page size (0 = no limit)
    int totalRecords = from + count;

    while( dataReader.Read() )
    {
        // skip rows that come before the requested page
        if( recNumber++ >= from )
        {
            fillRow = dataTable.NewRow();
            for( int fieldIdx = 0; fieldIdx < dataReader.FieldCount; fieldIdx++ )
            {
                fieldName = dataReader.GetName( fieldIdx );

                // add the column to the DataTable the first time it is seen
                if( dataTable.Columns.IndexOf( fieldName ) == -1 )
                    dataTable.Columns.Add( fieldName, dataReader.GetValue( fieldIdx ).GetType() );

                fillRow[ fieldName ] = dataReader.GetValue( fieldIdx );
            }
            dataTable.Rows.Add( fillRow );
        }

        // stop once the page is full (skipped rows count toward the total read)
        if( count != 0 && totalRecords <= recNumber )
            break;
    }

    dataSet.AcceptChanges();
    dataReader.Close();
    ...

    Pros:

    Does not depend on temporary tables

    The primary key may span any number of columns

    Cons:

    Sorting is the responsibility of the developer (is this really a con?)

    At this point, we are not positive exactly how many records SQL Server truly returns to the client. We have heard, but not found in the documentation, that .NET buffers some of the returned rows that the DataReader loops through. This could mean that more rows than desired are being sent from the database server to the web server.

    Also, usage patterns show that our users do not often go to the end of result sets. Usually they will narrow the results and search again. This means that if .NET is closing the result set after reading towards the top, we are not taking the hit of pulling the entire set across to only read the last 'page'.

    Our performance testing should reveal more information and I will be glad to share the results with anyone interested.

    Sorry for the extremely long message. This has been a challenging dilemma for us on several projects and I hope that this information will help. If you have improvements or alternatives, please let me know as well.