
Returning a Subset of a Recordset
Posted Friday, September 14, 2001 3:55 PM
Good eye on the Top 0 - I was trying to just get an alternative posted quickly and obviously didn't test well enough.

I'll grant your point that my example doesn't meet your requirements in all cases (or heck, even in many, maybe). I'll take another shot at it this weekend if time permits - I still believe in trying to achieve a set-based solution first. Server-side cursors are expensive. Keep in mind that I wasn't arguing against your solution - as I mentioned in my earlier post, it depends on a lot of things before you call it a 'bad' solution.

My job here is to try to evangelize best practices crossed with common sense and to offer ideas for alternatives. In this case, IF you could accomplish the goal with either a cursor or a set-based solution, I'm betting the set-based one will use fewer resources and execute faster. The trick is - can you figure out a set-based solution in the same amount of time you can figure out a cursor-based one? Most developers know how to write loops, so when a problem looks complex they tend to revert to what they know. Sometimes that is the only way - might be in this case! The second part of the discussion is that if you can't build a set-based solution, I'd vote for doing the loop in the middle tier, not the data tier.

We've got a similar discussion under way in the T-SQL forum; you might want to look at that as well. Hope you'll continue to participate - you've started a discussion that I think a lot of our readers will benefit from. And remember, you've got Jon to thank for getting you into this!

Andy



Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
Post #20622
Posted Friday, September 14, 2001 5:30 PM
Andy,

I agree with you - set based solutions are usually better than cursors, and I wasn't calling your solution a 'bad' one. I'm just relating from experience that when given the complexity of the code needed to fully support the goal as I see it, the cursor solution is so much easier and, in my opinion, not much different in performance or resources, within limits.

I write a lot of applications and have to keep setting up these 'browse' procedures for each one. I haven't built a 'universal' set-based procedure that could handle any set of columns from any table. If I could, then it would be a much better solution for me.

Of course, now that I've mentioned it, you and everyone else will probably want to see it attempted. And I would like to build the 'universal' procedure; I just haven't had the time to devote to it. Maybe with the assistance of others we can get it done.

Also, which topic in the T-SQL forum are you talking about?

Jay Madren



Jay Madren
Post #20623
Posted Friday, September 14, 2001 6:23 PM
A universal browse function sounds interesting (and yes, we may have to try it!), but when you try to make a proc all things to all users, you lose the compilation benefit in most cases. You do keep it on the server (good or bad depending on your view and your needs) and the code is centralized.

Performance on a pure execution basis is a bit different for the two solutions we have so far (and I haven't forgotten that mine doesn't yet meet the spec). Running the original solution generates about 6 times the disk reads that my solution does.

An interesting problem.

Andy


This is the most recent one:
http://www.sqlservercentral.com/forum/topic.asp?TOPIC_ID=1014&FORUM_ID=8&CAT_ID=1&Topic_Title=Selecting+the+n%27th+to+the+n%27th+%2B+x+set+of+rows&Forum_Title=T%2DSQL

And an earlier one, a little different:
http://www.sqlservercentral.com/forum/topic.asp?TOPIC_ID=989&FORUM_ID=8&CAT_ID=1&Topic_Title=T%2DSQL+for+%22Last+n+records%22+in+a+table&Forum_Title=T%2DSQL



Post #20624
Posted Saturday, September 15, 2001 10:20 AM


I wouldn't call this ludicrous either. I, in general, hate cursors and temp tables, but they have value. While I wouldn't use this approach, it appears to work.

Personally, I'd burden the programmer and develop a set of stored procedures (one for each sort) and then use a master proc to call one of them. This way the user still gets one proc, but I can optimize all of the individual procs. For those procs that are on unindexed columns, I'd probably combine them into one proc, since a scan would occur anyway.

Steve Jones
steve@dkranch.net







Post #20625
Posted Saturday, September 15, 2001 3:13 PM
That's not a bad idea, though a bit of overhead to generate them all. A code generator would ease the pain.

Andy



Post #20626
Posted Monday, September 17, 2001 6:49 PM
Ok, here is another attempt; see what you think. Sean and I were discussing this today. Other ideas we had were to use the paging ability of ADO, or to use a keyset cursor, which would basically do what I'm doing here: hold the keys in a temp table. Anyone have thoughts on those?

If you didn't drop the temp table each time (I did, just to be clean) you could leverage the work you did on the first query on subsequent 'pages'. We've implemented an internal process that does that and have been very pleased with the results.

Andy


create proc usp_GetSubset @NameOfPKeyColumn varchar(50), @Select varchar(500), @From varchar(500), @BatchSize int = 10, @BatchesToSkip int = 0 as

--9/17/01 Andy Warren
--Second try at offering an alternative to a cursor-based solution; this is designed
--to be more generic, if somewhat clunky to use.

--NameOfPKeyColumn  The primary key on the table you're selecting from
--Select            Columns you want to return; you MUST include the primary key
--                  referenced in NameOfPKeyColumn
--From              The rest of your select, including your where and order by
--BatchSize         Number of rows to return
--BatchesToSkip     Number of 'pages' to skip

declare @StartRow int
declare @EndRow int
declare @Sql varchar(1000)

--get all the rows into a temp table in the order we plan to use them; could add an
--index here if you had a LOT of rows. Note that a global (##) temp table is shared,
--so concurrent callers would collide - fine for a demo, but worth fixing in production.
Create Table ##Temp (SortOrderID int identity(1,1), PKey sql_variant)
set @Sql = 'insert into ##Temp (PKey) select ' + @NameOfPKeyColumn + ' ' + @From
exec(@Sql)

--find the starting point for the 'page' we want, remembering that we'll be doing
--a 'between'
set @StartRow = (@BatchSize * @BatchesToSkip) + 1

--then the end point
set @EndRow = (@StartRow + @BatchSize) - 1

--get the data; select O.* so the temp table's columns aren't returned too, and
--order by SortOrderID so the page comes back in the requested order
set @Sql = 'Select O.* from (' + @Select + ' ' + @From + ') O inner join ##Temp T on O.' + @NameOfPKeyColumn + ' = T.PKey where T.SortOrderID between ' + convert(varchar(10), @StartRow) + ' and ' + convert(varchar(10), @EndRow) + ' order by T.SortOrderID'
exec(@Sql)

--clean up
drop table ##Temp
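The start/end row arithmetic above is easy to get off by one, so here is a minimal sketch of the same page-window logic in Python (an illustration added for this discussion, not part of the original post):

```python
def page_window(batch_size, batches_to_skip):
    """Compute the inclusive 1-based row range for a page,
    mirroring @StartRow/@EndRow in usp_GetSubset."""
    start_row = (batch_size * batches_to_skip) + 1
    end_row = (start_row + batch_size) - 1
    return start_row, end_row

def fetch_page(ordered_keys, batch_size, batches_to_skip):
    """Slice an ordered key list the way the SortOrderID
    BETWEEN filter slices the temp table."""
    start, end = page_window(batch_size, batches_to_skip)
    # SortOrderID is 1-based; Python slices are 0-based and end-exclusive
    return ordered_keys[start - 1:end]
```

With the defaults (@BatchSize = 10, @BatchesToSkip = 0) this yields rows 1 through 10; skipping two batches yields rows 21 through 30.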




Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
Post #20627
Posted Tuesday, September 18, 2001 11:45 AM
Before I start, let me state that I'm not criticizing your code. I'm just pointing out how it doesn't fulfill the requirements of my applications. I had intended to post those requirements over the weekend, but didn't get to it. So let me start by listing (most) of those:

- Page up and down through the records.
- Go to the first or last page of records.
- Handle null values.
- All of the above while ordered by any of the columns (including multiple column sorts)
- Locate a row using a search value for the primary column of the sort order, and return the page of records starting with that row.
- Adjust to changing rows. Rows could be inserted or deleted, or a column in the current sort order modified which causes the row to "move" to a different position. These changes could be from other users/processes or even your own application. So, every page of records returned should consist of the most current data.

Your latest example could satisfy (with some additions) all of these requirements except the last one - accounting for row changes. In all of the examples so far, it's assumed that the recordset is static, and therefore can be traversed by row number reliably. And this procedure would work great with a static recordset.

But for a dynamic recordset, you basically have to specify your current position by column values, not by row number. And you have to refresh the recordset on every "fetch" of records, which rules out the possibility of saving your "work" between calls.

When I started trying to program a solution to all of this, I thought dynamic cursors were the answer. But you can't use a dynamic cursor if the ORDER BY is not based solely on unique indexes, nor if any of the tables have triggers. So dynamic cursors were out. Then I worked on a set-based solution, which I achieved, but it was so specific to that browse that reusing it for other applications or just changing the columns being used meant a lot of recoding. So I settled on a set and cursor combination, which I felt provided a reasonable level of performance with a lot less coding, and it is easier to implement in the applications. I also told myself that I would go back and build either a self-configuring set-based procedure, or write a code generator that would do all of the coding (I've done this with other procedures). But I just haven't had the time to do it.

So, for the purpose of this discussion, here is an example of one of my set-based only procedures, followed by a couple of example calls. Note that it also does not use a temp table. And to illustrate my struggle with using this method, there would be five procedures (Page up, Page down, First page, Last page, Locate page) for EACH sortable column in the application browse. And most of my applications have 8 to 12 columns. That's 40 to 60 procedures for one browse!

Jay Madren

P.S. You mentioned using ADO. ADO on SQL Server is actually just using a global cursor.


CREATE PROC usp_OrdersByShipRegionNextPage
@PageSize int,
@CurrentShipRegion varchar(15),
@CurrentOrderID int
AS

/* --- 9/18/01 Jay Madren -------------------------------------------------------------------
An example of my set-based method of returning a subset of records. This procedure returns
the page of rows that come after the row identified by the values for all columns in the
ORDER BY clause, in this case ShipRegion & OrderID.

This procedure is designed for a browse on the Orders table in the Northwind sample database,
ordered by ShipRegion and OrderID.
-------------------------------------------------------------------------------------------*/

DECLARE @sql varchar(1000)

IF @CurrentShipRegion IS Null
SET @sql='SELECT TOP ' + CONVERT(varchar(10), @PageSize) + ' OrderID, OrderDate, ShipName, ShipCity, ShipRegion
FROM Orders
WHERE (ShipRegion IS Null AND OrderID >= ' + CONVERT(varchar(10),@CurrentOrderID) + ')
OR ShipRegion IS NOT Null
ORDER BY ShipRegion, OrderID'
ELSE
SET @sql='SELECT TOP ' + CONVERT(varchar(10), @PageSize) + ' OrderID, OrderDate, ShipName, ShipCity, ShipRegion
FROM Orders
WHERE ShipRegion >= ''' + @CurrentShipRegion + '''
AND (ShipRegion > ''' + @CurrentShipRegion + ''' OR OrderID >= ' + CONVERT(varchar(10),@CurrentOrderID) + ')
ORDER BY ShipRegion, OrderID'

SELECT @sql --For testing purposes to verify the generated statement
EXEC (@sql)
GO


EXEC usp_OrdersByShipRegionNextPage 20, Null, 11075

EXEC usp_OrdersByShipRegionNextPage 20, 'BC', 10742 -- This should be the next page
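The NULL handling in those two WHERE clauses is the subtle part. Here is a hedged Python sketch of the same keyset comparison (a hypothetical illustration, not the author's code; rows are (ShipRegion, OrderID) tuples pre-sorted with None regions first, as SQL Server sorts NULLs):

```python
def next_page(rows, page_size, cur_region, cur_order_id):
    """Return the page of rows at or after (cur_region, cur_order_id).
    rows must be pre-sorted by (ShipRegion, OrderID) with None first.
    Uses the inclusive '>=' on OrderID, as in the posted procedure."""
    def keeps(row):
        region, order_id = row
        if cur_region is None:
            # NULL branch: same-NULL rows from cur_order_id on, plus all non-NULL rows
            return (region is None and order_id >= cur_order_id) or region is not None
        # non-NULL branch: strictly later regions, or the same region from cur_order_id on
        return region is not None and (
            region > cur_region or (region == cur_region and order_id >= cur_order_id))
    return [r for r in rows if keeps(r)][:page_size]
```

Because the position is expressed as column values rather than a row number, inserts and deletes between fetches don't throw the paging off, which is the dynamic-recordset requirement described above.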
 



Post #20628
Posted Tuesday, September 18, 2001 11:57 AM
Oops. I just realized that I forgot to remove the equal signs in the OrderID comparisons. With the equal signs, the page returned includes the "current" row. I use this sometimes to display the last row from the previous page as the first row on the next page. Just remove the equal sign in the OrderID comparison of each Select statement to produce the behavior as stated in the procedure.

Jay



Post #20629
Posted Friday, August 29, 2003 6:07 AM
Two problems here, one significant, one not so. The significant problem seems to be that the proc works based on the assumption that the state of the Order table doesn't change (i.e., no inserts, deletes, updates to cluster key values). Obviously if the state of the table changes, working with relative rowcounts is invalid (although the order by option MIGHT save you here). Second, the lesser of the two, is simply the performance, resource and locking implications of using a cursor, an approach which might be entirely valid under several scenarios.




Post #20630
Posted Friday, August 29, 2003 7:51 AM
I have had to deal with this need repeatedly for websites where we do not want to return the entire result set to the user, both to limit network load and response time (sending that many rows to a browser) and because a user cannot reasonably work with that much data.

To make matters worse, most requirements ask that the result set be sortable by clicking on columns of the resulting grid to dynamically sort by them.

In addition, users expect links at the bottom and top of the result set to easily page through their data ... thanks to Google and other similar browsing models.

On my projects we have implemented several approaches to deal with this problem.


Later versions of Oracle kindly provide a ROWNUM pseudocolumn that can be aliased in an inner query and then limited to the desired range (e.g. SELECT ... FROM (SELECT t.*, ROWNUM rn FROM ... t) WHERE rn >= 25 AND rn <= 50 - note the inner-query alias is required, since filtering directly on ROWNUM with a lower bound returns no rows).

We have implemented two approaches for SQL Server depending on the size of the dataset we are dealing with and the normal patterns of use.

Server Side:

We wrote a utility function in SQL Server

ALTER FUNCTION dbo.fGetPagedSqlText
(
    @KeyField varchar(100),
    @PageNum int,
    @RecsPerPage int,
    @Select varchar(4000),
    @FromForFilter varchar(4000),
    @WhereForFilter varchar(4000),
    @FromForSelect varchar(4000),
    @WhereForSelect varchar(4000),
    @Order varchar(4000)
)
RETURNS varchar(8000) AS
BEGIN
    declare @sql varchar(8000);
    set @sql = 'set nocount on;'
    set @sql = @sql + ' DECLARE @FirstRec int, @LastRec int;'
    set @sql = @sql + ' SET @FirstRec = (' + convert(varchar, @PageNum) + '- 1) * ' + convert(varchar, @RecsPerPage) + ' + 1;'
    set @sql = @sql + ' SET @LastRec = (' + convert(varchar, @PageNum) + '* ' + convert(varchar, @RecsPerPage) + ');'
    set @sql = @sql + ' declare @TempPaged table'
    set @sql = @sql + ' (ID int identity,'
    set @sql = @sql + ' KeyId uniqueidentifier'
    set @sql = @sql + ' );'
    set @sql = @sql + ' insert into @TempPaged'
    set @sql = @sql + ' (KeyId)'
    set @sql = @sql + ' select ' + @KeyField + ' from ' + @FromForFilter
    if (LEN(@WhereForFilter) > 0)
        set @sql = @sql + ' where ' + @WhereForFilter
    if (LEN(@Order) > 0)
        set @sql = @sql + ' order by ' + @Order + ';'
    else
        set @sql = @sql + ';'

    -- total row count for the "x of y" display; an earlier version returned
    -- @@rowcount here instead of scanning the table variable:
    -- set @sql = @sql + ' select @@rowcount; set nocount off;'
    set @sql = @sql + ' select count(1) from @TempPaged;'
    set @sql = @sql + ' select ' + @Select
    set @sql = @sql + ' from ' + @FromForSelect + ' join @TempPaged tempPaged on (' + @KeyField + ' = tempPaged.KeyId) '
    set @sql = @sql + ' where '
    if LEN(@WhereForSelect) > 0
        set @sql = @sql + @WhereForSelect + ' and '
    set @sql = @sql + ' ID between @FirstRec and @LastRec ORDER BY ID'
    return @sql;
END


when called:
select dbo.fGetPagedSQLText('ItemId', 3, 50, 'ItemId, ItemName, ItemDesc', 'Items', 'ItemName like ''Widget%''', 'Items', '', 'ItemName')

It returns (formatted manually for readability):

set nocount on;

DECLARE @FirstRec int,
@LastRec int;

SET @FirstRec = (3- 1) * 50 + 1;
SET @LastRec = (3* 50);

declare @TempPaged table
(ID int identity,
KeyId uniqueidentifier );

insert into @TempPaged (KeyId)
select ItemId
from Items
where ItemName like 'Widget%'
order by ItemName;

select count(1) from @TempPaged;

select ItemId,
ItemName,
ItemDesc
from Items
join @TempPaged tempPaged on (ItemId = tempPaged.KeyId)
where ID between @FirstRec and @LastRec
ORDER BY ID


This is then dynamically executed.

Pros:
It works
It is easy for developers to use
Any column is orderable
It returns the number of total rows to allow for "x of y" display to the user and other navigational hints


Cons:
Dynamic SQL
Temporary Table use
Single column primary key only for this version
Under heavy load this approach can put a large load on the database due to temp tables... additional considerations should be taken to try to limit the size of the data put in the temporary table
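To summarize the pattern the generated statement implements - filter keys into a keyed holding table, report the total, then join one page of keys back to the full rows - here is a rough Python model (an added illustration with made-up names, not the author's code):

```python
def paged_query(rows, key, predicate, sort_key, page_num, recs_per_page):
    """Mimic the shape of fGetPagedSqlText's generated SQL.
    rows: list of dicts; key: primary-key field name; predicate and
    sort_key stand in for the WHERE and ORDER BY clauses; page_num is
    1-based. Returns (total, page) like the two result sets."""
    # "insert into @TempPaged": filtered, ordered keys with an implicit identity
    keys = [r[key] for r in sorted(filter(predicate, rows), key=sort_key)]
    total = len(keys)                       # "select count(1) from @TempPaged"
    first = (page_num - 1) * recs_per_page  # @FirstRec, shifted to 0-based
    page_keys = keys[first:first + recs_per_page]
    by_key = {r[key]: r for r in rows}      # join the page back to the full rows
    return total, [by_key[k] for k in page_keys]
```

The total comes from the filtered key set, not the page, which is what makes the "x of y" display possible without a second trip to the base table.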


Alternative:

For some of our problematic queries we are attempting a version of paging referenced in the Microsoft Data Application Block V2 reference architecture.

In their example a client side implementation is used.

.NET provides two primary methods for pulling data out of SQL Server. One is a DataReader, a forward-only stream that holds only the current record in memory at any time. The other is a DataSet, which contains an XML representation of the entire result set.

In our data layer, we loop through the rows of the data reader and populate a dataset. The DataSet is then sent to the Business or Presentation layer.

Here is a small fragment of the code:

...

DataRow fillRow;
string fieldName;
int recNumber = 0;
int totalRecords = from + count;
while( dataReader.Read() )
{
    // skip rows before the requested page
    if( recNumber++ >= from )
    {
        fillRow = dataTable.NewRow();
        for( int fieldIdx = 0; fieldIdx < dataReader.FieldCount; fieldIdx++ )
        {
            fieldName = dataReader.GetName( fieldIdx );
            // create the column on first sight of this field
            if( dataTable.Columns.IndexOf( fieldName ) == -1 )
                dataTable.Columns.Add( fieldName, dataReader.GetValue( fieldIdx ).GetType() );
            fillRow[ fieldName ] = dataReader.GetValue( fieldIdx );
        }
        dataTable.Rows.Add( fillRow );
    }
    // stop reading once the page is full (count == 0 means read to the end)
    if( count != 0 && totalRecords <= recNumber )
        break;
}
dataSet.AcceptChanges();
dataReader.Close();

...
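For readers who don't work in C#, here is a rough Python equivalent of the fragment above (an added sketch, not from the original post): stream rows from a forward-only reader, skip the first `from` rows, collect `count` rows, and stop reading as soon as the page is full.

```python
def read_page(reader, start, count):
    """Collect rows [start, start + count) from a forward-only iterator,
    breaking early like the C# loop so later rows are never fetched.
    count == 0 means 'read to the end', as in the original code."""
    page, rec_number = [], 0
    for row in reader:
        if rec_number >= start:
            page.append(row)
        rec_number += 1
        if count != 0 and rec_number >= start + count:
            break  # stop pulling rows once the page is complete
    return page
```

The early break is the whole point: whether it actually saves network traffic depends on how much the provider has already buffered, which is the open question discussed below.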


Pros:
Does not depend on temporary tables
Any number of columns is allowable for primary key

Cons:
Sorting is responsibility of the developer (is this really a con??)

At this point, we are not positive exactly how many records SQL Server truly returns to the client. We have heard, but not found in documentation, that .NET buffers some of the returned rows that the DataReader loops through. This could mean that a larger number of rows than desired are being sent from the database server to the web server.

Also, usage patterns show that our users do not often go to the end of result sets. Usually they will narrow the results and search again. This means that if .NET is closing the result set after reading towards the top, we are not taking the hit of pulling the entire set across to only read the last 'page'.

Our performance testing should reveal more information and I will be glad to share the results with anyone interested.

Sorry for the extremely long message. This has been a challenging dilemma for us on several projects and I hope that this information will help. If you have improvements or alternatives, please let me know as well.






Post #20631