May 15, 2012 at 3:33 am
Good morning
I am looking for an efficient way to get summary counts on certain columns for an entire data set, and then to page that data.
A lot of sites that return large data sets allow you to refine the results: they show a list of values for each column along with the counts.
e.g. eBuyer memory search
What is the most efficient way of doing this without doing subqueries for each column? The paging is simple enough with another query for the total count, but I am not sure how to get the distinct values for each column without doing another GROUP BY on the entire data set.
I can obviously have one stored procedure return multiple result sets, but I am not sure if there is a better way?
I could put the initially filtered data into a temp table and then do GROUP BYs on that, but memory consumption could be large if the search covers a lot of rows.
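To make it concrete, this is roughly the temp table version I have in mind; the separate GROUP BY per column at the end is the part I would like to avoid (just a sketch, the table and column names are made up):

CREATE PROCEDURE dbo.SearchProducts
    @Filter   nvarchar(100),
    @PageNum  int,
    @PageSize int
AS
BEGIN
    SET NOCOUNT ON;

    -- run the expensive search filter once into a temp table
    SELECT ProductID, Brand, Capacity
    INTO #Filtered
    FROM dbo.Products
    WHERE ProductName LIKE '%' + @Filter + '%';

    -- result set 1: one page of rows (ROW_NUMBER so it works on 2005/2008)
    WITH Numbered AS
    (
        SELECT ProductID, Brand, Capacity,
               ROW_NUMBER() OVER (ORDER BY ProductID) AS rn
        FROM #Filtered
    )
    SELECT ProductID, Brand, Capacity
    FROM Numbered
    WHERE rn BETWEEN (@PageNum - 1) * @PageSize + 1 AND @PageNum * @PageSize;

    -- result set 2: total row count for the pager
    SELECT COUNT(*) AS TotalCount FROM #Filtered;

    -- result sets 3+: distinct values and counts, one GROUP BY per column
    SELECT Brand,    COUNT(*) AS Cnt FROM #Filtered GROUP BY Brand;
    SELECT Capacity, COUNT(*) AS Cnt FROM #Filtered GROUP BY Capacity;
END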
Any input?
Thanks
Shane
May 15, 2012 at 3:51 am
Depends on the average size of the data set. Both of the examples you've shown will be caching this on the application server, i.e. the database engine returns the whole data set and the application layer does the paging/grouping/counting.
Obviously, if you're producing a data set in the millions then you might have to do this activity on the database side, but generally RAM is quite cheap, it's easier to scale out application server capacity than database capacity, and it gives a faster end-user experience. .NET has built-in in-memory paging and group-by methods, so it's pretty easy to implement.
May 15, 2012 at 3:59 am
If we are using a very large data set and each server has to support hundreds of concurrent users, then the amount of memory required on the app server is very large.
Also, the bandwidth needed to return the entire data set to the application server would be quite large, and the time to transmit it would be too long, I would imagine.
Not sure that is the most efficient approach. Maybe I am wrong, not sure! 🙂
Shane
May 15, 2012 at 4:09 am
You're going to have to define "very large dataset".
If it's really huge, then yes, you'll have to do this in the database, but it obviously has complications and costs.
You may be able to experiment with the WITH CUBE option to get the counts for each column from a single query rather than multiple GROUP BYs, but it would still be a separate query from the overall data set, and database-side paging can be very costly if your front end supports multiple sort orders.
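Something along these lines (untested, and the table/column names are just placeholders):

-- one pass over the filtered set; CUBE produces every grouping combination
-- and GROUPING() tells you which columns are aggregated in each output row
SELECT Col1, Col2, Col3,
       COUNT(*)       AS Cnt,
       GROUPING(Col1) AS IsCol1Total,
       GROUPING(Col2) AS IsCol2Total,
       GROUPING(Col3) AS IsCol3Total
FROM dbo.Table1
GROUP BY Col1, Col2, Col3
WITH CUBE;

The rows you'd want for the per-column refinement lists are the ones where exactly one of the GROUPING() flags is 0 and the rest are 1.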
May 15, 2012 at 4:36 am
Thanks for your response
Just tried WITH CUBE and it works quite well. Now I just have to decide whether to re-query the data set for both the WITH CUBE operation and the paging operation, or whether to use a temp table (which I normally hate) for both the paging and the WITH CUBE command so I don't have to query the larger data set twice.
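Something like this for the temp table route (rough sketch only; the table, columns, filter and paging variables are all placeholders):

DECLARE @SearchTerm nvarchar(50) = N'%memory%',  -- inline DECLARE needs SQL 2008+
        @PageStart  int = 1,
        @PageEnd    int = 20;

-- hit the big table once with the original filter
SELECT Col1, Col2, Col3
INTO #Results
FROM dbo.Table1
WHERE SomeSearchCol LIKE @SearchTerm;

-- refinement counts off the temp table
SELECT Col1, Col2, Col3, COUNT(*) AS Cnt
FROM #Results
GROUP BY Col1, Col2, Col3
WITH CUBE;

-- paging off the same temp table
SELECT Col1, Col2, Col3
FROM (SELECT Col1, Col2, Col3,
             ROW_NUMBER() OVER (ORDER BY Col1) AS rn
      FROM #Results) AS x
WHERE x.rn BETWEEN @PageStart AND @PageEnd;

DROP TABLE #Results;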
Will have to test both options.
Thanks
May 15, 2012 at 8:39 am
The non-standard WITH CUBE syntax has been deprecated in favor of the standard CUBE(<composite element list>) syntax that was introduced in SQL 2008.
WITH CUBE: "This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature."
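e.g. the same query in the old and new syntax (column names are just illustrative):

-- deprecated, non-standard form
SELECT Col1, Col2, COUNT(*) AS Cnt
FROM Table1
GROUP BY Col1, Col2
WITH CUBE;

-- standard form, SQL 2008 and later
SELECT Col1, Col2, COUNT(*) AS Cnt
FROM Table1
GROUP BY CUBE(Col1, Col2);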
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 15, 2012 at 9:34 am
Thanks for the update. I modified my query to use the GROUPING SETS feature instead, which gives me a count of all the different field options individually (but not all combinations), which is perfect.
e.g.
SELECT Col1, Col2, Col3, COUNT(*) AS TotalCount
FROM Table1
GROUP BY GROUPING SETS (Col1, Col2, Col3)
Have not tested performance on a large dataset yet
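In case it helps anyone else, GROUPING() can be added to show which column each count row belongs to (same placeholder names as above; I haven't performance-tested this either):

SELECT Col1, Col2, Col3, COUNT(*) AS TotalCount,
       GROUPING(Col1) AS IsCol1Aggregated,
       GROUPING(Col2) AS IsCol2Aggregated,
       GROUPING(Col3) AS IsCol3Aggregated
FROM Table1
GROUP BY GROUPING SETS (Col1, Col2, Col3)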