May 15, 2012 at 3:33 am
Good morning
I am looking for an efficient way to get summary counts on certain columns for an entire data set, and then to page that data.
A lot of sites that return large data sets allow you to refine the results: they show a list of values for each column along with the counts.
e.g. eBuyer memory search
What is the most efficient way of doing this without doing subqueries for each column? The paging is simple enough with another query for the total count, but I am not sure how to get the distinct values for each column without doing another GROUP BY on the entire data set.
I can obviously have one stored procedure return multiple result sets, but I am not sure if there is a better way?
I could put the initially filtered data into a temp table and then do GROUP BYs on that, but memory consumption could be large if the search covers a lot of rows.
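To make it concrete, this is roughly the temp table version I have in mind; the separate GROUP BY per column at the end is the part I would like to avoid (just a sketch, the table and column names are made up):

CREATE PROCEDURE dbo.SearchProducts
    @Filter   nvarchar(100),
    @PageNum  int,
    @PageSize int
AS
BEGIN
    SET NOCOUNT ON;

    -- run the expensive search filter once into a temp table
    SELECT ProductID, Brand, Capacity
    INTO #Filtered
    FROM dbo.Products
    WHERE ProductName LIKE '%' + @Filter + '%';

    -- result set 1: one page of rows (ROW_NUMBER so it works on 2005/2008)
    WITH Numbered AS
    (
        SELECT ProductID, Brand, Capacity,
               ROW_NUMBER() OVER (ORDER BY ProductID) AS rn
        FROM #Filtered
    )
    SELECT ProductID, Brand, Capacity
    FROM Numbered
    WHERE rn BETWEEN (@PageNum - 1) * @PageSize + 1 AND @PageNum * @PageSize;

    -- result set 2: total row count for the pager
    SELECT COUNT(*) AS TotalCount FROM #Filtered;

    -- result sets 3+: distinct values and counts, one GROUP BY per column
    SELECT Brand,    COUNT(*) AS Cnt FROM #Filtered GROUP BY Brand;
    SELECT Capacity, COUNT(*) AS Cnt FROM #Filtered GROUP BY Capacity;
END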
Any input?
Thanks
Shane
May 15, 2012 at 3:51 am
Depends on the average size of the data set. Both of the examples you've shown will be caching this on the application server, i.e. the database engine returns the whole data set and the application layer does the paging/grouping/counting.
Obviously, if you're producing a data set in the millions then you might have to do this activity on the database side, but generally RAM is quite cheap, it's easier to scale out application server capacity than database capacity, and it gives a faster end-user experience. .NET has built-in in-memory paging and group-by methods, so it's pretty easy to implement.
May 15, 2012 at 3:59 am
If we are using a very large data set and each server has to support hundreds of concurrent users, then the amount of memory required on the app server is very large.
Also, the bandwidth needed to return the entire data set to the application server would be quite large, and the time to transmit it would be too long, I would imagine.
Not sure that is the most efficient approach. Maybe I am wrong, not sure! 🙂
Shane
May 15, 2012 at 4:09 am
You're going to have to define "very large dataset".
If it's really huge, then yes, you'll have to do this in the database, but it obviously has complications and costs.
You may be able to experiment with the WITH CUBE option to get the counts for each column from a single query rather than multiple GROUP BYs, but it would still be a separate query from the overall data set, and database-side paging can be very costly if your front end supports multiple sort orders.
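Something along these lines (untested, and the table/column names are just placeholders):

-- one pass over the filtered set; CUBE produces every grouping combination
-- and GROUPING() tells you which columns are aggregated in each output row
SELECT Col1, Col2, Col3,
       COUNT(*)       AS Cnt,
       GROUPING(Col1) AS IsCol1Total,
       GROUPING(Col2) AS IsCol2Total,
       GROUPING(Col3) AS IsCol3Total
FROM dbo.Table1
GROUP BY Col1, Col2, Col3
WITH CUBE;

The rows you'd want for the per-column refinement lists are the ones where exactly one of the GROUPING() flags is 0 and the rest are 1.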
May 15, 2012 at 4:36 am
Thanks for your response
Just tried WITH CUBE and it works quite well. Now I just have to decide whether to re-query the data set for both the WITH CUBE operation and the paging operation, or whether to use a temp table (which I normally hate) for both the paging and the WITH CUBE command so I don't have to query the larger data set twice.
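Something like this for the temp table route (rough sketch only; the table, columns, filter and paging variables are all placeholders):

DECLARE @SearchTerm nvarchar(50) = N'%memory%',  -- inline DECLARE needs SQL 2008+
        @PageStart  int = 1,
        @PageEnd    int = 20;

-- hit the big table once with the original filter
SELECT Col1, Col2, Col3
INTO #Results
FROM dbo.Table1
WHERE SomeSearchCol LIKE @SearchTerm;

-- refinement counts off the temp table
SELECT Col1, Col2, Col3, COUNT(*) AS Cnt
FROM #Results
GROUP BY Col1, Col2, Col3
WITH CUBE;

-- paging off the same temp table
SELECT Col1, Col2, Col3
FROM (SELECT Col1, Col2, Col3,
             ROW_NUMBER() OVER (ORDER BY Col1) AS rn
      FROM #Results) AS x
WHERE x.rn BETWEEN @PageStart AND @PageEnd;

DROP TABLE #Results;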
Will have to test both options.
Thanks
May 15, 2012 at 8:39 am
The non-standard WITH CUBE syntax has been deprecated in favor of the standard CUBE(<composite element list>) syntax that was introduced in SQL 2008.
WITH CUBE: "This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature."
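e.g. the same query in the old and new syntax (column names are just illustrative):

-- deprecated, non-standard form
SELECT Col1, Col2, COUNT(*) AS Cnt
FROM Table1
GROUP BY Col1, Col2
WITH CUBE;

-- standard form, SQL 2008 and later
SELECT Col1, Col2, COUNT(*) AS Cnt
FROM Table1
GROUP BY CUBE(Col1, Col2);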
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 15, 2012 at 9:34 am
Thanks for the update. I modified my query to use the GROUPING SETS feature instead, which gives me a count of all the different field options individually (but not all combinations), which is perfect.
e.g.
SELECT Col1, Col2, Col3, COUNT(*) AS TotalCount
FROM Table1
GROUP BY GROUPING SETS (Col1, Col2, Col3)
Have not tested performance on a large dataset yet
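In case it helps anyone else, GROUPING() can be added to show which column each count row belongs to (same placeholder names as above; I haven't performance-tested this either):

SELECT Col1, Col2, Col3, COUNT(*) AS TotalCount,
       GROUPING(Col1) AS IsCol1Aggregated,
       GROUPING(Col2) AS IsCol2Aggregated,
       GROUPING(Col3) AS IsCol3Aggregated
FROM Table1
GROUP BY GROUPING SETS (Col1, Col2, Col3)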