Group by on Large Database 750 GB+

  • I have a large database. Mostly all the tables are having records more than 50, 60.. near to 100 millions.

    I have to select only one column from the query. But the query is very complex. And there are lot many such queries which involves one or the other tables.

    The Query involves.

    1. Joins on two tables where the records are very large

    2. Joins are based on the uniqueidentifiers

    3. Group by on date, uniqueidentifier, varchar, int having large number of records

    4. There is having keyword in the group by to limit the no of records

    5. Tables are having non required columns and the number goes near to 30. Where as we require only one columns

    Current Approach.

    1. Create new database

    2. Create one tables for each table with required columns ( so that the number of SQL Pages are less) from the original

    doubt: Will that benefit in the Join? I guess yes, as the memory utilization will be less

    3. Put joins on these newly created tables.

    4. Put Group By on these newly created tables.

    Doubts?

    1. Should I create Indexes on the columns which i need to join? will that add benefit? I doubt as I know 90% of data is gonna satiesfy the condition.

    2. Is it a good idea to have Group by on large table? what is the other alternatives to group by on large table

    3. The group will that create a memory problem as the no of records being large.

    4. If I want to store the output of group by in a table, which approach is better to reduce the memory utilisation.

    Select * into

    OR

    Insert into table_name...

    5. Will there be memory related issues?

    Regards,

    Shailesh Gajare

  • It's going to be very difficult to answer the questions with any degree of authority without seeing the structures, sample data & queries run against it.

    Shailesh Gajare (2/9/2009)


    Doubts?

    1. Should I create Indexes on the columns which i need to join? will that add benefit? I doubt as I know 90% of data is gonna satiesfy the condition.

    Well, depending on the query, yes, indexes on the join columns will help. As a matter of fact, the foreign key column(s) can be a good choice for the clustered index, depending on the queries used, etc.

    2. Is it a good idea to have Group by on large table? what is the other alternatives to group by on large table

    GROUP BY is a query construct, not a thing that you do to the table. You can SELECT... FROM... WHERE... GROUP BY... Or are you asking if you should pre-aggregate data, which is something different entirely. The really short answer to your question, it depends. You can run aggregations against tables with really large amounts of data if the tables are well indexed and the queries and aggregations will take advantages of those indexes.

    3. The group will that create a memory problem as the no of records being large.

    Sorry, I don't understand this one

    4. If I want to store the output of group by in a table, which approach is better to reduce the memory utilisation.

    Select * into

    OR

    Insert into table_name...

    SELECT INTO creates a new table. Are you going to create a new table for each query? That doesn't make much sense. If you're asking how to pre-aggregate queries such that you run the aggregations at night and then the users will access the new table with the pre-aggregates, then I'd build the table and INSERT INTO. That allows you to maintain indexes, control the structure, etc.

    5. Will there be memory related issues?

    Yes. No. Again, it's impossible to answer such a general question based on such general information. Yes, memory could be an issue, but without more knowledge of your system, it's hard to say if it will be.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • 2. Is it a good idea to have Group by on large table? what is the other alternatives to group by on large table

    GROUP BY is a query construct, not a thing that you do to the table. You can SELECT... FROM... WHERE... GROUP BY... Or are you asking if you should pre-aggregate data, which is something different entirely. The really short answer to your question, it depends. You can run aggregations against tables with really large amounts of data if the tables are well indexed and the queries and aggregations will take advantages of those indexes.

    Yes, you are right. I am looking for preaggregating the data, and the data being large (say near to 60 GB table size) will that group by will cause any problem with respect to memory? I guess there will be lots of IO (logical as well as physical), how do i minimize it? One more confirmation I am looking at ... if I have large data and have indexes on the columns which are being used in group by will that be utilized while doing a group by. [ i am just wondering if the select is traversing the whole table.. then whether table scan will be faster or will it really utilized the index and do a index seek operation? ]

    3. The group will that create a memory problem as the no of records being large.

    Sorry, I don't understand this one

    The questions is this: The group by will that create a memory problem as the no of records being large. ( table size is nera to 60 GB and the RAM is 8 GB)

    4. If I want to store the output of group by in a table, which approach is better to reduce the memory utilisation.

    Select * into

    OR

    Insert into table_name...

    SELECT INTO creates a new table. Are you going to create a new table for each query? That doesn't make much sense. If you're asking how to pre-aggregate queries such that you run the aggregations at night and then the users will access the new table with the pre-aggregates, then I'd build the table and INSERT INTO. That allows you to maintain indexes, control the structure, etc.

    Will the insert into operation will eat up of my RAM completly?

    Should do an insert with few records at a time? will that benefit in terms of memory utilization and time spent for complete insertion.

    5. Will there be memory related issues?

    Yes. No. Again, it's impossible to answer such a general question based on such general information. Yes, memory could be an issue, but without more knowledge of your system, it's hard to say if it will be.

    My Server Details are as follows: SQL SERVER 2005 64bit, 8 GB RAM

    the database size is 650GB , and table size is 60-70GB..

    Thanks for all your answers....

  • OK. So you want to pre-aggregate the data. This is an operation that is best done in the off hours. I'm not sure why you're so obsessed with the amount of memory it's going to use. It's likely to use quite a bit, but it's also going to use a lot of processing power and tempdb space. You're running on a 64bit machine, so the memory should be well managed. Concentrate on getting good queries, use the execution plans so you know what's happening, use indexes where indexes will help, run it in the off hours... You should be fine.

    If you're really concerned about processing time, you might consider looking into partitioning the preaggregate table so that you can add new aggregations by adding new partitions, but that really depends on how the data changes, what kind of data, etc.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Shailesh,

    You seem to be farily new to the complexities of managing large SQL Server databases. Grant is giving good advice, and if you put your questions into Google you will find other places that also give good advice. However, this is a very complex topic and IMHO you need to understand more about some of the fundamentals.

    I suggest you take the time to read about table aggregations and partitioning. After that, try to find a course that specialises in this area, or get your management to hire a consultant who can give you some targeted training.

    If you have an active SQL Server user group in your area, it can be good to attend. Often you will find others who have overcome some of these problems, and who can give more direct help than is possible in a internet forum. If you do this, after a few months you will find yourself answering questions other people are asking, and growing your career in SQL Server.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply