Huge size of Reporting tables

  • Hi All

We have a reporting DB with tables that are more than 5 TB in size. As the business grew, we reached these sizes. These tables hold one month's worth of data, and we need all of it to generate many of our reports, so we cannot archive or purge these tables.

Because of this huge size, many of our reporting procedures have become very slow (more than 45 minutes), although a very thorough index optimization has already been done and the query plans are all good.

    Now some questions to the experts:

1. If the proc is optimized in the best possible way and the data size can't be reduced any further, what are my other options? We already have enough memory and the best SSDs to store this data. I mean, we don't face high CPU, memory, or I/O latency.
2. Are there alternatives where storing this data outside SQL Server, e.g. in MongoDB or Hadoop, has helped in some form?

These tables are transactionally replicated tables in SQL Server 2016. If anyone has any suggestions or guidelines to improve reporting data storage and performance, please do share.

    Thanks


    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

Let's assume that the databases & queries are tuned (that's always a question for me though; tuning is an ongoing process, not a one & done proposition). For example, since we're talking SQL Server 2016 and large-scale systems, are you using columnstore indexes or clustered columnstore indexes? If not, are your queries analytical, with lots of aggregations? Then you may have some query & index tuning opportunities remaining.
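For reference, adding a columnstore index to a big reporting table is a one-statement change. This is just a sketch — the table and column names below are hypothetical, so adjust them to your own schema:

```sql
-- Hypothetical reporting fact table; adjust names to your schema.
-- A nonclustered columnstore index speeds up analytical scans and
-- aggregations while leaving the existing rowstore clustered index intact.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SalesFact
ON dbo.SalesFact (SaleDate, CustomerID, ProductID, Quantity, Amount);

-- For a table that is purely analytical, you could instead replace the
-- rowstore entirely (test first -- this changes the table's storage):
-- CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesFact ON dbo.SalesFact;
```

On 2016 you also get batch-mode execution on queries that touch the columnstore, which is often where the big wins for aggregation-heavy reports come from.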

However, if they're tuned, your options are simple. Reduce the data being processed. Can't do that? Get bigger/faster/more memory, CPU & disks. Guaranteed, if things are running that slow, there's work being done somewhere. Whether it's memory or disk or CPU, find where the work is, and spend money on making it faster.

    As to MongoDB or Hadoop, these are data collection mechanisms. They're great at it. Bar none, amazing. They are not reporting engines. In fact, most of the time, what you'll see is the data gets collected in MongoDB/Hadoop/Whatever and then it gets transformed into another database management system that's good at reporting, like SQL Server. So, no, I would absolutely not recommend moving to MongoDB to make your reporting queries run faster. They'll probably run longer, in all likelihood, a lot longer.

    Nope. It's always back to the basics. What kind of queries? What kind of data structures? Where is the hardware bottleneck? Statistics maintenance. Index structures. The code, the code, the code. This stuff is what's necessary. There's no magic bullet or secret switch. It's always about the basics.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

S_Kumar_S wrote:

I mean , we don't face high CPU, Memory or IO latency.

    What kind of issues do you face then?

    What's the bottleneck which makes your procedures slow?

My eyes sparkled when I saw "Grant Fritchey" replied to my post. I have been a big fan of your books, and thank you very much for your detailed response.

Thanks for pointing out columnstore and clustered columnstore indexes. We are actually not using them in our reporting DB. I'll explore the queries we have and see if their use could help us.

I'll rule out MongoDB and Hadoop.

Hardware-wise, I feel we have enough to support our queries. I won't get more of it unless I can prove with stats that I need more, and I fear the stats don't support me there. Memory, CPU, and disk all look fine.

We make very heavy use of temp tables in these procs. I have often seen queries spending most of their time on INSERT INTO #temp statements.

Although the DBAs have followed all prevalent practices for tempdb (multiple data files, sufficient space allocation, etc.), I have asked them to look deeper into whether tempdb is a bottleneck.
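One quick way to see whether tempdb allocation contention is actually happening is to watch for PAGELATCH waits on tempdb pages while the procs run. This uses only the standard `sys.dm_os_waiting_tasks` DMV (tempdb is always database_id 2):

```sql
-- Live check for tempdb allocation contention: PAGELATCH (not PAGEIOLATCH)
-- waits on tempdb pages such as 2:1:1 (PFS) or 2:1:3 (SGAM) indicate
-- contention on allocation pages rather than slow I/O.
SELECT session_id,
       wait_type,
       wait_duration_ms,
       resource_description          -- format: database:file:page
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE N'PAGELATCH%'
  AND resource_description LIKE N'2:%';
```

If this comes back empty while the slow inserts are running, tempdb allocation is probably not your bottleneck and the time is going somewhere else.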


    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

  • Sergiy wrote:

    I mean , we don't face high CPU, Memory or IO latency.

    What kind of issues do you face then?

    What's the bottleneck which makes your procedures slow?

    Hi Sergiy

The issue we face is that some of the queries, which join multiple large tables and pull a large amount of data to load into temp tables, take more than 10 minutes. The plans for these queries are good, with indexes seeked properly. We don't see any pressure on memory, disk, or CPU. It is just that the query takes long to pick, say, 10 million records from a join.

Could it be that we have the best hardware, but its limit is to process only that much data in 10 minutes? I mean, it can't be optimized further.

    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

  • Thanks for the kind words.

Don't get me wrong, temp tables as a way to break up or break down data prior to other processing are a valid approach. However, it is also an approach that can be overused or used inappropriately. Not seeing any of the code, I can't make that call here. It absolutely is something that would draw my eye, though. Another question to ask: is it possible that you need indexes on the temp tables because you're filtering or joining on them?
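The pattern above can be sketched like this (table and column names are made up for illustration):

```sql
-- Hypothetical temp-table staging step from a reporting proc.
CREATE TABLE #Sales
(
    SaleID     int   NOT NULL,
    CustomerID int   NOT NULL,
    Amount     money NOT NULL
);

-- Load the working set first...
INSERT INTO #Sales (SaleID, CustomerID, Amount)
SELECT SaleID, CustomerID, Amount
FROM dbo.SalesFact
WHERE SaleDate >= DATEADD(DAY, -30, GETDATE());

-- ...then index it, so later joins/filters against #Sales can seek.
-- Building the index after the load is usually cheaper than loading
-- millions of rows into an already-indexed table.
CREATE CLUSTERED INDEX CX_Sales ON #Sales (CustomerID);
```

Whether the index pays for itself depends on how many times the temp table is subsequently read; for a single scan it often won't.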

I still lean heavily toward it being about the basics, although it could be hardware. Check the wait statistics and queues for the queries you're running. Why are they slow? Where are they slow? Even if it's all flawless index seeks with merge joins (and no sorts), processing is occurring somewhere. That's the bottleneck. Identify it.
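A starting point for the wait-statistics check is the standard `sys.dm_os_wait_stats` DMV (cumulative since the last restart or manual clear); the filter list below is a partial, commonly used set of benign background waits, not an exhaustive one:

```sql
-- Top waits instance-wide, excluding some well-known benign/background
-- wait types. High signal_wait time relative to total suggests CPU
-- scheduler pressure rather than the resource the wait type names.
SELECT TOP (10)
       wait_type,
       wait_time_ms / 1000.0        AS wait_time_sec,
       signal_wait_time_ms / 1000.0 AS signal_wait_sec,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                        N'XE_TIMER_EVENT', N'CHECKPOINT_QUEUE',
                        N'BROKER_TASK_STOP', N'WAITFOR',
                        N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP')
ORDER BY wait_time_ms DESC;
```

On 2016 you can also look at `sys.dm_exec_session_wait_stats` for a single session, which is handier for attributing waits to one slow proc run.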

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Grant, yes we have indexes on temp tables wherever needed.

    I'll deep dive into wait stats and see if I could get some help.

Besides this, is it also possible that I have hit the limit of the current hardware and it can't process any faster? Some of my queries return up to 15 million records, which get pushed to a temp table. For such huge processing, 10 minutes could be my best bet.

Have you ever faced a situation where you felt limited by the processing capabilities of the hardware even though you had it in abundance?

The problem I see with SQL Server is that it is only vertically scalable, and all I can do is add the best hardware to a single machine.

    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

  • Yep. That's the problem. There's no simple way to federate this kind of querying across multiple servers.

    However, with loads and loads of effort, you could. So that's an option, albeit an unattractive one.


    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • S_Kumar_S wrote:

Some of my queries return up to 15 million records, which get pushed to a temp table.

    Does not look like a good choice of querying strategy.

It's never a good idea to create multi-million-row temp tables.

    I bet there is a better (smarter) way to get the desired reports.

  • (1) As always, the first step is to make sure you have the very best clustered index on every table of significant size.  That is not an easy thing to do, but it's a vital one.

    (2) Have you moved CLOB/BLOB columns out of row to better compact the main tables?

    (3) Have you page compressed the data in large tables yet?

    although a very thorough index optimization is already done and query plans are all good.

    That's also very hard to do.  I'm not at all convinced you've done that yet.  Would you mind posting the DDL, including all index definitions, for the (biggest) tables?
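On point (3), SQL Server ships a standard procedure to estimate the benefit before committing to a long rebuild; the table name below is hypothetical:

```sql
-- Estimate page-compression savings for a table before rebuilding it.
EXEC sys.sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'SalesFact',   -- hypothetical table name
     @index_id         = NULL,           -- NULL = all indexes
     @partition_number = NULL,
     @data_compression = N'PAGE';

-- If the estimate looks worthwhile, rebuild with page compression.
-- ONLINE = ON needs Enterprise Edition, and on a 5 TB table the
-- rebuild will take a long time and generate substantial log:
-- ALTER TABLE dbo.SalesFact REBUILD
--     WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
```

Since the workload reportedly has CPU headroom but moves huge amounts of data, trading some CPU for fewer pages read is often a net win here.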

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

  • Sergiy wrote:

    S_Kumar_S wrote:

Some of my queries return up to 15 million records, which get pushed to a temp table.

    Does not look like a good choice of querying strategy.

It's never a good idea to create multi-million-row temp tables.

    I bet there is a better (smarter) way to get the desired reports.

I agree that huge temp tables are not a great idea. But unfortunately, the kind of data needed to generate these reports requires a large amount of data to be pushed into temp tables.

    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

  • ScottPletcher wrote:

    (1) As always, the first step is to make sure you have the very best clustered index on every table of significant size.  That is not an easy thing to do, but it's a vital one.

    (2) Have you moved CLOB/BLOB columns out of row to better compact the main table?

    (3) Have you page compressed the data in large tables yet?

    although a very thorough index optimization is already done and query plans are all good.

    That's also very hard to do.  I'm not at all convinced you've done that yet.  Would you mind posting the DDL, including all index definitions, for the (biggest) tables?

1. Yes, each table has a clustered index, and we probably selected the best-suited column for it.

2. There are no BLOB/CLOB columns.

3. Thanks for this idea. Not sure if the DBAs have used page compression; I need to check. Won't it consume more CPU?

We have dedicated senior people for optimization, and they have been doing it for more than a decade for some good clients. So we trust their understanding of indexes and execution plans. I'll see if it will be possible to put the indexes and schema here.

    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

  • S_Kumar_S wrote:

3. Thanks for this idea. Not sure if the DBAs have used page compression; I need to check. Won't it consume more CPU?

Yes, you're quite right, it will use more CPU. Typically CPU is available while I/O is not, but every system is different. And, be warned, the initial conversion to page compression takes a long time.

Edit: Didn't you already state that you don't face CPU issues?


    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

I still have my doubts about the clustered index keys. But you could be right.

By the way, are most of the tables clustered on an IDENTITY column? If so, you almost certainly don't have the best clustered index keys on your tables.
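That question is easy to answer from the catalog views; this query (standard `sys.*` views only) lists every clustered index whose leading key column is an IDENTITY column:

```sql
-- Find tables whose clustered index leads with an IDENTITY column.
SELECT t.name AS table_name,
       i.name AS index_name,
       c.name AS leading_key_column
FROM sys.indexes i
JOIN sys.tables t         ON t.object_id  = i.object_id
JOIN sys.index_columns ic ON ic.object_id = i.object_id
                         AND ic.index_id  = i.index_id
                         AND ic.key_ordinal = 1   -- leading key only
JOIN sys.columns c        ON c.object_id  = ic.object_id
                         AND c.column_id  = ic.column_id
WHERE i.type = 1            -- 1 = clustered
  AND c.is_identity = 1
ORDER BY t.name;
```

For a reporting workload that always filters on a date range, clustering on the date column instead is a common alternative worth evaluating.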

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

  • S_Kumar_S wrote:

We have dedicated senior people for optimization, and they have been doing it for more than a decade for some good clients. So we trust their understanding of indexes and execution plans. I'll see if it will be possible to put the indexes and schema here.

    If that's true, then why are you asking here? 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
