Report / Query performance problems...

  • Quick background:

    Monthly scheduled reports are provided to our customers using SSRS on SQL Server 2008 R2. Thousands of EACH report are generated. The process takes DAYS, which, for "beginning of month" reports, is not acceptable. I am attempting to resolve performance problems with one report in particular.

    Running the core stored procedure in SSMS against the production database on the production server takes 15+ minutes. Multiply that by thousands and, even with parallel processing, you get some sense of the scope of the problem. I have identified the biggest bottleneck, which is a SELECT against a 56 million row table. Attached are the relevant table structures with indexes, along with the simple query; a rough sketch of its shape appears at the end of this post. (There's actually a bit more to the query, but what's attached is where all the ugly is.) The StudentID is actually a GUID, but our ETL process brings it in as an NVARCHAR(50). This data is effectively in a data warehouse (more like an open-air data storage yard), so there are zero concerns about inserts, deletes, etc. The ETL completely rebuilds the tables nightly.

    Also attached is the actual execution plan.

    I need some direction regarding what we can do to improve the report delivery. I am going to experiment with some form of preprocessing, but cannot do so at the moment with 2,000+ reports still queued up. (No, we don't have these tables in a clean dev environment, although I have requested one and it may materialize.) Preprocessing will involve joining a dataset of 227,000 records to the 56 million records. I attempted to do so yesterday but killed it after 4 hours; I will take another shot after this month's reports have finished.

    Hardware, configuration, query hints, basically anything is on the table, at least for consideration.

    Your assistance and consideration in this matter are appreciated.
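
    For anyone reading without the attachments, here is a rough sketch of the shape of the bottleneck. All object names below are invented placeholders; the real table structures, indexes, and query are in the attached files.

        -- Illustrative sketch only: stands in for the attached DDL and query.
        CREATE TABLE #ReportStudents
        (
            StudentID NVARCHAR(50) NOT NULL    -- the GUID the ETL lands as NVARCHAR(50)
        );

        SELECT DISTINCT f.StudentID
        FROM dbo.StudentFact AS f              -- placeholder for the 56 million row table
        INNER JOIN #ReportStudents AS r
            ON r.StudentID = f.StudentID;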

  • Before looking at the execution plans, one generic question first - is it possible to rewrite the reporting process to execute the "expensive" query once, save the results, and then send out the thousands of reports based on that saved result?

    A second generic recommendation - since you have a data warehouse type workload, and the table has over 56 million rows, have you considered using a columnstore index?
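
    Purely as an illustration (and noting that nonclustered columnstore indexes only arrived in SQL Server 2012, so this would apply after an upgrade from 2008 R2; the table and column names below are invented placeholders):

        -- Hypothetical example of a nonclustered columnstore index on the big table.
        -- Requires SQL Server 2012 or later; names are placeholders, not the actual schema.
        CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_StudentFact
        ON dbo.StudentFact (StudentID, SchoolID, ActivityDate, Score);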

    Your specific query is slow because you force an implicit data type conversion. The StudentID in the temporary table is nvarchar(50); in the DW table it is varchar(50). When you compare them, SQL Server has to convert the varchar values to nvarchar (because that is how the rules for data type comparison between varchar and nvarchar are defined), which means it has to scan every row in the big table, convert the value, then do an aggregation to get the distinct StudentID values, which can then be matched to the temp table.

    If you can change the temporary table to use varchar(50), so that it matches the data type in the permanent table, I expect that the plan shape will change to a scan of the temporary table into a nested loops join into an index seek on the big table, and an execution time of just a few milliseconds.
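
    A minimal sketch of that change, using placeholder names rather than the actual schema from the attachment:

        -- Before the fix the temp table declared StudentID NVARCHAR(50), forcing
        -- CONVERT_IMPLICIT on every row of the varchar column in the big table.
        -- Declaring it with the same type as the warehouse column avoids that.
        CREATE TABLE #Students
        (
            StudentID VARCHAR(50) NOT NULL PRIMARY KEY   -- matches the DW column's type
        );

        SELECT DISTINCT f.StudentID
        FROM dbo.StudentFact AS f                        -- placeholder for the 56 million row table
        INNER JOIN #Students AS s
            ON s.StudentID = f.StudentID;                -- same type on both sides: seek instead of scan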


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Nice catch on the implicit cast, Hugo.

    Biker Dad, when you make this change, the impact on performance is going to be incredible. However, please don't discount Hugo's first suggestion. It could be a big job to write, but you could significantly reduce the pain caused by the current process.

    My guess is that your current setup started out small and has grown into what you have today without a major restructuring to handle the current load. If this is correct, you certainly won't be the first to have experienced this phenomenon. I once had to deal with a daily snapshot creation to support web-based reporting. It had grown to the point where the snapshot took longer than 24 hours to run, so it couldn't complete before the next one had to start. The daily reporting could never be current to the previous day. I started from scratch and got it to run in about 20 minutes. It took a few days to figure out and about a week to write, but it was worth it. I don't know if you're in a similar situation or not, but it is certainly possible to pull off such performance improvements.

  • The implicit CAST was The Ugly. I changed my #temp structures to use VARCHAR(50) rather than NVARCHAR(50) and voila, things started moving along. While performance isn't down into the milliseconds, it's acceptable for running the individual reports from Report Manager, and I can preprocess the entire dataset in roughly 30 minutes; then, for the monthly report dump, I can just query that, and things should move along quite nicely. Well, except for the actual report rendering time, which I'm going to look at improving as well. The RDL is one I inherited, with a far more complex structure than I would have used.
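
    Roughly what I have in mind for the preprocessing, following Hugo's first suggestion. This is heavily simplified with made-up names; the real query joins more than what is shown here.

        -- Hypothetical shape of the monthly preprocessing: run the expensive join
        -- once, persist the results, and let the thousands of report executions
        -- read the small precomputed table instead of the 56 million row table.
        IF OBJECT_ID('dbo.Report_StudentSummary') IS NOT NULL
            DROP TABLE dbo.Report_StudentSummary;

        SELECT s.StudentID,
               COUNT(*)            AS ActivityCount,
               MAX(f.ActivityDate) AS LastActivityDate
        INTO   dbo.Report_StudentSummary
        FROM   dbo.StudentFact AS f                  -- placeholder for the big table
               INNER JOIN dbo.ReportStudents AS s    -- placeholder for the ~227,000 row driver set
                   ON s.StudentID = f.StudentID      -- both sides VARCHAR(50) after the fix
        GROUP BY s.StudentID;

        CREATE UNIQUE CLUSTERED INDEX CX_Report_StudentSummary
            ON dbo.Report_StudentSummary (StudentID);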
