SQL View - Performance Tuning

  • Hello Everyone,

I have a SQL view that is running slowly. It takes two (2) hours or more to run, so we are getting application timeouts, especially on our reports.

Aside from this, while executing the actual query that uses the SQL view, we get a SQL Server database error saying the transaction log for database 'tempdb' is full...

I went through the SQL query inside the view and there is a CROSS JOIN to a table that has 500+ million records, which I think is the reason for the tempdb error above.

I have attached all the table definitions and indexes used in the query, as well as the definition of the view and the actual query that uses the view.

All the tables involved have the following data statistics:

    DayTrans Table (5 records)

    DayTransMapping Table (5 records)

    AdvStats Table (523,121,020 records)

    Trip Table (53,321,230 records)

    tAdvertisements Table (4,231 records)

    I haven't attached the execution plan since I only have limited access to our production server.

    I hope the information that I posted is enough to understand the problem that I am having.

    Thanks so much!

Ugghhh!!! Those are quite heavy tables...

I think the major issue is the join between the 500-million-row table and the 50-million-row table, followed by an aggregation.

The CROSS JOIN actually behaves like an INNER JOIN because of the WHERE clause. I'd expect the query optimizer to be able to sort that out.
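For anyone following along, here is a minimal illustration of that point (the join columns are taken from the view definition posted later in this thread): a CROSS JOIN with an equality predicate in the WHERE clause is logically the same as an INNER JOIN, and the optimizer normally treats them identically.

```sql
-- CROSS JOIN with the join predicate pushed into the WHERE clause:
SELECT t.trip_number, a.ad_name
FROM Trip t
CROSS JOIN AdvStats a
WHERE a.shift_number = t.shift_number
  AND a.trip_number  = t.trip_number;

-- Logically equivalent explicit INNER JOIN:
SELECT t.trip_number, a.ad_name
FROM Trip t
INNER JOIN AdvStats a
  ON  a.shift_number = t.shift_number
  AND a.trip_number  = t.trip_number;
```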

I'm pretty sure pre-aggregation will be the key to the performance improvement...

The following query should aggregate the values from AdvStats as a first step. Please check whether the row count is significantly lower than 500 million.

    Also, please note how I changed the date conversion for AdvStats.create_date. It should be faster than the varchar conversion.

SELECT COUNT(*)

FROM AdvStats

GROUP BY

AdvStats.advertisement_id,

AdvStats.shift_number,

AdvStats.trip_number,

DATEADD(dd, DATEDIFF(dd, 0, AdvStats.create_date), 0),

AdvStats.adName



    Lutz
    A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Yes, absolutely they are heavy tables.

Yeah, I missed that the WHERE clause makes it behave like an INNER JOIN. What do you think caused the SQL error on tempdb? And how can I use pre-aggregation and combine it into a query inside a view that outputs the desired result? I also need to check against the Trip table whether the start hour is between DayTransMapping's start and end hour, and everything should come back in one result set.

I tried to execute the code snippet that you provided and it is still running, counting through the 14 millionth row (please see attached file).

    Thanks for the help.

  • Ouch!! You'd better stop that query....

I was hoping to pre-aggregate AdvStats and reduce the size of that part of your query.

    But, unfortunately, it seems like it won't help at all (seeing all that count=1).

    It looks like we need to see the execution plan...

The error on tempdb is most probably due to the size of the tables used in that query. Since there is no WHERE clause to actually limit the size of AdvStats (e.g. AdvStats.shift_number = SomeNumber AND AdvStats.trip_number = SomeOtherNumber), you'll end up with table scans on both huge tables.

I'm sure you eventually apply some WHERE condition when using that view.

Depending on the columns used in that WHERE condition, I would write a sproc that helps limit the rows for each of those big tables, and see if I could split up the large query.

Based on your sample query, you limit the result set to a date range.

If that's always the case, you could write a proc that takes the date range as an input parameter to reduce the number of rows from the Trip table (e.g. using an indexed temp table). Then select the related rows from AdvStats into another temp table (required columns only), and use those temp tables together with your current join.
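A rough sketch of that approach follows. This is an assumption-heavy outline, not the actual solution: the procedure name is made up, and the `create_date` filter column on Trip is assumed based on the view posted later in the thread.

```sql
-- Sketch: pre-filter both big tables into indexed temp tables by date range,
-- then run the existing join/aggregation against the much smaller sets.
CREATE PROCEDURE dbo.GetAdStats   -- hypothetical name
    @StartDate DATETIME,
    @EndDate   DATETIME
AS
BEGIN
    SET NOCOUNT ON;

    -- Step 1: limit Trip to the date range (assumed filter column).
    SELECT shift_number, trip_number, StartHour, vehicle_id
    INTO #Trip
    FROM Trip
    WHERE create_date >= @StartDate AND create_date < @EndDate;

    CREATE CLUSTERED INDEX IX_Trip ON #Trip (shift_number, trip_number);

    -- Step 2: pull only the related AdvStats rows, required columns only.
    SELECT a.advertisement_id, a.shift_number, a.trip_number,
           a.create_date, a.ad_name, a.trip_ad_id
    INTO #AdvStats
    FROM AdvStats a
    INNER JOIN #Trip t
        ON  a.shift_number = t.shift_number
        AND a.trip_number  = t.trip_number;

    -- Step 3: run the original join/aggregation, referencing
    -- #Trip and #AdvStats instead of the base tables.
END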

    Side note: How many rows will a SELECT COUNT(*) FROM viewAdsStats usually return (you don't have to run it, just a rough estimate..)?



    Lutz

  • Hi James,

    Yes it does seem like your query (and ultimately the underlying view) is filling up the log file and the CROSS JOIN is the problem. Is it really needed or can the view be rewritten, or perhaps a new view created for your needs? If not you may have to look at other options.

Is it possible to get someone to give you an execution plan if you aren't able to pull one yourself?

    This sounds a lot like a system someone tested on a small subset of data and then released into production. You mentioned not having much access to PROD so I am guessing either someone only tested this view on a small subset, or imported a ton of data without proper testing.

    But I think you are on the right track and it's the CROSS JOIN that is the issue.

The cross join is one thing.

The function on the datetime column is another one: it prevents any datetime index from being used!!

That conversion is also a member of the GROUP BY, causing your view to be fully materialized!! Hence... there she blows for tempdb.
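To illustrate the point with a generic sketch (not the actual view): wrapping the column in a function makes the predicate non-sargable, while comparing the raw column against a computed range lets an index on that column be used.

```sql
-- Non-sargable: the function on the column hides it from any index seek,
-- forcing a scan of all 500M+ rows.
SELECT COUNT(*)
FROM AdvStats
WHERE CONVERT(VARCHAR(8), create_date, 112) = '20100926';

-- Sargable: the bare column is compared against constant boundaries,
-- so an index on create_date can be seeked.
SELECT COUNT(*)
FROM AdvStats
WHERE create_date >= '20100926'
  AND create_date <  '20100927';
```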

Better would be if you posted the full DDL of the tables, views and indexes,

and the graphical execution plan (.sqlplan) as an attachment.

That also provides plan statistics info that is very useful.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
- How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • LutzM (9/26/2010)


    Ouch!! You'd better stop that query....

I was hoping to pre-aggregate AdvStats and reduce the size of that part of your query.

    But, unfortunately, it seems like it won't help at all (seeing all that count=1).

    It looks like we need to see the execution plan...

The error on tempdb is most probably due to the size of the tables used in that query. Since there is no WHERE clause to actually limit the size of AdvStats (e.g. AdvStats.shift_number = SomeNumber AND AdvStats.trip_number = SomeOtherNumber) you'll end up with table scans for both huge tables.

    I'm sure you finally apply some WHERE condition when using that view.

    Depending on the columns used in that WHERE condition I would write a sproc that would help limiting the rows for each of those big tables and see if I could separate the large query.

    Based on your sample query you limit the result set based on a date range.

    If that's always the case you could write a proc with the date range as input parameter to reduce the number of rows in the Trip table (e.g. using an indexed temp table). Select the related rows from AdvStats into another temp table (required columns only). Use those temp tables together with your current join.

    Side note: How many rows will a SELECT COUNT(*) FROM viewAdsStats usually return (you don't have to run it, just a rough estimate..)?

    LutzM,

I already stopped the query, since it had already thrown an out-of-memory exception :(.

    To answer your side note, a rough estimate would be around 50M+.

You're correct; as stated in the ActualQuery.txt attachment, we usually limit the result based on a date range, and it will always be that case. But even so, at the moment it still processes the data for 2 hours or more.

I actually suggested using a stored procedure, because I'm more comfortable with those than with views, but the higher-ups are saying that if this can be resolved using views, the better, since the view is called from our business intelligence tools, not from CLR or C# .NET code.

  • raistlinx (9/26/2010)


    Hi James,

    Yes it does seem like your query (and ultimately the underlying view) is filling up the log file and the CROSS JOIN is the problem. Is it really needed or can the view be rewritten, or perhaps a new view created for your needs? If not you may have to look at other options.

    Is it possible to get someone to give you an execution plan if you aren't able?

    This sounds a lot like a system someone tested on a small subset of data and then released into production. You mentioned not having much access to PROD so I am guessing either someone only tested this view on a small subset, or imported a ton of data without proper testing.

    But I think you are on the right track and it's the CROSS JOIN that is the issue.

    raistlinx,

The view can actually be rewritten; if there's a better solution, then that would be great.

I am actually trying to get hold of the execution plan; I'll post it here once I have it.

    Thanks for the help.

  • ALZDBA (9/26/2010)


    the cross join is one thing

    the function on the datetime column is another one. This causes no datetime index to be used !!

    This conversion is also member of the group by , so causing your view to be fully materialized !! Hence .... there she blows for tempdb.

    Better would be you'd post the full ddl of the tables, views and indexes.

    and the graphical execution plan (.sqlplan) as attachment.

    This also provide plan statistics info that is very useful

    Hi ALZDBA,

I already provided the attachments in my first post for the table definitions, views and indexes. I'll work on the graphical execution plan and get back to you guys as soon as I can.

    Thanks!

  • James Tech (9/26/2010)


    ALZDBA (9/26/2010)


    the cross join is one thing

    the function on the datetime column is another one. This causes no datetime index to be used !!

    This conversion is also member of the group by , so causing your view to be fully materialized !! Hence .... there she blows for tempdb.

    Better would be you'd post the full ddl of the tables, views and indexes.

    and the graphical execution plan (.sqlplan) as attachment.

    This also provide plan statistics info that is very useful

    Hi ALZDBA,

    I already provided the attachments on my first post for the table definition, views and indexes. I'll work out on the graphical execution plan and get back to you guys as soon as I can.

    Thanks!

Please have a look at "How to post performance problems" in my signature, and it'll be clear why I requested the actual DDL, ...

    Johan


  • Hello all,

    I would suggest two changes to the original view, maybe these changes help.

First: there really is no need for a CROSS JOIN; the query can be rewritten using INNER JOINs only.

Second: with the CONVERT, this query clears the hours, minutes and seconds from the original datetime; there is another way to do this that might be less expensive.

    So I changed the original view to

SELECT

DATEADD(d, DATEDIFF(d, 0, AdvStats.create_date), 0) AS createDate,

AdvStats.ad_name AS adName,

AdvStats.advertisement_id AS adId,

COUNT(AdvStats.trip_ad_id) AS numAdPlayed,

DayTransMapping.DayTransId AS dayTransId,

DayTrans.dayTransDesc,

Trip.vehicle_id AS vehicleId

FROM DayTrans

INNER JOIN DayTransMapping

ON DayTrans.DayTransId = DayTransMapping.DayTransId

INNER JOIN Trip

ON Trip.StartHour BETWEEN DayTransMapping.StartHour AND DayTransMapping.EndHour

INNER JOIN AdvStats

ON AdvStats.shift_number = Trip.shift_number

AND AdvStats.trip_number = Trip.trip_number

GROUP BY DATEADD(d, DATEDIFF(d, 0, AdvStats.create_date), 0),

AdvStats.advertisement_id,

AdvStats.ad_name,

DayTransMapping.DayTransId,

DayTrans.dayTransDesc,

Trip.vehicle_id

    Hope this helps,

    Francesc

  • Hello James and all,

there is another way to try to improve performance in this query; you can create an index on AdvStats using the columns

shift_number

trip_number

create_date

advertisement_id

adName

— effectively replacing your existing shift_number + trip_number index with one that has three more columns.

The reason: right now the query joins the AdvStats and Trip tables using the shift_number + trip_number index, then has to go back to the AdvStats table to obtain all the other data (I suppose this will be the plan). With all the needed data inside the index, the query will only use the index and not the table, improving performance.
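As a sketch in DDL form (the index name is made up, and the existing index to replace is the one Francesc describes above), the widened covering index could look like:

```sql
-- Hypothetical covering index: with all five columns in the key,
-- the aggregation query can be answered from the index alone,
-- avoiding lookups into the 500M+ row base table.
CREATE NONCLUSTERED INDEX IX_AdvStats_Covering
ON AdvStats (shift_number, trip_number, create_date, advertisement_id, ad_name);
```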

That's a technique I dislike, but in some cases I have used it with good results.

    Regards,

    Francesc

Guess I will be the first to ask about your hardware. Clearly it is insufficient, but I want to know just how much so. IMNSHO you can tune this all you want, but at the end of the day you are going to need some good hardware to a) get the query to complete at all and b) get anything approaching reasonable performance.

I don't have time to look deep into this right now, but if you are using some form of integer for date/time stuff you might be able to eliminate a join or two. I do this type of thing regularly with proper time dimensions in star schema scenarios.

Another possibility is proper indexing that allows index seeks/scans on small chunks of data in iterative batch processing, which you could set up to run in parallel on multiple threads with non-overlapping ranges. This obviously requires that your aggregates are discrete and don't require processing all the data to resolve (which I didn't spend time to determine). I have had big wins in total processing time with this type of arrangement as well.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi Everyone,

Please see the attachment for the actual execution plan, including the index details etc. This covers both the SQL query that uses the view and the underlying view itself (but I limited the result by providing a date range).

    Thanks for your time helping!

  • I focused on the execution plan of the view:

Since there is no index on the Trip table starting with create_date, you end up with the index scan that fits best (at least per query optimizer logic).

    If possible, add the following index to the trip table:

    create_date include (start_hour,shift_number,trip_number,vehicle_id)

The focus on the create_date column as the only "true" key column is to keep the index as narrow as possible, so we should see a rather fast index seek. I included vehicle_id since it's the only column added by the key lookup, which consumes 40% of the overall cost. Not really efficient 😉

    Furthermore, change index #4 of the AdvStats table:

    shift_number,trip_number include (advertisement_id,ad_name,create_date)

Again, including those columns instead of adding them to the index key itself should help keep the index narrow and make the optimizer actually use it.

If you need to save some space on your system to add those indexes, you could combine indexes 1 and 2 of AdvStats by adding ad_name and create_date as included columns of the first index and getting rid of index 2 (it's too wide, as per my personal preference...).

This should help "slightly" improve performance, assuming you'll always use a date range in this view's WHERE clause...
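For reference, a sketch of the two suggested indexes in DDL form (index names are made up; adjust them to your naming convention):

```sql
-- Trip: narrow key on the date column, lookup columns carried as INCLUDEs
-- so the key lookup (40% of plan cost) disappears.
CREATE NONCLUSTERED INDEX IX_Trip_CreateDate
ON Trip (create_date)
INCLUDE (start_hour, shift_number, trip_number, vehicle_id);

-- AdvStats: replacement for index #4 -- same key, covering columns included.
CREATE NONCLUSTERED INDEX IX_AdvStats_ShiftTrip
ON AdvStats (shift_number, trip_number)
INCLUDE (advertisement_id, ad_name, create_date);
```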



    Lutz
