Need help optimizing a query

  • Just fair warning: I am definitely a newbie; I fell into this work, so to speak.

    I have a query that takes between 30 seconds and 30 minutes to run, depending on how much data is being processed at the time. I used SQL Server Management Studio Express to analyze the query and show me the execution plan with statistics ON.

    I have since discovered where the problem is. I have one table in my database where we collect information, and this information is updated daily. In fact it's not really updated; it's brand new data every day, collected from various websites.

    Because this table collects brand new data daily it is not indexed; it's a HEAP. There are between 4 and 6 million rows of data on any given day, depending on how much data is run that day. We clear all records older than 3 days from this table every night.

    Now that I've explained the table a little, here's where the problem comes in: (This is not the entire stored procedure just the part that takes 98% operator cost time)

    -------------------------------------------------------------------
    -- GET THE RATES IN TEMP TABLE
    -------------------------------------------------------------------
    SELECT u.*,
           NULL AS RateTypeID,
           NULL AS RoomTypeID,
           NULL AS RatePriority,
           NULL AS RoomPriority
    INTO   #RESULTS
    FROM   TFX_V2_UnfilteredRates u
    WHERE  SiteID IN (SELECT * FROM #SITES)
      AND  HotelID IN (SELECT * FROM #HOTELS)
      AND  timeInserted > @JobDate

    DROP TABLE #SITES
    DROP TABLE #HOTELS
    ----------------------------------------------

    ----------------------------------------------

    It takes far too long to scan this table for the appropriate rates. Any suggestions/ideas on how to make this process run more efficiently would be greatly appreciated. I've thought about creating an index for the TFX_V2_UnfilteredRates table, which is the one that is 5 million rows, give or take. However, I am worried that it would slow down my inserts on that table, since we need to insert data very quickly.

    If you need more info please let me know, however I was trying to prevent from pasting the entire stored procedure.

    Thanks

    -Josh

  • Heya Josh,

    You're most likely on the right track with the heap, but for optimization items there's a bunch of data most of us prefer to see before we start bounding through possibilities... primarily because we'd like to show you the right way on the first try.

    If you take a look in my signature below, in the 'help on indexing/tuning' link there's a list of what we'll want and it walks you through getting it for us. Short form: All tables involved and their indexes (including the other tables), the query itself in full form, and the .sqlplan from an actual execution (as the heap stands now).

    From there, we'll be able to analyze its connections and expectations and get you on track pretty quickly.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Here you go, I tried to include all relevant information.

    Again thanks for helping!

    -Josh

  • jrestuccio (4/8/2013)


    Here you go, I tried to include all relevant information.

    Again thanks for helping!

    -Josh

    Nope, not what I'd want to see. I'd want to see the following:

    1) DDL for the tables including the DDL for indexes

    2) Actual execution plan as a .sqlplan file, not XML plans you included in the spreadsheets.

    3) your current code

  • Did you look at all the sheets within the Excel workbook?

    It has the stored procedure (my code). Also, it is a SQL 2000 database, and "save execution plan" is greyed out when I try to save it.

    Sorry for the silly question but what are you needing when you say DDL for all tables and indexes?

    What's a quick way to pull that?

    I thought I had this info on the "Relevant Tables and Indexes" sheet.

    -Josh

  • jrestuccio (4/9/2013)


    Did you look at all the sheets within the excel workbook?

    It has the stored procedure (my code), also it is a SQL 2000 database and the save execution plan is greyed out when I try to save it.

    Sorry for the silly question but what are you needing when you say DDL for all tables and indexes?

    What's a quick way to pull that?

    I thought I had this info on the "Relevant Tables and Indexes" sheet.

    -Josh

    What we want to see is DDL (CREATE TABLE and CREATE INDEX statements) and sample data (INSERT statements). This is all detailed in the link in my signature about best practices when posting questions. The idea is that we can recreate your tables and such in our environment. We also need to see the actual execution plan.
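    As a purely illustrative sketch (the column list and types below are guesses, not the actual schema), the kind of script meant here looks like this:

    ```sql
    -- Hypothetical DDL: substitute the real column list and types
    CREATE TABLE dbo.TFX_V2_UnfilteredRates
    (
        SiteID       INT          NOT NULL,
        HotelID      INT          NOT NULL,
        Rate         DECIMAL(9,2) NOT NULL,
        timeInserted DATETIME     NOT NULL
    )
    GO

    -- A few representative rows as sample data (single-row INSERTs, since
    -- multi-row VALUES lists are not available on SQL 2000)
    INSERT INTO dbo.TFX_V2_UnfilteredRates (SiteID, HotelID, Rate, timeInserted) VALUES (101, 17, 89.95, '20130408 06:15')
    INSERT INTO dbo.TFX_V2_UnfilteredRates (SiteID, HotelID, Rate, timeInserted) VALUES (102, 17, 92.50, '20130408 06:16')
    GO
    ```

    With a script like this, anyone in the thread can recreate the table locally and test index changes against it.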

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Try this index on your heap table. I'm not quite sure why you are worried about data load speeds; it seems you are already deleting data over three days old and adding new, so I am going to assume you have a window at some point to perform this task. You could drop the index first and then recreate it once loaded; that may help. An index on your date may also assist the delete.

    Anyway, food for thought. Here is some test data to play with; it's not a replica of your system, but it does have 5M rows and some hotels/sites.

    kind regards

    USE [tempdb]
    GO
    DROP TABLE [dbo].[Hotel]
    GO
    DROP TABLE [dbo].[Site]
    GO
    DROP TABLE [dbo].[TransData]
    GO

    SELECT TOP 5000000 ---- NOTE 5 MILLION rows
           TranID      = IDENTITY(INT, 1, 1),
           SiteID      = 1 + CAST(Abs(Checksum(Newid()) % 10000) AS INT),
           HotelID     = 1 + CAST(Abs(Checksum(Newid()) % 1000) AS INT),
           SalesAmount = CAST(Rand(Checksum(Newid())) * 99 + 1 AS DECIMAL(5, 2)),
           TransDate   = Dateadd(dd, Abs(Checksum(Newid())) % Datediff(dd, '2010', '2013'), '2010')
    INTO   TransData
    FROM   sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    CROSS JOIN sys.all_columns ac3

    SELECT TOP 500
           SiteID = 1 + CAST(Abs(Checksum(Newid()) % 10000) AS INT)
    INTO   Site
    FROM   sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    GO

    SELECT TOP 50
           HotelID = 1 + CAST(Abs(Checksum(Newid()) % 1000) AS INT)
    INTO   Hotel
    FROM   sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    GO

    SET STATISTICS TIME, IO ON

    DECLARE @jobdate AS datetime
    SET @jobdate = '2011-07-09'

    SELECT Hotel.HotelID, Site.SiteID, TransData.TransDate
    FROM   TransData
           INNER JOIN Site  ON TransData.SiteID  = Site.SiteID
           INNER JOIN Hotel ON TransData.HotelID = Hotel.HotelID
    WHERE  (TransData.TransDate = @jobdate)

    SET STATISTICS TIME, IO OFF

    /* create index */
    CREATE NONCLUSTERED INDEX [NIX_transdata] ON [dbo].[TransData]
    (
        [TransDate] ASC,
        [HotelID] ASC,
        [SiteID] ASC
    ) ON [PRIMARY]
    GO

    SET STATISTICS TIME, IO ON

    DECLARE @jobdate AS datetime
    SET @jobdate = '2011-07-09'

    SELECT Hotel.HotelID, Site.SiteID, TransData.TransDate
    FROM   TransData
           INNER JOIN Site  ON TransData.SiteID  = Site.SiteID
           INNER JOIN Hotel ON TransData.HotelID = Hotel.HotelID
    WHERE  (TransData.TransDate = @jobdate)

    SET STATISTICS TIME, IO OFF

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Also, if you are adding and deleting data, you really should create a clustered index on the table. As a heap it will just continue to grow and not reuse empty space allocated to the table.
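    A minimal sketch of that suggestion (the table name is from the thread, but the choice of clustering key is an assumption; pick whatever matches the insert and delete pattern):

    ```sql
    -- Hypothetical: clustering on the insert date means new rows append at
    -- the end of the table, and the nightly delete of 3-day-old rows frees
    -- whole contiguous ranges of pages for reuse
    CREATE CLUSTERED INDEX CIX_UnfilteredRates_timeInserted
        ON dbo.TFX_V2_UnfilteredRates (timeInserted)
    GO
    ```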

  • Lynn Pettis (4/9/2013)


    Also, if you are adding and deleting data, you really should create a clustered index on the table. As a heap it will just continue to grow and not reuse empty space allocated to the table.

    Thanks Lynn

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Thank you SO MUCH for the replies, I am trying to take it all in.

    Let me describe a little bit about why the data is deleted so regularly and why it's important to not slow down insert speeds.

    Imagine 1000 searches being performed at one time; as soon as each search is done, it collects the data and inserts it into the TFX_v2_unfilteredrates table. The reason we need this to happen quickly is that we have a massive amount of searches to complete, and since the data needs to be FRESH it needs to be collected within 1 day.

    The reason for deleting the data is merely to keep the size of the table down to a workable size, since I believe our queries are poorly written...

    -Josh

  • jrestuccio (4/9/2013)


    Thank you SO MUCH for the replies, I am trying to take it all in.

    Let me describe a little bit about why the data is deleted so regularly and why it's important to not slow down insert speeds.

    Imagine 1000 searches being performed at one time, and as soon as the search is done, it collects the data and inserts the data into the TFX_v2_unfilteredrates table. The reason we need this to happen quickly is because we have a massive amount of searches to complete, and since the data needs to be FRESH it needs to be collected within 1 day.

    The reason for deleting the data is merely to keep the size of the table down to a workable size, since I believe our queries are poorly written...

    -Josh

    Assuming that your hardware is spec'ed appropriately for the workload, even 1,000 inserts of 5,000 rows each (5 million rows total) probably shouldn't take more than five or ten minutes, even with a couple or three indexes to update. This code inserts 5 million rows into a test table - it ran on my DEV server in about 30 seconds:

    create table dbo.insert_test (date_collected datetime, hotel_name varchar(20), hotel_id int, rate decimal(6,2))
    go

    set nocount on

    declare @i int = 1
    while @i <= 1000
    begin
        insert into dbo.insert_test
        select dateadd(day, (@i*-1), getdate()), 'xxxxxxxxxxxxxxxxxxxx', t.N, 1.32 * t.N
        from dbo.tally t
        where t.N <= 5000

        set @i = @i+1
    end

    If you're avoiding indexes to save a few minutes once a day when the table is loaded with new data, but the absence of indexes is costing you several minutes every time you run the query, you're shooting yourself in the foot. Before you suffer through the poor performance of your query much longer, I'd investigate the impact of a proper indexing scheme on both the insert/delete process and the query performance to make sure you're striking an appropriate balance.

    Jason Wolfkill

  • Sorry, got inundated yesterday, but wanted to catch up with this. As mentioned, we really need everything for tuning assistance. The article describes the easiest ways to get it for us.

    jrestuccio (4/9/2013)


    Let me describe a little bit about why the data is deleted so regularly and why it's important to not slow down insert speeds.

    Even with a clustered index you won't see a significant difference in insert speeds. Start doing indexed views and triggers and that's a different story, or overloading the non-clustered indexing.

    Imagine 1000 searches being performed at one time, and as soon as the search is done, it collects the data and inserts the data into the TFX_v2_unfilteredrates table.

    Not really an issue. I/O is I/O. You might even get better performance with a split index to avoid hotspotting and using a really loose fillfactor, but that's a different story.
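    For illustration only (the index name, key columns, and fill factor value below are made up): a loose fill factor leaves free space on each page at build time, so concurrent inserts are less likely to split the same hot pages.

    ```sql
    -- Hypothetical example: leave roughly 30% of each leaf page empty
    -- when the index is built (SQL 2000 syntax assumed)
    CREATE NONCLUSTERED INDEX NIX_UnfilteredRates_Site_Hotel
        ON dbo.TFX_V2_UnfilteredRates (SiteID, HotelID, timeInserted)
        WITH FILLFACTOR = 70
    GO
    ```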

    The reason we need this to happen quickly is because we have a massive amount of searches to complete, and since the data needs to be FRESH it needs to be collected within 1 day.

    The reason for deleting the data is merely to keep the size of the table down to a workable size, since I believe our queries are poorly written...

    It's hard not to have poorly written queries when those queries have no indexes to work from. Get indexes into place and you may find you can actually keep your archive.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Hey everyone, thanks again for the replies!

    I am still a little confused about the best way to supply the DDL information; the articles you link to say the sp_help output should work, which I included in my spreadsheet.

    Also, the "save execution plan" is greyed out in SSMS express, so what's the best way to get you the execution plan?

    Again this is a SQL Server 2000 database.

    Thanks again! Also, not sure if I am breaking any forum rules or anything (hope not), but how much would any of you charge as a consultant to take a hands-on look?

    -Josh

  • jrestuccio (4/12/2013)


    Also, the "save execution plan" is greyed out in SSMS express, so what's the best way to get you the execution plan?

    See the article in Craig's sig. Describes how to get an exec plan in SQL 2000, easiest way to save it is in a spreadsheet. Haven't looked at your spreadsheet to see what's in there, a little busy with stuff.

    Thanks again! Also, not sure if I am breaking any forum rules or anything (hope not) but how much would any of you charge as a consultant to take a hands on look?

    If by hands-on you mean remote, disconnected and across a couple of oceans, roughly $100/hr with a minimum of 8 hours. If you want in-person then I can't help you there.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Lynn Pettis (4/8/2013)


    jrestuccio (4/8/2013)


    Here you go, I tried to include all relevant information.

    Again thanks for helping!

    -Josh

    2) Actual execution plan as a .sqlplan file, not XML plans you included in the spreadsheets.

    SQL 2000 forum, hence assumed to be a SQL 2000 server unless specified otherwise. Hence no XML plans.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
