Need help understanding performance issue

  • Hello All,

    I'm looking for some help understanding a performance issue. This is just a general question at the moment, and if the collective "we" decide that I need to provide a sample database, I'll do my best, but can't promise due to the usual proprietary constraints.

    Environment: SQL Server 2008 R2, running in a hosted virtual machine. 4 GB RAM.

    Situation:

    We have a data warehouse-ish database from which a lot of calculations are done, and the results are moved to a user-visible application database via an SSIS package. There are just over 62K rows in the results to be moved (split roughly evenly across 3 divisions). Yesterday afternoon I started the SSIS package to move the data to the client-side database. When I came into work this morning, it was still running (roughly 17 hours). When there were only 2 divisions, this package took about half an hour to run.

    The SSIS package (original):

    Source data is a stored procedure - pseudocode:

    Create a dynamic SELECT statement (columns to be moved are configurable; not all columns are selected)

    INSERT @TableVariable
    EXEC (@DynamicSql)

    SELECT * FROM @TableVariable

    Destination data is a table

    I modified the above by creating a staging table and inserting into the staging table; the source for the data flow then becomes the staging table (rough sketch below). I started the package again and, based on monitoring the results, estimated it was still going to take 12-15 hours.
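    The staged version was along these lines (just a sketch; the staging table name is a placeholder, not the real one):

    TRUNCATE TABLE dbo.StagingResults;   -- placeholder staging table, emptied before each load

    -- @DynamicSql is the same configurable SELECT built by the procedure
    INSERT dbo.StagingResults
    EXEC (@DynamicSql);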

    After being hit by a flaming bolt of inspiration, I changed the stored procedure once again to do the INSERTs one division at a time (yes, a cursor to get the division ids, with the SELECT modified to add a WHERE clause of DivisionId = @division, where @division is populated from the cursor). In outline, it looks like the sketch below.
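    Roughly (a sketch only; the table the division ids come from and the staging table name are placeholders):

    DECLARE @division int;

    DECLARE division_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT DivisionId FROM dbo.Divisions;   -- placeholder for wherever the division ids live

    OPEN division_cursor;
    FETCH NEXT FROM division_cursor INTO @division;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- same configurable dynamic SELECT, now filtered to DivisionId = @division
        INSERT dbo.StagingResults
        EXEC sp_executesql @DynamicSql, N'@division int', @division = @division;

        FETCH NEXT FROM division_cursor INTO @division;
    END;

    CLOSE division_cursor;
    DEALLOCATE division_cursor;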

    Ran the package again. Total processing time < 20 minutes.

    This just seems to go against everything we try to teach about NOT using cursors. With the information provided, can somebody provide an explanation?

    Regards,

    Tom

  • Did you check if there was any blocking on the table during INSERTs?
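    Something along these lines, run while the package is executing, will show whether any session is blocked and by whom (just a quick check against sys.dm_exec_requests):

    SELECT r.session_id,
           r.blocking_session_id,
           r.wait_type,
           r.wait_time,
           r.command
    FROM sys.dm_exec_requests AS r
    WHERE r.blocking_session_id <> 0;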

    In that case a cursor wouldn't hurt performance much, as you are only using it to step through a handful of "divisions". It's when people use one to plough through hundreds of thousands of rows that you run into issues.

    Roughly how much data is in each row? Was it the EXECUTE step that took so long, or the data flow task?

    Have you tried looking at Activity Monitor on the source db (maybe the destination db as well)?

    You can also query sys.dm_tran_locks to view the locks.
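    For example, something like this shows just the lock requests that are currently waiting (filter further by database or session as needed):

    SELECT request_session_id,
           resource_type,
           resource_database_id,
           request_mode,
           request_status
    FROM sys.dm_tran_locks
    WHERE request_status = 'WAIT';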

  • To answer your questions:

    There didn't appear to be any blocking when I looked. The query in question selects from seven tables (combination of full and left joins), and inserts into an eighth. I would have been very surprised if there had been blocking.

    Shark:

    The average size of a row is about 2000 bytes. It's the EXECUTE statement that takes a long time, not the data flow task. Once the data is staged, it flies from one database to the other.

    FYI: the reason the SP was originally written that way is that the data flow task needs to read the metadata of the source. I believe we would have done something different if that weren't a requirement.

    Environment: SQL Server 2008 R2, running in a hosted virtual machine. 4 GB RAM.

    Does the 4 GB belong to the server hosting the virtual machine or to the virtual machine?

    I don't know if you looked at Activity Monitor as you ran the package (or copied and pasted the select statement and ran it on its own), but I've run into issues where CPU and memory usage shot through the roof once the source tables reached a certain size.

    Does the 4 GB belong to the server hosting the virtual machine or to the virtual machine?

    It belongs to the VM.

    I don't know if you looked at Activity Monitor as you ran the package (or copied and pasted the select statement and ran it on its own), but I've run into issues where CPU and memory usage shot through the roof once the source tables reached a certain size.

    Yes, I've seen both CPU and memory at or near 100%. I have a very strong suspicion that we need more memory on this machine. Is there any way to quantify the effects of adding memory to performance, before adding the memory?
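    So far the only rough gauge I know of is to watch the buffer pool counters while the package runs (if page life expectancy collapses under load, that at least points to memory pressure), something like:

    SELECT object_name, counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page life expectancy'
      AND object_name LIKE '%Buffer Manager%';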

    I can take a stab at it, although I've never needed to predict performance improvements from hardware upgrades...

    I would start by making sure all the proper indexes are present (I usually just run DTA on the select statement and check the recommendations).

    If all the indexes are there, I would check the index fragmentation percentage.

    (Even if proper index creation and reorganization fixes the performance, depending on your db size and activity it might just be a matter of time before you have to upgrade the hardware...)
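    For the fragmentation check, something along these lines does the job (a sketch; run it in the source database and focus on the larger indexes):

    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id
     AND i.index_id = ips.index_id
    ORDER BY ips.avg_fragmentation_in_percent DESC;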

    If everything looks good...

    Try selecting a smaller subset of the data (like what you did by selecting one division instead of three) and do a comparison of CPU, memory, and I/O usage (vs. all three divisions).

    Estimate the source table size if it were to contain only one division, and again compare with the full table size.

    These two steps should give you an idea of whether you've hit a performance "threshold".
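    For the size comparison, sp_spaceused plus a per-division row count is usually enough (YourSourceTable below is a placeholder for each of your source tables):

    EXEC sp_spaceused N'dbo.YourSourceTable';    -- overall size of the table

    SELECT DivisionId, COUNT(*) AS row_count     -- rough split of rows per division
    FROM dbo.YourSourceTable
    GROUP BY DivisionId;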

    I sure hope I'm making sense...:-)

    I would start by making sure all the proper indexes are present (I usually just run DTA on the select statement and check the recommendations).

    If all the indexes are there, I would check the index fragmentation percentage.

    (Even if proper index creation and reorganization fixes the performance, depending on your db size and activity it might just be a matter of time before you have to upgrade the hardware...)

    I did check for appropriate indexes, but forgot to look at fragmentation.

    If everything looks good...

    Try selecting a smaller subset of the data (like what you did by selecting one division instead of three) and do a comparison of CPU, memory, and I/O usage (vs. all three divisions).

    Estimate the source table size if it were to contain only one division, and again compare with the full table size.

    These two steps should give you an idea of whether you've hit a performance "threshold".

    I sure hope I'm making sense...:-)

    No worries. Makes perfect sense. It's been 15 years since I did this type of performance tuning. In the intervening years, I've been one of those product developers who dumped a database on you guys and said "Don't touch" (well, not really; I've always had the philosophy that the customer knows their environment better than the product developer does, so make it easy for them to at least allocate tables to appropriate disks).

  • This should take seconds, not minutes. Any chance of you posting what the dynamic SQL renders out to during run time?

    --Jeff Moden



    What we do is generate a lot of sales metrics (sales, activity, and activity type; the current count is around 650 once you consider the various timeframes and whether it's dollars sold or units sold). There's a configuration table that is read to determine the set that is actually transferred to the application database for display. Two reasons for that:

    1. So we don't have to modify the SSIS package every time somebody comes up with a new measure to display -- we add one or two a month

    2. Total "row" size (if we combined the sales, activity, and activity type columns into a single row) is about 22000 bytes

    SELECT S.AreaSKey, S.ProductSKey, S.DivisionSKey, S.TimeSKey, AH.AreaName,
           CustomerTextField1 AS Text1, NULL AS Text2, NULL AS Text3, NULL AS Text4, NULL AS Text5,
           NULL AS Text6, NULL AS Text7, NULL AS Text8, NULL AS Text9, NULL AS Text10,
           Qtr_SeeAppropCountTarget_Percent AS Number1, CustomerCount_Segment1 AS Number2, CustomerCount_Segment2 AS Number3,
           CustomerCount_Segment3 AS Number4, CustomerCount_Segment4 AS Number5, mat_Brand_PC_Contribution_unit AS Number6,
           mat_market_share_unit_max AS Number7, mat_market_share_unit_min AS Number8, mat_market_unit AS Number9,
           mat_sales_unit AS Number10, mat_TA_PC_Contribution_unit AS Number11, qtr_CallCount_PC AS Number12,
           qtr_CallCount_PC_Highlight AS Number13, qtr_CustomerCallCount_Segment1 AS Number14, qtr_CustomerCallCount_Segment2 AS Number15,
           qtr_CustomerCallCount_Segment3 AS Number16, qtr_CustomerCallCount_Segment4 AS Number17, QTR_CustomerCallCount_Target AS Number18,
           qtr_CustomerCallCount_Total AS Number19, qtr_highlight_unit AS Number20, qtr_unit_max AS Number21,
           qtr_unit_min AS Number22, qtr_unit_pc AS Number23, qtr_market_share_highlight_unit AS Number24,
           qtr_market_share_unit AS Number25, qtr_market_share_unit_max AS Number26, qtr_market_share_unit_min AS Number27,
           qtr_market_unit AS Number28, qtr_quadrant_segment_MSTSA_unit AS Number29, QTR_RequiredCallcount_Total AS Number30,
           qtr_SeeApprop_CallCount_Highlight AS Number31, qtr_SeeAppropCountTarget_Percent AS Number32, RequiredCallCount_Target AS Number33,
           TargetCustomerCount AS Number34, TotalCustomerCount AS Number35, UserDefinedBubbleColor AS Number36,
           NULL AS Number37, NULL AS Number38, NULL AS Number39, NULL AS Number40, NULL AS Number41, NULL AS Number42,
           NULL AS Number43, NULL AS Number44, NULL AS Number45, NULL AS Number46, NULL AS Number47, NULL AS Number48,
           NULL AS Number49, NULL AS Number50, NULL AS Number51, NULL AS Number52, NULL AS Number53, NULL AS Number54,
           NULL AS Number55, NULL AS Number56, NULL AS Number57, NULL AS Number58, NULL AS Number59, NULL AS Number60,
           Qtr_AllArea_SeeAppropPct_Chart AS Chart1, Qtr_SeeAppropCountTarget_Percent_Chart AS Chart2, mat_market_share_unit_chart AS Chart3,
           mth_calltype_count_total_chart_id1 AS Chart4, mth_calltype_count_total_chart_id3 AS Chart5, qtr_all_area_unit_chart AS Chart6,
           qtr_all_area_market_share_unit_chart AS Chart7, qtr_all_calltype_chart_id1 AS Chart8, qtr_calltype_count_total_chart_id1 AS Chart9,
           qtr_calltype_count_total_chart_id3 AS Chart10, qtr_unit_chart AS Chart11, qtr_market_share_unit_chart AS Chart12,
           NULL AS Chart13, NULL AS Chart14, NULL AS Chart15, NULL AS Chart16, NULL AS Chart17, NULL AS Chart18, NULL AS Chart19,
           NULL AS Chart20, NULL AS Chart21, NULL AS Chart22, NULL AS Chart23, NULL AS Chart24, NULL AS Chart25
    FROM dbo.AreaHierarchic AH
    JOIN dbo.pqMetrics S
        ON S.AreaSKey = AH.AreaSKey
    JOIN dbo.vwPQMetricsAllProducts SP
        ON S.AreaSKey = SP.AreaSKey
        AND S.DivisionSKey = SP.DivisionSKey
        AND S.TimeSKey = SP.TimeSKey
    JOIN dbo.pqMetricsCharts mc
        ON S.AreaSKey = mc.AreaSKey
        AND S.ProductSKey = mc.ProductSKey
        AND S.DivisionSKey = mc.DivisionSKey
        AND S.TimeSKey = mc.TimeSKey
    LEFT OUTER JOIN dbo.pqMetricsActivity A
        ON S.AreaSKey = A.AreaSKey
        AND S.ProductSKey = A.ProductSKey
        AND S.DivisionSKey = A.DivisionSKey
        AND S.TimeSKey = A.SalesTimeSKey
    LEFT OUTER JOIN dbo.pqMetricsActivityCharts C
        ON S.AreaSKey = C.AreaSKey
        AND S.ProductSKey = C.ProductSKey
        AND S.DivisionSKey = C.DivisionSKey
        AND S.TimeSKey = C.SalesTimeSKey
    LEFT OUTER JOIN dbo.vwPQMetricsAllCallTypes CT
        ON S.AreaSKey = CT.AreaSKey
        AND S.ProductSKey = CT.ProductSKey
        AND S.DivisionSKey = CT.DivisionSKey
        AND S.TimeSKey = CT.SalesTimeSKey
    WHERE S.TimeSKey = 'Apr 1 2010 12:00AM'
        AND S.DivisionSKey = 1

    There are always sales metrics. There may not be activity or activity type metrics (hence the left joins). All tables have a clustered index on TimeSKey, AreaSKey, ProductSKey, DivisionSKey (as well as other indexes).
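    For illustration only, the clustered index described above looks something like this on the main metrics table (index names here are made up, not the real ones):

    CREATE CLUSTERED INDEX CIX_pqMetrics            -- illustrative name only
        ON dbo.pqMetrics (TimeSKey, AreaSKey, ProductSKey, DivisionSKey);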
