What's my bottleneck?

  • I'm not sure "performance tuning" is the right place for this topic, but it has to do with performance, so I think this is the best place:

    We have some very large tables (350 million records) which are used in some advanced filtered queries (lots of joins / group by's / sums / etc.). Some of those queries take quite a long time (1 minute) to execute, but the query can't be optimized any further. All indexes are there, the table is designed the right way; it just takes that long because it is a big-*** query.

    BUT the problem is: customers don't care that it is a big-*** query, they just want to see the results ASAP, and 1 minute is too long for them. So the only other option is to improve the hardware, and here is where my real question starts: what part of my hardware causes it to run this long?

    The disk doesn't seem to be the problem: while the query is running, disk throughput is about 50 MB/s, while it can handle 10 times as much. Besides, there is 40 GB of memory available, so most of the data will be in memory. Processor usage during the execution of the query is 60-90%, and my colleague's opinion is that a faster processor wouldn't help, since it doesn't use all of its capacity now and therefore this is not the bottleneck.

    Conclusion: there is no real bottleneck, the server runs smoothly, but I want this query to run faster...
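
    For what it's worth, while the query is running I could look at what the session is actually waiting on; a rough sketch (the session_id of 53 is just an example, substitute the real one from sp_who2):

    -- While the slow query runs in another session, check its current wait.
    -- PAGEIOLATCH_* waits point at disk, SOS_SCHEDULER_YIELD at CPU,
    -- CXPACKET at parallelism (session_id = 53 is illustrative).
    SELECT r.session_id,
           r.status,
           r.wait_type,
           r.wait_time,      -- ms spent in the current wait
           r.cpu_time,       -- ms of CPU consumed so far
           r.logical_reads
    FROM sys.dm_exec_requests AS r
    WHERE r.session_id = 53;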

  • When the query isn't running, what is your processor usage?

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • You mentioned the design is all done the right way. Just to get an idea: have you also used database partitioning? And what about the number of data files and filegroups?

    Is the database used purely by the reporting solution (where queries are taking 1 minute), or is it a transaction-based system that the reporting application also reads from simultaneously?


    Bru Medishetty

    Blog -- LearnSQLWithBru

    Join on Facebook Page Facebook.com/LearnSQLWithBru

    Twitter -- BruMedishetty

  • 9 times out of 10, the query can be made more efficient.

    Post the query and the query plan.



    Clear Sky SQL
    My Blog

  • Interesting statement: "All indexes are there".

    One thing I have noticed over and over again is that many developers and some DBAs don't really understand how and why indexes are used.

    You can have lots of indexes on your tables and the query optimizer may still decide to do a full table scan. There are many reasons for this.

    Here are two things I look for (Cardinality & Distribution).

    I use a simple COUNT(*) GROUP BY query like:

    SELECT index_fieldname, COUNT(*)
    FROM tablename
    GROUP BY index_fieldname;

    If you only get a few records from this query, the cardinality of this index is not the greatest and it will be skipped more often than not. It is a good candidate to be dropped.

    If you notice the count is very uneven, say 90% of the rows in one value, then the distribution for that index is poor and will often cause unexpected performance issues. Again, it is a good candidate to be dropped.
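
    To see the distribution the optimizer actually works from, the statistics histogram behind the index can also be inspected directly; a minimal sketch, where 'tablename' and 'indexname' are placeholders:

    -- Show the value distribution (histogram) behind an index's statistics.
    DBCC SHOW_STATISTICS ('tablename', 'indexname') WITH HISTOGRAM;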

    If you have compound indexes, try to have the field with the highest cardinality as the first field in the index. I know country, state makes more sense to us than state, country for an index, but with relational databases I have found the reverse to be true.

    Check that statistics are being maintained.

    Check for fragmentation of the indexes.

    Use the query plan to verify that the indexes chosen by the optimizer make sense.
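
    For the statistics and fragmentation checks above, something like this could work (a sketch; 'dbo.tablename' is a placeholder):

    -- Fragmentation and statistics age for every index on one table.
    SELECT i.name AS index_name,
           ps.avg_fragmentation_in_percent,
           STATS_DATE(i.object_id, i.index_id) AS stats_last_updated
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.tablename'),
                                        NULL, NULL, 'LIMITED') AS ps
    JOIN sys.indexes AS i
      ON i.object_id = ps.object_id
     AND i.index_id = ps.index_id;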

    Hope this helps!

  • Sander A. (11/9/2009)


    Besides, there is 40 GB of memory available, so most of the data will be in memory.

    It will if it has been loaded previously and not evicted since; otherwise it will be reading from disk. Can you be sure it's preloaded?
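
    A rough way to check is to ask the buffer pool how many pages of the database it currently holds (a sketch; run it in the database in question, each page is 8 KB):

    -- Approximate how much of the current database is cached in memory.
    SELECT COUNT(*) * 8 / 1024 AS cached_mb
    FROM sys.dm_os_buffer_descriptors
    WHERE database_id = DB_ID();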

  • Lots of joins, grouping, and aggregates?

    Sounds like indexed views territory.

    I look forward to seeing the execution plans too.

  • Here is a query that took 5 minutes; it returns a little over 5 million records. I also included the execution plan and a screenshot of the Activity Monitor. The average CPU activity during the day is 40%. The tables are not (yet) partitioned; I have been testing with this, but in most cases it only slowed down the queries.

    WITH statCTE (IDKeyword, impr, clicks, pos, costs) AS (
        SELECT kwd.ID
             , ISNULL(SUM(CAST(stat.Impressions AS bigint)), 0) impr
             , ISNULL(SUM(CAST(stat.Clicks AS bigint)), 0) clicks
             , ISNULL(SUM(ISNULL(stat.AveragePosition, 0) * ISNULL(CAST(stat.Impressions AS bigint), 0))
                      / NULLIF(SUM(CAST(stat.Impressions AS bigint)), 0), 0) pos
             , ISNULL(SUM(stat.TotalCostsEuro), 0.0) costs
        FROM SAM_Accounts act
        JOIN SAM_Campaigns cpn ON cpn.idAccount = act.id
        JOIN SAM_Groups grp ON grp.idcampaign = cpn.id
        JOIN SAM_Keywords kwd ON kwd.idgroup = grp.id
        JOIN CIS_Projects prj ON act.IDProject = prj.id
        JOIN SAS_KeywordStatistics stat ON stat.IDKeyword = kwd.ID
            AND stat.Date BETWEEN '01/01/2009' AND '11/09/2009'
        WHERE prj.id = 1047
        GROUP BY kwd.ID
    )
    , kpiCTE (IDKeyword, kpi1) AS (
        SELECT kwd.ID
             , ISNULL(SUM(kpi.KPI1), 0.0) kpi1
        FROM SAM_Accounts act
        JOIN SAM_Campaigns cpn ON cpn.idAccount = act.id
        JOIN SAM_Groups grp ON grp.idcampaign = cpn.id
        JOIN SAM_Keywords kwd ON kwd.idgroup = grp.id
        JOIN CIS_Projects prj ON act.IDProject = prj.id
        JOIN SAS_KeywordKPIs kpi ON kpi.IDKeyword = kwd.ID
            AND kpi.Date BETWEEN '01/01/2009' AND '11/09/2009'
        WHERE prj.id = 1047
        GROUP BY kwd.ID
    )
    , statCCTE (IDCampaign, cpnclicks) AS (
        SELECT stat.IDCampaign
             , ISNULL(SUM(CAST(stat.Clicks AS bigint)), 0) cpnclicks
        FROM SAS_CampaignStatistics stat
        WHERE stat.Date BETWEEN '01/01/2009' AND '11/09/2009'
        GROUP BY stat.IDCampaign
    )
    SELECT kwd.id id
         , kwd.idType IDKeywordType
         , act.IDAccountType
         , kwd.HasHistory
         , kwd.Term kwdName
         , kwd.dateCreated datecreated
         , StatCTE.impr impr
         , StatCCTE.cpnclicks cpnclicks
         , StatCTE.clicks clicks
         , StatCTE.pos pos
         , kwd.DestinationUrl url
         , StatCTE.costs costs
         , kwd.MinCPCEuro mincpc
         , kwd.Negative kwdnegative
    FROM SAM_Accounts act
    JOIN SAM_Campaigns cpn ON cpn.idAccount = act.id
    JOIN SAM_Groups grp ON grp.idcampaign = cpn.id
    JOIN SAM_Keywords kwd ON kwd.idgroup = grp.id
    JOIN CIS_Projects prj ON act.IDProject = prj.id
    LEFT JOIN kpiCTE ON kpiCTE.IDKeyword = kwd.ID
    LEFT JOIN statCTE ON statCTE.IDKeyword = kwd.ID
    LEFT JOIN statCCTE ON statCCTE.IDCampaign = cpn.ID
    JOIN TIS_Options Opt_kwdStatus ON Opt_kwdStatus.ID = kwd.idStatus
    WHERE prj.id = 1047
      AND (((COALESCE(StatCTE.clicks, 0) <= 1000.0)) OR kwd.id = -1)
      AND (kwd.idStatus IN (56)
      AND grp.idStatus IN (50)
      AND cpn.idStatus IN (27, 29))
    PS: The query is built by a query builder, since users can change for themselves which columns they want, the date range, etc.

    PPS: We are planning for a new server; the main question now is whether we have to invest in lots of CPU power, massive amounts of memory, or superfast hard drives (something like 16 SSDs in one RAID array :w00t:).
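
    To help decide, I could wrap the query with I/O and timing statistics and see where the minute actually goes (a sketch; lots of physical reads would point at disk, CPU time close to elapsed time at the processor):

    SET STATISTICS IO ON;    -- logical vs. physical reads per table
    SET STATISTICS TIME ON;  -- CPU time vs. elapsed time

    -- ... run the big query here ...

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;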

  • Why does no-one ever include an actual execution plan? Actual row counts and thread distribution data are so important! And why omit the fact that the SQL is built by a tool? How anyone can say that the query cannot be tuned further exasperates me.

    Anyway, to answer the original question: the bottleneck is the design. Joining the same large detail-record tables together over and over again in this way is massively inefficient.

    Take kpiCTE for example. Consider how fast it would be if an indexed view something like the following existed:

    CREATE VIEW dbo.KPI
    WITH SCHEMABINDING
    AS
    SELECT prj_id    = PRJ.id,
           kwd_id    = KWD.id,
           kpi_date  = KPI.[date],
           kpi1      = SUM(KPI.kpi1),
           row_count = COUNT_BIG(*)
    FROM dbo.SAM_Accounts ACT
    JOIN dbo.SAM_Campaigns CPN ON CPN.idAccount = ACT.id
    JOIN dbo.SAM_Groups GRP ON GRP.idcampaign = CPN.id
    JOIN dbo.SAM_Keywords KWD ON KWD.idgroup = GRP.id
    JOIN dbo.CIS_Projects PRJ ON ACT.IDProject = PRJ.id
    JOIN dbo.SAS_KeywordKPIs KPI ON KPI.IDKeyword = KWD.ID
    GROUP BY PRJ.id, KWD.id, KPI.[date];
    GO
    CREATE UNIQUE CLUSTERED INDEX c ON dbo.KPI (prj_id, kpi_date, kwd_id);

    I can't promise that it is exactly correct, or that the index is optimal - it is to illustrate a point.

    The kpiCTE part of the query could be replaced with something like:

    SELECT kwd_id,
           kpi1 = ISNULL(SUM(kpi1), 0)
    FROM dbo.KPI WITH (NOEXPAND)
    WHERE prj_id = 1047
      AND kpi_date BETWEEN '20090101' AND '20090911'
    GROUP BY kwd_id;

    In all probability, the same indexed view could be extended to cover 'statCTE' as well. I could go on about denormalization, reporting databases, filtered indexes, the advantages of data warehouses and Analysis Services, but to be honest it's way too much for a forum question.

    With some expert attention, and a vaguely sensible design, these types of queries should be darn near instantaneous - and running happily without parallelism, oodles of memory, or RAIDed SSDs, on a very ordinary server indeed.

    Paul

  • That's only an estimated query plan, not an actual one.

    What I look for, initially, is a big difference between the actual row count and the estimated row count, which would point to statistics being out of date.

    Can you get the actual plan?

    Also try turning off parallelism (OPTION (MAXDOP 1)) and see if that has an impact.
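
    For reference, the hint goes at the very end of the statement; a minimal sketch against one of your tables (the column list is illustrative):

    -- Force a serial plan for this statement only.
    SELECT stat.IDCampaign,
           SUM(CAST(stat.Clicks AS bigint)) AS clicks
    FROM SAS_CampaignStatistics stat
    GROUP BY stat.IDCampaign
    OPTION (MAXDOP 1);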



    Clear Sky SQL
    My Blog[/url]

  • I've executed the query again and included the actual execution plan. I also tried it with MAXDOP 1, but it failed with an "out of memory" exception, so I inserted the results into a temp table (which is normally done as well). With MAXDOP 1 the query took even a minute longer, but I attached the execution plan of that one as well.

  • Paul White (11/10/2009)


    Anyway, to answer the original question: the bottleneck is the design. Joining the same large detail-record tables together over and over again in this way is massively inefficient.

    Take kpiCTE for example. Consider how fast it would be if an indexed view something like the following existed: [...]

    With some expert attention, and a vaguely sensible design, these types of queries should be darn near instantaneous - and running happily without parallelism, oodles of memory, or RAIDed SSDs, on a very ordinary server indeed.

    Paul

    Creating such a view might be an idea, but I can't create a view with only the columns in this query. In reality, users select for themselves which columns they want to see, so all columns would have to be included in such a view. This results in a view that is an exact copy of the SAS_KeywordStatistics table, but with the projectid, accountid, campaignid, groupid and keywordid added to it. Of course this is a performance improvement (I lose 5 joins), but those joins are all on the clustered index, so would the impact really be that big?

    I could also denormalize and save this data to the actual table; that way I don't have to create an indexed view the same size as the base table (saves me 100 GB of storage). But the issue is the same: will it make that much of a difference? I'm having a hard time believing that...

    Edit: I have been thinking about it after I typed this message, and you might have a really good point here. With the right indexes this could make a big change!

  • Sander A. (11/10/2009)


    Edit: i have been thinking about it after i typed this message. And you might have a really good point here. With the right indexes this could make a big change!

    Yes. The actual execution plan shows some truly epic row counts into some of those joins.

    I would need to be on-site and paid $1000 a day to take you much further with this (:laugh:) but I definitely encourage you to experiment with this on your test system. You will probably be amazed with the results, if done properly.

    Aggregate the return columns to as high a level as you can get away with, follow the rules for indexed views, eliminate those hugely expensive joins, and you should be well on your way! Those joins are absolutely killing you.

    Good luck. Please post back if you need any further (general) advice.

    Paul

  • Business Objects, by any chance? I have made a good bit of money helping others clean up performance messes from that awful system.

    1) AND (((COALESCE(StatCTE.clicks ,0) <= 1000.0)) OR kwd.id = -1)

    That will be a problem. Perhaps you can break it up into two queries with a UNION (ALL)? ORs are BAD for the optimizer (a sketch of the rewrite is at the end of this post).

    So are COALESCEs - any chance you can eliminate the NULLable column and give it a default of 0?

    2) I bet the IN clauses can vary greatly too, right? Yet another potential performance issue, but unfortunately there is a limited set of options for that one.

    3) Indexed views, mentioned by others, can indeed help tremendously for some queries, but beware the maintenance overhead they carry if you do DML (such as a nightly load/build process).

    4) You need REALLY good statistics on everything touched by queries like this. Explicit ones may help greatly. Keep them fresh too.
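
    Here is the rough shape of the UNION ALL rewrite from point 1 (a sketch only; dbo.SomeTable and its columns are placeholders standing in for the real query):

    -- WHERE a OR b becomes (rows matching a) UNION ALL (rows matching b and not a),
    -- which keeps the result set identical while letting each branch seek an index.
    SELECT t.id, t.clicks
    FROM dbo.SomeTable AS t
    WHERE COALESCE(t.clicks, 0) <= 1000
    UNION ALL
    SELECT t.id, t.clicks
    FROM dbo.SomeTable AS t
    WHERE COALESCE(t.clicks, 0) > 1000
      AND t.id = -1;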

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • The index on the SAM_Keywords table may not be good.
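
    If it helps, you can check whether the indexes on SAM_Keywords are ever actually chosen by the optimizer; a sketch using the usage DMV (counters reset when the instance restarts):

    -- Seeks/scans/lookups vs. updates for each index on SAM_Keywords.
    SELECT i.name,
           us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS us
      ON us.object_id = i.object_id
     AND us.index_id = i.index_id
     AND us.database_id = DB_ID()
    WHERE i.object_id = OBJECT_ID('dbo.SAM_Keywords');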
