Size of Data Roadblocks

  • Grant Fritchey (3/10/2015)


    It sure sounds like you need to do fundamental query tuning, indexing, statistics maintenance, etc. Assuming you can't get someone in to help you who has experience with all this, I'd suggest you start studying now. I'd suggest starting with the Troubleshooting for the Accidental DBA[/url] book. It's free to download. Then move on to my books.

    Come on now Grant! Geez. :hehe:

    Grant Fritchey (3/10/2015)


    And partitioning can, in some cases, lead to performance increases, but it is absolutely not a guarantee of any kind. In fact, I have more often seen it lead to performance headaches because after implementing it, the people find that they can't guarantee the queries will always be based on the clustering key. If you can't absolutely guarantee that, then partitioning will not only fail to improve your performance, it will degrade it. Do not approach partitioning as a performance tuning mechanism. It is meant as a data management tool.

    Yep, this makes sense and why I suggested it because it will be guaranteed that queries will always be based on the clustering key. It's actually why I thought it would be ideal to mention even though I know how this community is on table partitioning for improved query performance. 😉

  • xsevensinzx (3/10/2015)


    ScottPletcher (3/10/2015)


    If the table is already clustered on time (hooray!), then reading a week's worth of data shouldn't take that long.

    We need to review two things (at least).

    First, the query code itself. We need to verify that the query is doing a seek. Maybe the lookup datetime are "nonsargable", and you're forcing SQL to scan the table rather than do a keyed seek. That would have a huge impact on performance with a large table.

    Second, missing index and index usage stats. This will identify how SQL is using existing index(es) and what index(es) SQL "thinks" are missing.

    Yeah, it's doing a seek. It wasn't at first. I threw in a table hint to point to the clustered index, which shaved off some time by 20 seconds versus the scan on the non-clustered index that's on a column that's not even being factored at all. That's over 200 million records over 20 days in about 49 seconds with the hint, after a statistics update and 0 fragmented index.

    Did I understand you right that the query took 20 days?

    A medium powered server should be able to perform a full table scan of 200 million rows in under half an hour.

    I'd suggest you begin by listing wait states at server level and post back result, so we have a clue what's going on.

    http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/

    Look at the query execution plan and see if it's using a HASH MATCH join, which can be a huge black hole on a server with large tables and which is running low on memory.

    Also confirm if it's stuck in CXPACKET wait for practically the entire duration of query, and if so then add OPTION (MAXDOP 1) hint to query to disable parallel execution. Packet exchanging between parallel threads can sometimes cause a query to run for orders of magnitude longer than it would with a single execution thread.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (3/10/2015)


    Did I understand you right that the query took 20 days?

    A medium powered server should be able to perform a full table scan of 200 million rows in under half an hour.

    I think it's rather "20 days worth of data equals 200 million records in 49 seconds".

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • xsevensinzx (3/10/2015)

    GilaMonster (3/10/2015)


    Partitioning is not done for performance reasons. It's for data management, fast loads and fast deletes. It won't necessarily improve performance, it may, in fact, reduce performance. If you're looking for performance benefits, stop looking at partitioning.

    Unfortunately, I have to disagree here mainly because you're making a big assumption here (see Grants response). You know without a doubt that table partitioning can in fact increase performance on querying depending on your partitioning strategy and how it's being used. It doesn't guarantee it, but it can increase performance. It also doesn't mean it should be the sole reason to add table partitioning, but I have an extremely large table that will benefit from it on other ends too. That and how your partition and how it's used is in queries from my understanding (correct me here if I'm wrong) greatly impacts how table partitioning either improves or degrades performance. Like for example, joining on a column that is not in a partition will slow down the query, not improve the query performance. That among other things.

    Partitioning typically improves performance only when the original table is clustered incorrectly (usu. by identity, since the myth persists that should be the "default" clustering key). When switching to partitioning, the best clustered index key, such as datetime, is used, and, bingo, better performance, because fewer rows are being scan/read. But, in fact, had the original table been clustered on datetime, performance would have been much better all along, for the same reason.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • xsevensinzx (3/10/2015)


    Yeah, it's doing a seek. It wasn't at first. I threw in a table hint to point to the clustered index, which shaved off some time by 20 seconds versus the scan on the non-clustered index that's on a column that's not even being factored at all. That's over 200 million records over 20 days in about 49 seconds with the hint, after a statistics update and 0 fragmented index.

    If the optimizer wasn't choosing that index, it could be because that's not a good index for the query, or it's stats are off or out of date, or there is something up with the sargeability of the filtering in the query.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Koen Verbeeck (3/10/2015)


    Eric M Russell (3/10/2015)


    Did I understand you right that the query took 20 days?

    A medium powered server should be able to perform a full table scan of 200 million rows in under half an hour.

    I think it's rather "20 days worth of data equals 200 million records in 49 seconds".

    Correct. Sorry for confusion 🙂

  • xsevensinzx (3/10/2015)


    Koen Verbeeck (3/10/2015)


    Eric M Russell (3/10/2015)


    Did I understand you right that the query took 20 days?

    A medium powered server should be able to perform a full table scan of 200 million rows in under half an hour.

    I think it's rather "20 days worth of data equals 200 million records in 49 seconds".

    Correct. Sorry for confusion 🙂

    Can you post the actual SQL query and sample execution plan?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (3/10/2015)


    xsevensinzx (3/10/2015)


    ScottPletcher (3/10/2015)


    If the table is already clustered on time (hooray!), then reading a week's worth of data shouldn't take that long.

    We need to review two things (at least).

    First, the query code itself. We need to verify that the query is doing a seek. Maybe the lookup datetime are "nonsargable", and you're forcing SQL to scan the table rather than do a keyed seek. That would have a huge impact on performance with a large table.

    Second, missing index and index usage stats. This will identify how SQL is using existing index(es) and what index(es) SQL "thinks" are missing.

    Yeah, it's doing a seek. It wasn't at first. I threw in a table hint to point to the clustered index, which shaved off some time by 20 seconds versus the scan on the non-clustered index that's on a column that's not even being factored at all. That's over 200 million records over 20 days in about 49 seconds with the hint, after a statistics update and 0 fragmented index.

    Did I understand you right that the query took 20 days?

    A medium powered server should be able to perform a full table scan of 200 million rows in under half an hour.

    I'd suggest you begin by listing wait states at server level and post back result, so we have a clue what's going on.

    http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/

    Look at the query execution plan and see if it's using a HASH MATCH join, which can be a huge black hole on a server with large tables and which is running low on memory.

    Also confirm if it's stuck in CXPACKET wait for practically the entire duration of query, and if so then add OPTION (MAXDOP 1) hint to query to disable parallel execution. Packet exchanging between parallel threads can sometimes cause a query to run for orders of magnitude longer than it would with a single execution thread.

    My CXPACKET is higher now with the new data. I have been fine tuning the MAXDOP today to adjust for a bit. I did option for MAXDOP 1, but this is as expected, worse. I just got to a good tune when I updated my MAXDOP and shaved off another 10 seconds. I think I'm at the edge though.

  • Grant Fritchey (3/10/2015)


    xsevensinzx (3/10/2015)


    Yeah, it's doing a seek. It wasn't at first. I threw in a table hint to point to the clustered index, which shaved off some time by 20 seconds versus the scan on the non-clustered index that's on a column that's not even being factored at all. That's over 200 million records over 20 days in about 49 seconds with the hint, after a statistics update and 0 fragmented index.

    If the optimizer wasn't choosing that index, it could be because that's not a good index for the query, or it's stats are off or out of date, or there is something up with the sargeability of the filtering in the query.

    Table is fresh, statistics updated today even with 30% samples and so forth. The index it keeps choosing is a non-clustered index on a column the query is not even using. I hinted towards the clustered index that is on Time, which is being filtered and counted by Time. Performance differences between what SQL is suggesting is a bit wonky because the hint is much faster and better.

    Everything looks good in terms of the index and statistics. I'm wanting to push it further beyond just the index tuning.

  • I'm back. Here is where I am at.

    I was just doing basic queries on a very large dataset with the indexes set in place right now. Nothing more, nothing less. Just getting baselines for how the server handles simple aggregates of bigger data because in my mind, if this can improve then it only trickles down to more complex queries and processes that actually require more thought behind query optimization, index tuning and the works. If that's the wrong approach, feel free to let me know. I only approach it that way because I use to work on big networking engines and the works. One of the sure fire ways of optimization was just simple overloads and trying to apply pressure until something breaks. Finding those bottlenecks on simply overloading it and optimizing to handle it has such a good butterfly effect to everything else.

    The reason I mention that is because so many questions and assumptions on query tuning, indexing and so forth were mentioned. I really haven't even gone down that path just yet with a bigger dataset. But they do of course matter a great deal than just throwing money at something like more processing power, more memory, faster disk, different raid configurations and so forth.

    That said, I think some of the things I'm going to expand on is upgrading to Enterprise (table compression/partitioning/columnstore/in-memory tables), bulking up on some more memory and continue to find a good sweet spot on my server configuration with the processors I have. Then focus on good index and query optimization strategies to make it all come together.

    Thanks for the help guys!

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply