Resources for tuning large workloads

  • Hi,

    It seems that a lot of the performance tuning books and articles I read assume up front that data is being returned to a user. I'm interested in resources for tuning workloads that are primarily internal data processing and not for human consumption. I understand that the same basics apply to a degree, but the stored procedures I'm tuning typically touch a range of 6 - 20 million rows. Sometimes there is no WHERE. Sometimes there are 6-8 joins, or just one big join. I deal with a lot of hash values and a lot of self-join-ish behavior linking records within a table to family, parent, and child relationships in the same table. If any of you know of anything, free or paid, I'd be very grateful. I can't seem to find much that addresses anything close to my situation. I've run through much of the free e-books and a few real books around here.

    Thanks

  • Most of the functions are the same with 6-8 joins or one. Most of the rules are the same dealing with a few rows or a few million. The real issues come up when you're trying do large data without any kind of filtering. At that point you can't tune queries. Instead, you need to tune the hardware. Or, you need to reassess your process to see if you can come up with a mechanism that does include filtering criteria. I know my book will offer you little in the way of help since your queries essentially can't be tuned. Indexing also becomes something of a waste of time without filtering criteria. In general, most data processing I've seen, OLTP, warehousing and reporting, have filters in place, so tuning of queries & indexes, etc., becomes possible. Without that, you are in a space that most data processing isn't in. Sorry not to be able to add something useful.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Actually, that's pretty helpful, since I at least have a better idea of where to focus my time now.

    Especially when I open up a 3000 line stored procedure, hit CTRL + F, and find exactly one instance of the word 'WHERE' 😀

    Thanks

  • Like Grant said, the fundamentals are the same regardless of where the output goes. Where I work, we deal with hundreds of millions of rows and are guided by the same principles everyone talks about in here for much lower table sizes. But we filter extensively and build covering indexes and indexed views to support that filtering. There is no doubt that self joins can be costly, and integers are really much more efficient than hashes as index keys, but without looking at your table and index structures, your queries and your business needs, it's impossible to give you any solid advice. It sounds like your discovery of a single WHERE clause is a start. Good luck.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • I would feel genuinely terrible dropping what I have to deal with on unpaid volunteers. The little things that I get stuck on, sure. But it's my responsibility overall. I guess I'm pretty relieved that I don't have to re/un-learn a whole set of things for dealing with large data sets.

    If I get a follow up question: are there any advanced resources for optimizing queries that you would recommend? Again, doesn't have to be free. Just looking to cut out some of the noise. Bonus points if it's in-person training in NYC.

    Thanks

  • Keep an eye out for the SQL Skills Immersion courses. They have a lot of detailed internals knowledge. And it's in-person and travels. Not sure what the schedule is, but you can probably get that from their web site. There's also Pluralsight. I don't know what their advanced courses look like, but I'm sure they have some.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Man, I wanted that free month of PluralSight from the 24 hour wait stats blog post, but my boss was terrified it wouldn't be anonymous.

    :Whistling:

  • sqldriver (7/7/2014)


    Actually, that's pretty helpful, since I at least have a better idea of where to focus my time now.

    Especially when I open up a 3000 line stored procedure, hit CTRL + F, and find exactly one instance of the word 'WHERE' 😀

    Thanks

    sqldriver (7/9/2014)


    Man, I wanted that free month of PluralSight from the 24 hour wait stats blog post, but my boss was terrified it wouldn't be anonymous.

    :Whistling:

    Is the first statement in that query 'USE PRISM' ?:-D

  • Gazareth (7/9/2014)


    sqldriver (7/7/2014)


    Actually, that's pretty helpful, since I at least have a better idea of where to focus my time now.

    Especially when I open up a 3000 line stored procedure, hit CTRL + F, and find exactly one instance of the word 'WHERE' 😀

    Thanks

    sqldriver (7/9/2014)


    Man, I wanted that free month of PluralSight from the 24 hour wait stats blog post, but my boss was terrified it wouldn't be anonymous.

    :Whistling:

    Is the first statement in that query 'USE PRISM' ?:-D

    More like 'USE ABYSS'

  • By far the biggest tuning pay back is to get the best clustering index on all tables, esp. large ones. Until you do that, the rest is honestly pretty much a waste of time since it's misdirected effort.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply