In this month’s Road to PASS series, I’m challenging you to submit an abstract for the PASS Summit. Week 1 challenged you to write down a few pain points you’ve solved this year, and week 2’s homework was to write the session’s recap slide.
This week, let’s write the technical part of the abstract. It’s tempting to start with a catchy theme or title, but hold off on that – your abstract needs good bones first, and we can always dress the skeleton in different clothes later. Let’s think about the basics first.
Given your recap slide from last week:
- What did the attendee already know when they walked in?
- What new things did you teach them during the session?
- How will they know they need to attend this session?
- Who should not attend this session, and is there a way we can filter them out in the abstract?
The process will be a little more intuitive after reading through what I’m working on:
How often should you take backups and run CHECKDB?
Here’s the recap slide from last week:
- Understand what scenarios would cause you to do what kinds of restores
- Get the RPO/RTO HA/DR worksheet
- Fill out the current-state columns with sp_BlitzBackups (see the sketch after this list)
- Use my management tips to get real RPO/RTO goals from the stakeholders
- When the jobs are too slow, use my time-saving tricks for backups and CHECKDB
- Trend database size & backup times to understand when you need to recalculate
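Since the current-state step leans on sp_BlitzBackups, here’s a minimal sketch of that step, assuming the First Responder Kit (which includes sp_BlitzBackups) is installed on the server:

```sql
/* A minimal sketch of the current-state step. sp_BlitzBackups reads the backup
   history in msdb and estimates your worst-case RPO and RTO per database -
   which is what the worksheet's current-state columns need. */
EXEC sp_BlitzBackups;
```

Compare what it reports against the goals the stakeholders give you; the gap between the two is what the rest of the recap slide is about.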
With that in mind, here’s my target attendee profile:
Deb the DBA is the only database administrator in her shop, and she’s not even really comfortable with the job title – she feels like an impostor. She’s been to a Microsoft DBA certification training class where she was taught the differences between full, differential, and log backups, but the class didn’t really tie anything to real world recovery needs. Today, she does a full backup on every server once a day, and log backups every hour – because it seemed like the right thing to do. She’s never had a talk with the business about RPO and RTO, but that one time when she did lose production and she had to restore it, the business was pretty happy with her ability to get it down to just an hour of data loss. She’s wondering how it’s done in bigger shops.
That profile led me to write this abstract:
You’ve been taking SQL Server full and log backups for a couple of years now, and you’ve even done a few successful restores when your butt was on the line. But now that you’ve decided to become a professional database administrator, you’re wondering what kinds of restoration problems you’re going to run into in the future. Are there patterns and practices that a Senior DBA would have taught you – if you were lucky enough to work alongside one?
I want to help. I’m Brent Ozar, and I’ve worked with a lot of shops whose nightly maintenance tasks spiraled out of control. From long backups to corrupt restores to dropped tables, you’ll learn the kinds of server emergencies that have brought DBAs to their knees. I’ll show you how to prevent emergencies before they strike.
I’m not really in love with that abstract yet. I think it brings the right audience members in (folks who’ve never learned from another DBA), and excludes the right ones. It just doesn’t grab me and shake me. That’s okay – I’ll add in a fun theme later. First, I just need to let that sit for a week and figure out whether the bones work well.
What to do when SQL Server is unusually slow
Here’s my recap slide:
When your SQL Server is unusually slow, step through these in order (there’s a runnable version after the list):
- sp_WhoIsActive @get_locks = 1, @get_plans = 1
- sp_BlitzFirst @ExpertMode = 1
- sp_BlitzCache @ExpertMode = 1, @SortOrder = 'all'
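If it helps to see that checklist in runnable form, here’s a sketch, assuming all three procedures are installed (sp_WhoIsActive is Adam Machanic’s; sp_BlitzFirst and sp_BlitzCache ship in the First Responder Kit):

```sql
/* The recap slide as a script. Run each step, read the results, and only
   move to the next step if the previous one didn't explain the slowdown. */

-- Step 1: what's running right now, including blocking and query plans
EXEC sp_WhoIsActive @get_locks = 1, @get_plans = 1;

-- Step 2: what the server is waiting on at this moment
EXEC sp_BlitzFirst @ExpertMode = 1;

-- Step 3: the most resource-intensive plans in cache, across every sort order
EXEC sp_BlitzCache @ExpertMode = 1, @SortOrder = 'all';
```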
With that in mind, I’m going to skip the attendee profile and jump straight into the abstract:
“Is something going on with the SQL Server right now?” You hear those words, and your heart starts to race a little faster. You try to connect to the SQL Server, and it responds, but you can tell it’s a little slow. You open up Activity Monitor, look at recent expensive queries, and maybe open up the Windows event logs to see if anything’s happened recently.
There’s gotta be a better way, right?
Good news – there is, and you’re about to learn how the pros do it. In one all-demo session, you’ll watch Brent Ozar use sp_WhoIsActive, sp_BlitzFirst, and sp_BlitzCache to solve several common emergencies. You’ll walk away with a written checklist to solve typical performance emergencies.
With that abstract, I don’t bother defining whether someone is a DBA, developer, sysadmin, or manager. I’m not concerned with their job title. Instead, I focus on them having to do the specific task in the opening paragraph. If it rings a bell, then they’re the right person for the session. They don’t need to know T-SQL or DMVs.
I’m also not defining their level or seniority because the 3rd paragraph tells them exactly what they’re going to learn. At first, it feels kinda odd for the abstract to give away the punch line, but that’s exactly the point. If someone’s already comfortable with sp_WhoIsActive, sp_BlitzFirst, and sp_BlitzCache, I don’t want them in the session. If I hid the tool names, someone who already knows those 3 tools might think I’m going to teach them a new trick. Here, that’s not an issue.
Analyzing parallelism
My recap slide said I’m going to break parallelism out by which parts of the SQL Server startup, login, and query execution process influence it, and how to see the effects afterward. I’ve added a little more detail to it here, with a T-SQL sketch of the relevant knobs after the list:
- Server startup: Cost Threshold, MAXDOP, core count, NUMA nodes
- Login: Resource Governor
- Database picked: MAXDOP
- Query plan built: query hints, parallelism inhibitors, types of tasks that benefit from parallelism
- Single query in progress & aftermath: live plan, actual (completed) plan, parallelism operators, thread stats, DMVs that show DOP, XE
- Multiple execution aftermath: parallelism operators in estimated plans, DMVs that show DOP
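To make those layers concrete, here’s a hedged T-SQL sketch of where the first few knobs live – the MAXDOP values and dbo.SomeBigTable are illustrative only, not recommendations:

```sql
/* Server startup: the instance-wide knobs. */
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism';  -- run_value = current setting
EXEC sp_configure 'max degree of parallelism';

/* Database picked: a database-scoped MAXDOP overrides the instance setting (2016+). */
SELECT name, value
FROM sys.database_scoped_configurations
WHERE name = N'MAXDOP';

/* Query plan built: a hint overrides everything above, for one query only.
   dbo.SomeBigTable is a hypothetical table. */
SELECT COUNT(*)
FROM dbo.SomeBigTable
OPTION (MAXDOP 1);

/* Multiple execution aftermath: DMV columns showing each cached plan's DOP (2016+). */
SELECT TOP 10 qs.last_dop, qs.min_dop, qs.max_dop, qs.execution_count
FROM sys.dm_exec_query_stats AS qs
ORDER BY qs.total_worker_time DESC;
```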
I’ve been thinking a lot about this, and I think I need to visualize it for them on a two-sided sheet of paper. One side will show how SQL Server decides whether a task will go parallel, and the other side will show how to see the effects of it when doing diagnostics.
The target attendee profile:
Tom the Tuner lives in queries and query plans. He knows he can fiddle with MAXDOP at the server level, and he knows that scalar UDFs are bad for parallelism. He works with a dozen developers who all write C# code that queries SQL Server, and when the query doesn’t go fast enough, they bring it to Tom. He really enjoys performance tuning, and spends 4-16 hours per week reading query plans and tweaking queries. He’s read blog posts from Adam Machanic, Erik Darling, and Paul White, but he wants something a little simpler to crystallize what’s going on with parallelism. He has a background in development, and he’s never managed servers. He’s been to a few PASS Summits.
Of the 4 abstracts, though, this is the one I don’t think I can write until I sketch out the first draft of the visualization. I’ve gone through a couple of iterations in my mind, and when I get back home after SQLBits, I’m going to try a few sketches to see what works. Depending on how complex things get, I may end up focusing on just one side of the piece of paper: either the factors that influence parallelism, OR how to see the effects of it when doing diagnostics. I’ll probably give them both sides – but given the time in a 75-minute session, I’m thinking I might only be able to explain one side.
This is also the kind of session where I might write the slides for both sides of the paper, and even publish the deck with both sides, but the abstract might only cover the side I think is most important. That way, if I end up finishing WAY early, I could cover both sets of slides. That’d be a hell of a tough challenge, though.
Demystifying cardinality estimates
The recap slide for this one:
Look at the driver operator (top right) first, and:
- Can SQL Server quickly understand your T-SQL?
- Are you asking for something that’s already stored on disk?
- Are there statistics about the data you’re asking for?
- Is the data stored in a way that makes subsequent operations easier?
- Now move through the plan’s operators, repeating those questions
The more YES’s you have, the better SQL Server’s estimates will be.
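To see the estimate-vs-actual comparison this session is built around, here’s a minimal sketch – dbo.Users and IX_Reputation are hypothetical names, and SET STATISTICS XML is just one way to capture an actual plan:

```sql
/* Capture the actual execution plan, which shows both estimated and actual
   row counts for every operator. dbo.Users is a hypothetical table. */
SET STATISTICS XML ON;
SELECT DisplayName
FROM dbo.Users
WHERE Reputation = 1;
SET STATISTICS XML OFF;

/* If the estimates are way off, check whether statistics exist for the column
   you filtered on, and how stale they are. IX_Reputation is hypothetical. */
DBCC SHOW_STATISTICS ('dbo.Users', 'IX_Reputation');
```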
So the perfect attendee profile is Tom the Tuner – but long before he got his current role. It’s targeted at Tom back when he was a C# developer, and he only spent about 1-2 days per week in SQL – the rest in Visual Studio. He’s massively interested in databases, and he loves learning, but he doesn’t even know what he doesn’t know yet about SQL Server. When he sees a bad query plan, he’s still in the mentality of OPTION (RECOMPILE) and updating statistics.
So for an abstract, let’s start with:
You’ve been writing T-SQL queries for a few years now, and when you have performance issues, you’ve been updating stats and using OPTION (RECOMPILE). It’s served you well, but every now and then, you hit a problem you can’t solve. Your data’s been growing larger, your queries are taking longer to run, and you’re starting to wonder: how can I start getting better query plans?
The secret is often comparing the query plan’s estimated number of rows to actual number of rows. If they’re different, it’s up to you – not the SQL Server engine – to figure out why the guesses are wrong. To improve ’em, you can change your T-SQL, the way the data’s structured, or how SQL Server thinks about the data.
This session won’t fix every query – but it’ll give you a starting point to understand what you’re looking at, and where to go next as you learn about the Cardinality Estimator.
I love the first paragraph and the last one. I’m not wild about the middle one yet.