I a m developer (SSAS/SSIS) and familiar with the concepts of clusted and nonclustered indexes.
How much more about indexing/statistics/... should i have knowledge of?
What is "nice to have" and what is "must have"?
EDIT: asking because i am one of the people that answered wrong here
Funny that you should ask that question. I've found out (the hard way) that even "experts" on the subject are making recommendations and calling certain things "Best Practices" when the supposed "Best Practices" they're recommending are actually causing the system to generate massive amounts of page splits, which is the usual cause for what they're calling "fragmentation" (there are scenarios where no page split occur but you have 99.9% fragmentation appear even after a rebuild virtually overnight).
Most of the world follows the supposed "Best Practice" of doing nothing up until 5 or 10% fragmentation (we'll call it 5%), using REORGANIZE for "fragmentation" between 5% and 30%, and REBUILD for anything > 30% "fragmentation".
I know where those numbers came from (Paul Randal) but even he said they were a general recommendation that MS forced him to make. Further, it says right in Books Online ( https://docs.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-ver15 ) right below that "recommendation" that states...
Read the important parts in that where it says that "However, the actual values may vary from case to case. It is important that you experiment to determine the best threshold for your environment".
And, yeah... Paul Randal wrote that, as well, and he's completely correct there.
Almost everyone ignores that.
It turns out that REORGANIZE isn NOT your friend. For example, almost everyone says that the best reason to NOT use Random GUIDs is because of the massive number of page splits (quickly causing huge amounts of fragmentation). While there are a lot of other reasons to not use Random GUIDs, fragmentation isn't actually one of them. I have tests that prove that you can actually go for months with absolutely ZERO page splits or fragmentation and, again, Paul Randal actually mentions such a thing ( https://docs.microsoft.com/en-us/sql/relational-databases/indexes/specify-fill-factor-for-an-index?view=sql-server-ver15 )...
"A nonzero fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table. "
Sounds like a bloody Random GUID to me. 😉 It's especially good for high INSERT rates because it has NO HOT SPOTS in the table.
The problem has nothing to do with the fact that you're using Random GUIDs. The problem is actually caused by the way most people do index maintenance. In this case, the use of REORGANIZE is the actual killer here because people don't actually consider the fact that REORGANIZE DOES NOT CREATE ANY NEW PAGES. So, REORGANIZE packs the data tighter up to the assigned Fill Factor but does NOTHING to bring pages fuller than the Fill Factor down to the Fill Factor. As a result, REORGANIZE actually guarantees, perpetuates, and makes page splits MUCH worse for such Random GUID indexes.
Then there's the other supposed "Best Practice" of reducing the Fill Factor if you have too much fragmentation between index maintenance runs. If you don't know what "kind" of insert and update pattern you have, then that's one of the worst recommendations there is because it won't only NOT stop the fragmentation, you end up totally wasting huge amounts of disk space and memory.
I'm slowly but surely working on a "Stairway" that identifies all of this type of stuff with indexes and also provides verifiable proof in the form of repeatable, demonstrable code on the subject. I've already taught the basics of what I'm talking about in my "Black Arts" Index Maintenance 1, 2, and 3 presentations at several SQL Saturdays. I even demonstrate that NOT rebuilding indexes and just religiously maintaining stats is better than maintaining indexes blindly and, so, incorrectly.
So, yeah... there's a whole lot to learn. It took me 20 years to figure out a lot of this stuff. The first thing to do is to learn ALL of the basics, while understanding that what most people recommend as "Best Practices" actually aren't. Then, keep learning from there.
is pronounced "ree-bar
" and is a "Modenism
" for R
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".
"If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)