RE: Is there any way to improve statistics on a table like this?

SSC Guru

Points: 1003865

May 10, 2014 at 9:54 am

First, please don't post a thousand rows in a code window. It drove even my 4 processor 6GB ram laptop a little nuts because of the "prettifier" that SQL Server Central runs on such code windows. My older machine was absolutely paralyzed when I opened this thread never mind trying to copy the code. For such large bits of code, attached them as a text file. If you can, you might want to go back and edit your original post to do that. 😉

Second, (and I'm admittedly assuming that this is supposed to be a true "Adjacency List" where each node has one and only one parent) if the data is a representation of actual data, you might have a serious problem with the data. For example, a ParentID of "12" is rather prominent in your test data but is not to be found as a k_ID. If the real data has a similar problem, people should spend some time cleaning up the data for that system before they worry about such things as indexes and stats. It may be a simple omission on your part but I don't see a self referencing constraint (FK, in this case) that requires the ParentID to exist as a K_ID nor do I see a unique constraint k_ID/ParentID as a combination. I also don't see a constraint that requires that the k_ID not be equal to the ParentID. Based on those "findings", I'd also check to see if there are any "loops" built into the data and, again, I'd do all of that before I even thought of looking at indexes or constraints. This also requires that you change the ParentID of the row that has a k_ID of "1" to NULL.

Once all of that is resolved (again, only if this is meant to be a true "Adjacency List) and the correct constraints and other safe guards have been put in place to prevent orphans (such as any rows that have the ParentID of "12" in your test data) and "loops" in the data, then you can start on the indexes and stats. As Gail would remind us, the first question to ask is "is there a performance problem". If there isn't, then you might not have to worry so much about indexes and stats. At that point, the only thing you might want to consider is a little index consolidation and, possibly, the removal of unused indexes for the sake of nightly maintenance and a bit of savings on hard-disk space used.

I'd also consider changing Type, State, and Lock to TINNYINT so save on index size and the overall footprint of the table not only on the hard-disk, but for backup and restore sizes. I also don't see anything in the data that requires the use of NCHAR or NVARCHAR data-types. If that's true and there's no intent that there ever will be such a requirement, you're just wasting 50% of the space on those columns and you might want to consider changing them to just CHAR or VARCHAR. Of course, THAT will require a change on the front end because the implicit conversions of searching for NVARCHAR literal names (for example) on VARCHAR columns will prevent the use of indexes altogether. Disk space might be cheap but the reduction of size of the underlying data-types could reduce the size of each row to a half or even a 3rd of it's current size, which means that the table will require less memory, indexes will be smaller and also require less memory, performance could go up a fair bit just because more rows will fit on each page, backup sizes will decrease, and a panic DR restore will take less time.

If you're traversing the tree to get other than just one row lookups (for example, you want to return a whole "set" starting with a given ParentID and want to return the entire down-line of the parent), consider the fact that an "Adjacency List" (Parent/Child table) is great for easy maintenance but relatively suck for hierarchical traversals and aggregations. They're also absolutely terrible for anything that might require traversals by level. If you're having performance problems with such things or the code has become a maintenance nightmare, consider combining the ease of maintenance of the "Adjacency List" with the blinding speed of "Nested Sets" for traversals and aggregations.

Most methods for maintaining "Nested Sets" are terrible in their complexity and most people use a slow RBAR "push-stack" method to do an initial conversion from "Adjacency List" to "Nested Sets". Please see the following two articles for a much better way to handle all of that. The initial conversion is so fast (4-5 seconds for 160K nodes as compared to nearly 50 minutes using the push-stack method) that some folks have taken to keeping the "Adjacency List" for ease of maintenance (each node is aware of one and only one other node) and just rebuilding the entire "Nested Sets" when there's a change rather than using any of the more complicated maintenance methods.

Here are a couple of articles complete with code that will build a test "Adjacency List" (a million nodes in the article but is programmable for any size) and demonstrate how it all would work. Of course, the front end code would need to be modified to take full advantage of the "Nested Sets" (well, unless stored procedures where used to abstract the lookup functionality to begin with in which case you'd be "golden").

http://www.sqlservercentral.com/articles/Hierarchy/94040/

http://www.sqlservercentral.com/articles/T-SQL/94570/

Last but certainly not least and, again, if you're having performance problems, consider splitting all the variable width "note" and similar columns off into a "sister table", which would have a PK of k_ID from your main table. That would do a couple of things... 1) you wouldn't experience page splits when someone made one of those "fields" larger with extra characters and 2) the basic traversals would run at light speed because it would greatly reduce the row size allow many more rows per page where it's important.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)