large-ish tables, poor query plans - db architecture question

Question

large-ish tables, poor query plans - db architecture question

sqlnyc

SSCommitted

Points: 1726
More actions
March 7, 2014 at 4:31 pm

#290739

Hi everyone,
My client has a db that is 94% reads, 6% writes.
There are two tables that have ~100 million rows each, and are often joined. They are running SQL 2008 R2 Standard on Windows 2008 R2 Enterprise. Database size is ~300GB and expected to grow. Statistics are updated with fullscan nightly. Running on a Dell 720 with 128GB memory, with 64 GB allocated to SQL Server.
I have warned management for quite a long time they can't just keep throwing rows into these tables, and expect that things will remain the same. Their response is that there are much larger databases out there that hold a lot more rows, and that they don't have the type of performance issues that the client is beginning to experience. My reply has been that those shops are not likely running SQL Standard, and that they are probably using some form of partitioning.
Queries that generate sub-optimal plans are becoming more frequent, causing index scans of one or both of these tables. These are single-statement queries that are generated by a framework, so are not easily optimizable unto themselves. There are predicates that should filter the query, but the predicate appears to not be pushed down.
My fear has been that as rows accumulate in these tables, statistics are less and less meaningful, causing the optimizer to choose a bad plan.
For those of you with large mostly read-heavy tables, what has been the max number of rows that you have in a single table before experience performance issues as described above? And how have you remedied the situation?
Thanks in advance--
sqlnyc

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

MyDoggieJessie SSC-Forever Points: 44486 More actions · Answer 1

I know this is off topic to what you were expecting for an answer but, well, you may want to reconsider changing your statistics job to skip the nightly FULLSCAN and opt for something a little lighter (like just COLUMNS), perhaps leaving FULLSCAN for a weekly deal (just a thought) - it might help out with what I suspect is a long running process and a whole lot of consumed resources? Although you didn't mention it, I assume you have an index maintenance rebuild/reorg in place as well?

Regarding the current size and anticipated growth - It's really not that bad, we have nearly a billion rows in some tables, with the majority of our larger tables in the 300-500 million range. Thoughtful indexing and query optimizations can REALLY GO A LONG way in keep things running smoothly. Since you mention that you have little control over how the queries are built, you may want to further evaluate these queries to look for patterns with WHERE clauses, JOINS, etc, and possibly look into filtered indexes (since this isn't an enterprise-level feature). Obviously there are pros/cons to that as well but it just may help for those "BIG" queries involving the JOINS to the larger tables.

For the queries generating "sub-optimal" plans, dive deeper into the execution plans to see why you're getting scans rather than seeks, surely there is something you can do to optimize those (of course not everything is possible in certain cases). We have a similar situation (auto-generated queries by an external 3rd party vendor app), but we isolate the problematic queries and "recommend" (make) the vendor incorporate our changes so they perform better when critically necessary.

I honestly don't know where the "tipping" point is for extremely large data sets and where possibly a scan takes the same amount of time as a seek would (if there is even one), but I can tell you it's not where you are now....perhaps others on this forum with more experience in this area can help shed some light to this topic?

______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

MyDoggieJessie SSC-Forever Points: 44486 More actions · Answer 2

One other thing, you didn't mentioned anything about the disk sub-system this all sits on...what kind of hardware do you have? If you've got lots of storage arrays/LUNS or whatnot, you might get some performance boost out of moving files around to faster drives, etc...

______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

sqlnyc SSCommitted Points: 1726 More actions · Answer 3

Hey MyDoggieJessie,

Thanks so very much for your replies.

All disks are local to the server, no DAS or SAN yet.

MDFs: 6GB/s SATA SSD

LDFs: 6GB/s SAS 15K spinning

In general, we don't have performance issues. But I've seen an uptick in index scans on one of the 100 million row tables.

We are lucky to have a generous maintenance window, so running fullscan for stats is not an issue at the moment. But thanks for pointing out that we have alternatives.

Yes, index maintenance is done nightly, via Ola Hallengren backup scripts.

Just wanted to clarify that for a lot of the reports, they do call stored procedures, and we can modify those if there are performance issues. It's the framework-generated stuff that's a problem (and it has the added benefit of not really being human-readable).

Thanks again for your suggestions.

sqlnyc

MyDoggieJessie SSC-Forever Points: 44486 More actions · Answer 4

t's the framework-generated stuff that's a problem (and it has the added benefit of not really being human-readable).

It's TSQL isn't it? You should be able to find the executing statements - SQL is SQL!

______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

Grant Fritchey SSC Guru Points: 398713 More actions · Answer 5

I'd be curious why the plans are choosing to do scans. Assuming the distribution of the data hasn't changed, just having more data doesn't generally affect the statistics that much. Partitioning is not really a mechanism for improving performance. It can, but it's primarily a means of making data management of large data sets easier. Many people see poorer performance from partitioning, so I'd be cautious about that. From the sounds of things, I'd focus on general query tuning, ensuring the indexes are good, etc., and probably look to throw more disks at the problem. For a system that size it sounds like the hardware might not be optimal (and I'm not the guy who normally argues for hardware solutions).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning