SQL 2017 post migration memory issues

Question

SQL 2017 post migration memory issues

jsqldba

SSChampion

Points: 11257
More actions
August 28, 2019 at 7:41 am

#3674969
Recently, we have migrated a database from win 2012 r2 standard / sql 2014 enterprise to win 2016 standard / sql 2017 (cu15) enterprise, and immediately after the migration we are seeing several memory related messages in the sql error logs. As you can see below, we tested reducing the database compatibility level downwards to see if a failing query would run. reducing it down to 120 sql 2014 compatibility level works. but we also have other errors popping up related to "AppDomain loading/unloading" that occur. This database is SAP BW with ETLs running overnight. The errors occur during the ETL jobs.
The new server is configured the same as the old server - 8CPU, 92GB windows memory / 80GB sql memory min/max, same disk layout on SAN. Both old and new servers are running on a VMware VM server. Lock pages in memory enabled, same maxdop (1), ctfp (50), full index rebuild / update statistics done each weekend (ola scripts), etc.
Any ideas?

2019-08-27 14:20:48.98 spid84 Setting database option COMPATIBILITY_LEVEL to 140 for database 'XXX'.
2019-08-27 14:44:00.48 spid73 Error: 8623, Severity: 16, State: 1.
2019-08-27 14:44:00.48 spid73 The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
2019-08-27 14:49:48.54 spid345 Setting database option COMPATIBILITY_LEVEL to 130 for database 'XXX'.
2019-08-27 14:50:38.79 spid193 Error: 8623, Severity: 16, State: 1.
2019-08-27 14:50:38.79 spid193 The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
2019-08-27 14:51:51.32 spid375 Setting database option COMPATIBILITY_LEVEL to 120 for database 'XXX'.
2019-08-27 16:00:03.12 spid184 40 transactions rolled forward in database 'XXX' (7:0). This is an informational message only. No user action is required.
2019-08-27 16:00:04.73 spid184 0 transactions rolled back in database 'XXX' (7:0). This is an informational message only. No user action is required.
2019-08-27 16:00:05.78 spid184 Recovery completed for database XXX (database ID 7) in 4 second(s) (analysis 2 ms, redo 771 ms, undo 1563 ms.) This is an informational message only. No user action is required.
2019-08-27 19:21:30.94 spid18s FlushCache: cleaned up 26111 bufs with 4358 writes in 68943 ms (avoided 1166 new dirty bufs) for db 6:0
2019-08-27 19:21:30.94 spid18s average writes per second: 63.21 writes/sec
average throughput: 2.94 MB/sec, I/O saturation: 2840, context switches 7221
2019-08-27 19:21:30.94 spid18s last target outstanding: 6, avgWriteLatency 49
2019-08-27 19:37:29.81 spid74 AppDomain 6 (XXX.dbo[runtime].5) created.
2019-08-27 20:35:52.57 spid184 DBCC CHECKDB (XXX) WITH all_errormsgs, no_infomsgs, physical_only, maxdop = 8 executed by XXX-NET\XXX found 0 errors and repaired 0 errors. Elapsed time: 4 hours 35 minutes 51 seconds. Internal database snapshot has split point LSN = 00103dce:00b054ab:0003 and first LSN = 00103dce:00b052f2:0001.
2019-08-27 20:36:20.21 spid184 DBCC CHECKDB (DBA) WITH all_errormsgs, no_infomsgs, physical_only, maxdop = 8 executed by XXX-NET\XXX found 0 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 1 seconds. Internal database snapshot has split point LSN = 0000002a:0000293b:0001 and first LSN = 0000002a:00002939:0001.
2019-08-27 21:43:12.82 spid46s AppDomain 6 (XXX.dbo[runtime].5) is marked for unload due to memory pressure.
2019-08-27 21:43:12.84 spid46s AppDomain 6 (XXX.dbo[runtime].5) unloaded.
2019-08-27 23:16:31.71 spid304 DBCC TRACEON 2861, server process ID (SPID) 304. This is an informational message only; no user action is required.
2019-08-28 00:00:08.27 spid13s This instance of SQL Server has been using a process ID of 3160 since 8/24/2019 7:21:49 PM (local) 8/24/2019 5:21:49 PM (UTC). This is an informational message only; no user action is required.
2019-08-28 06:23:59.22 spid190 DBCC TRACEON 2861, server process ID (SPID) 190. This is an informational message only; no user action is required.
THANKS!
- This topic was modified 6 years, 11 months ago by jsqldba.
- This topic was modified 6 years, 11 months ago by jsqldba.

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 399419 More actions · Answer 1

SQL Server 2014 introduced a new cardinality estimation engine. Any migration from a version of SQL Server prior to 2014 to post 2014 has to deal with this issue. In all likelihood, this is what you're hitting. The actual error is fairly rare. The optimizer didn't have enough memory to complete processing. Usually this is an indication of a very complex query, or a moderately complex query combined with a bad structure (bad, missing, incorrect indexes and bad, missing, out of date statistics).

The best solution, fix the bad code.

Second best solution, switch the database to the old compatibility level and run the load for a period of time (you'll have to determine what that length of time is) with the Query Store enabled on the database. After you've run the load for a period of time, switch the compatibility mode again. Try forcing the plans that regress. You may still have issues because plan forcing doesn't prevent or interrupt the optimization process, and that's where you're hitting issues. However, it could help. That's what I'd do as an initial test.

The only other possibility is that you've configured the server to use radically less memory than it had before? I'm not sure. I'd still lean towards it being the cardinality estimation engine.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 2

Ahhh... I've got to disagree a bit. Instead of changing to comparability level, SQL Server 2017 does have the ability to use the old cardinality estimator at at least the DB level of granularity.

I totally agree on the fixing of the code that the new cardinality flushed out, though.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Grant Fritchey SSC Guru Points: 399419 More actions · Answer 3

Jeff Moden wrote:

Ahhh... I've got to disagree a bit. Instead of changing to comparability level, SQL Server 2017 does have the ability to use the old cardinality estimator at at least the DB level of granularity.
I totally agree on the fixing of the code that the new cardinality flushed out, though.

Oh, I didn't suggest leaving the old compatibility in place. That's a horrible solution. Just putting in there while you gather data for the query store.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 4

Grant Fritchey wrote:

Jeff Moden wrote:
Ahhh... I've got to disagree a bit. Instead of changing to comparability level, SQL Server 2017 does have the ability to use the old cardinality estimator at at least the DB level of granularity.
I totally agree on the fixing of the code that the new cardinality flushed out, though.
Oh, I didn't suggest leaving the old compatibility in place. That's a horrible solution. Just putting in there while you gather data for the query store.

Ah... got it. Thanks, Grant.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

jsqldba SSChampion Points: 11257 More actions · Answer 5

hi guys, thanks for the info. the new server is configured the same as the old, same memory and cpu. the query is actually quite simple, being a SELECT INTO from one table. i simplified the where clause a bit for the user, and added an index to drop the cost from 99% table scan to 46% table seek, but still the same. also updated statistics (which are fully updated every weekend) and no change. we've got a support case with MS open at the moment. i'll post an update later.

thanks!

SELECT PRODORDER

FROM [xxx].[/BI0/PPRODORDER]

INTO TABLE I_ORDERS

WHERE OBJVERS = 'A'

AND (STATUSSYS0 IN (0, 1, 2, 5, 6) OR STATUSSYS2 IN (0,1))

AND CO_AREA <> ''