Whilst doing some ad-hoc work on some data on a 2012 SP1 instance I came across an issue with the SQL server raising a sev 20 error with a script that I was attempting to run. The work that I was doing was that I was exporting some binary data (basic text files) that is stored in one table to the file system via a filetable within the same database. It came as bit of a surprise when I saw this error message:
Process ID: 15004
Description: Trying to use the transaction while there are 1 parallel nested xacts outstanding
Msg 3624, Level 20, State 1, Line 1
A system assertion check has failed. Check the SQL Server error log for details. Typically, an assertion failure is caused by a software bug or data corruption. To check for database corruption, consider running DBCC CHECKDB. If you agreed to send dumps to Microsoft during setup, a mini dump will be sent to Microsoft. An update might be available from Microsoft in the latest Service Pack or in a QFE from Technical Support.
Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
The query that I had developed was nothing special and only consisted of an INSERT..SELECT statement that had a few joins. Naturally at this point I’m reaching for google to see what else I could find out about this error which wasn’t an awful lot. So I opened up my very first connect ticket! I had to complete the task at hand fairly quickly so my first attempt at circumventing this bug was to create a temp table and insert the data into there first before inserting the contents of the temp table directly into the filetable. This actually worked fine and the error wasn’t raised by using this method. Furthermore, further testing showed that using the MAXDOP 1 hint on the original query I developed also eliminated the error.
Afterwards, I thought I would try to develop a repro script but at first I had trouble generating the error again. It was only after plenty of trial and error that I realised that I could only get the error to occur when 1) SQL choose a parallel plan 2) the filetable that I was inserting into was actually referenced as part of the SELECT query and 3) with a certain amount of data in the base tables. I was joining in the filetable to the query as I wanted to export the data into certain folders and I need the path_locator value as explained in this blog by Bob Beauchemin
This is the query that I'm using in the repro. Table4 is the filetable:
INSERT INTO [dbo].[Table4] (name, file_stream, path_locator) SELECT t3.xName, CAST(t3.xData AS VARBINARY(MAX)) , path_locator.ToString() + convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 1, 6))) + '.' + convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 7, 6))) + '.' + convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 13, 4))) + '/' FROM [dbo].[Table1] t1 INNER JOIN [dbo].[Table2] t2 ON t1.Id = t2.xId INNER JOIN [dbo].[Table3] t3 ON t2.Id = t3.xId INNER JOIN [dbo].[Table4] t4 ON t4.name = t2.Name WHERE t2.zId = 35
For any of those wanting to try this out yourselves, here is a full repro script but I can't guarantee that this will always crash. It does for me on two 2012 instances. I'm not going to show it in its entirety here as there is quite a bit of code/data to actually get the error to surface. Additionally as part of the repro, to get the optimiser to choose a parallel plan I’ve had to use the undocumented DBCC SETCPUWEIGHT as mentioned in this blog by Paul White. One of the tables referenced in the original query is about 30gb in production which may have been a factor why SQL choose a parallel plan as that was the main difference between the production and test boxes. I just couldn’t get SQL to produce a parallel plan during my testing otherwise even with identical schemas and identical data in all other tables bar this larger table.
Needless to say, don't run the repro in your production!