TempDB

Question

TempDB

riyadh_01

SSC Veteran

Points: 250
More actions
January 20, 2016 at 3:38 am

#307176

Hi,
I am running an INSERT statement (the table is built with 1 billion records). The problem is the TempDB grows to over 600 GB and fills the disk prior to completion of the query executing.
I checked the tempDB and details as follows ;
Recovery model: Simple
Logic Name: tempdev
Initial size: 8MB
Autogrowth: By 10 percent, unrestricted growth
Logic Name: templog
Initial size: 1MB
Autogrowth: By 10 percent, unrestricted growth
Please advise if I need to do anything to control the size of the tempDB from growing out of control and utilising the entire disk space..
INSERT INTO dbo.EmisEvent (EventGuidDigest,PatientID,ConsultationID,StaffID,EventDate,EventType,MedicalCodeID,Value,Unit,Abnormal)
SELECT EventGuidDigest,PatientID,ConsultationID,StaffID,EventDate,EventType,MedicalCodeID,Value,Unit,Abnormal
FROM dbo.EmisEventExtraction WITH (tablock)
Can I do the insertion in batches and how can I change the INSERT statement....
Thanks

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Hugo Kornelis SSC Guru Points: 64780 More actions · Answer 1

First, change the settings for tempdb. The settings you use are miniscule. Every time the server restarts it will drop tempdb and recreate it using those settings; and then it will have to grow back as more space is needed, causing delays in your execution, and resulting in very fragmented files. My tip is to cycle your server, wait until a normal workload has been executed, check how big the data and log files in tempdb have grown, and then change tempdb to be at least 120% of that. That ensures that tempdb will be allocated at sufficient size the next time you restart the server and will hardly ever have to grow. (You should also change the autogro settings to be a fixed amount - about 5-10% of the starting size is a good default)

You should also consider using multiple files for tempdb and setting some traceflags to utilize them correctly, but that's way out of scope of your question. Google for it, or start a new topic if you are interested.

The reason that tempdb needs to grow during the insert is probably because there are multiple indexes on the target table. If you look at the execution plan, you will probably see a few table spools. One of them stores all rows inserted in a worktable in tempdb, the others read those rows, sort them in the order of one of the nonclustered indexes, and then add the new rows to that index. This forms a double hit on tempdb: the worktable for the spool itself (which will contain a copy of all rows for all columns that appear in at least one index - probably the full data), but also overflow work memory for sorting (the sorts probably don't fit in memory, so they use tempdb to store the data that does not fit while sorting).

You could consider dropping all nonclustered indexes (not the clsutered index!!!!) on the target table before inserting the rows, then rebuilding them afterwards. This eliminates the need for the spools in the execution plan. You will still use tempdb for the spilled workarea of the sort operators, though. But do beware that the effectivity of this method depends for a large part on how much data already exists in the target table. If you have 800 billion rows there and use this trick to add a billion rows, then instead of sorting the one billion rows that were inserted, you will sort all 801 billion rows that are in the table after the insert. It might still be faster and take less tempdb space and it does have the additional benefiit of creating defragmented indexes, but it's more a tradeoff then when you do this with an (almost) empty target table.

Batching the insert as you suggest can be a great alternative strategy. To do this, you need to find a way to batch the input that is guaranteed to hit all rows exactly once, and that is effective. Which means that it will depend on what constraints and indexes you ahve on the source table. If you can post the CREATE TABLE statement for it, plus all indexes, we can try to provide some pointers.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

riyadh_01 SSC Veteran Points: 250 More actions · Answer 2

Thank you for the detailed explanation - I will really like to do the INSERT in batches and the table structure is as follows;

/****** Object: Table [dbo].[EmisEvent] Script Date: 01/20/2016 11:35:29 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

CREATE TABLE [dbo].[EmisEvent](

[EventID] [int] IDENTITY(1,1) NOT NULL,

[EventGuidDigest] [char](64) NOT NULL,

[PatientID] [bigint] NOT NULL,

[ConsultationID] [bigint] NULL,

[StaffID] [int] NOT NULL,

[EventDate] [datetime] NULL,

[EventType] [tinyint] NOT NULL,

[MedicalCodeID] [int] NOT NULL,

[Value] [decimal](19, 3) NULL,

[Unit] [varchar](50) NULL,

[Abnormal] [bit] NOT NULL,

CONSTRAINT [PK_EmisEvent] PRIMARY KEY CLUSTERED

(

[EventID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

SET ANSI_PADDING OFF

GO

and the query is as follows;

SELECT EMISNov15DB.dbo.Event.EventGuidDigest, dbo.EmisPatient.PatientID, dbo.EmisConsultation.ConsultationID, dbo.EmisStaff.StaffID,

EMISNov15DB.dbo.Event.EffectiveDateTime AS EventDate, EMISNov15DB.dbo.Event.EventType, EMISNov15DB.dbo.Event.SnomedCTConceptId,

EMISNov15DB.dbo.Event.EmisCode, EMISNov15DB.dbo.Event.ReadCode, EMISNov15DB.dbo.Event.Term, dbo.CPRDLkupMedical.MedicalCodeID,

EMISNov15DB.dbo.Event.NumericValue AS Value, EMISNov15DB.dbo.Event.NumericUnits AS Unit, EMISNov15DB.dbo.Event.IsAbnormal AS Abnormal

FROM EMISNov15DB.dbo.Event WITH (tablock) INNER JOIN

dbo.EmisPatient ON EMISNov15DB.dbo.Event.PatientGuidDigest = dbo.EmisPatient.PatientGuidDigest INNER JOIN

dbo.EmisStaff ON EMISNov15DB.dbo.Event.AuthorisingUserRoleGuidDigest = dbo.EmisStaff.StaffGuidDigest LEFT OUTER JOIN

dbo.EmisConsultation ON dbo.EmisPatient.PatientID = dbo.EmisConsultation.PatientID AND

EMISNov15DB.dbo.Event.ConsultationGuidDigest = dbo.EmisConsultation.ConsultationGuidDigest LEFT OUTER JOIN

dbo.CPRDLkupMedical ON (EMISNov15DB.dbo.Event.SnomedCTConceptId = dbo.CPRDLkupMedical.Snomed OR

EMISNov15DB.dbo.Event.SnomedCTConceptId IS NULL AND dbo.CPRDLkupMedical.Snomed IS NULL) AND

ISNULL(CASE WHEN LEFT(EMISNov15DB.dbo.Event.EmisCode, 7) = 'EMISATT' THEN 'EMISATT' WHEN LEFT(EMISNov15DB.dbo.Event.EmisCode, 5)

= 'PCSDT' THEN 'PCSDT' ELSE EMISNov15DB.dbo.Event.EmisCode END, N'(novalue)') COLLATE SQL_Latin1_General_CP1_CS_AS = ISNULL(dbo.CPRDLkupMedical.Emiscode,

N'(novalue)') COLLATE SQL_Latin1_General_CP1_CS_AS AND (EMISNov15DB.dbo.Event.SnomedCTDescriptionId = dbo.CPRDLkupMedical.SnomedDescription OR

EMISNov15DB.dbo.Event.SnomedCTDescriptionId IS NULL AND dbo.CPRDLkupMedical.SnomedDescription IS NULL) AND ISNULL(EMISNov15DB.dbo.Event.Term,

N'(novalue)') = ISNULL(dbo.CPRDLkupMedical.Term, N'(novalue)') AND ISNULL(CASE WHEN LEFT(EMISNov15DB.dbo.Event.ReadCode, 5)

= 'PCSDT' THEN 'PCSDT' WHEN LEFT(EMISNov15DB.dbo.Event.ReadCode, 7) = 'EMISATT' THEN 'EMISATT' ELSE EMISNov15DB.dbo.Event.ReadCode END,

N'(novalue)') COLLATE SQL_Latin1_General_CP1_CS_AS = ISNULL(dbo.CPRDLkupMedical.Readcode, N'(novalue)') COLLATE SQL_Latin1_General_CP1_CS_AS

Thank you

Hugo Kornelis SSC Guru Points: 64780 More actions · Answer 3

Now I am confused.

In your original post you included a query for the insert that showed the new rows as coming from a simple select on a single table: dbo.EmisEventExtraction. That is the table I need to know the schema and indexes for in order to help you batch the insert.

Now you post the schema of the table you are inserting into (not what I was asking, but posting more information is always good so thanks), and a rather complex query that probably is related to your question in some way that I don't understand. And you don't post the schema of the dbo.EmisEventExtraction table that I requested.

I'd love to help you, but I need to understand the question first...

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

riyadh_01 SSC Veteran Points: 250 More actions · Answer 4

A view (dbo.EmisEventExtraction) populates dbo.EmisEvent table. So, dbo.EmisEventExtraction is a SQL View and not a table.

So the INSERT reads the above view to insert the records to dbo.EmisEvent. The view reads from different tables to extract the required fields.

I hope am clearer in the explanation. So the batch extraction could be based on the INSERT statement correct?

Thanks for your assistance in this issue

Hugo Kornelis SSC Guru Points: 64780 More actions · Answer 5

Ah, that explains my confusion. Good.

I do still have some more questions, though:

1. In the CREATE TABLE for the EmisEvent table, I see only a clustered index. Are there no other indexes on this table, or have you forgotten to post them?

2. Since EmisEventExtraction is a view, it's going to be a bit harder for me to find a way to batch the inserts. And I will need much more information. For all tables that are used in the view, I will need the CREATE TABLE statement - you can leave out columns not used in the view definition, but please do include all constraints and all indexes. If you can also provide INSERT statements with a few rows of sample data for each of those tables (and ensure that the sample data is consistent), that would be great too. (Please before posting run the code in an empty database to check for errors so that I don't have to!)

3. I will also need to know how many rows are in each of the table referenced by the view, and how many rows are in EmisEvent before you start the insert statement.

4. Not sure if I am going to need it but just in case - is it possible to change the query used in the view, or is this a given that you cannot change?

I realize that I'm asking for a lot of information, but without it I will not be able to give you a good solution.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

riyadh_01 SSC Veteran Points: 250 More actions · Answer 6

Ok - thanks for your input Sir..

Answers to your questions:

1) No, there is ONLY one clustered index (EventID) and it is an identity and will be incremented by value 1..

2)

/****** Object: Table [dbo].[Event] Script Date: 01/20/2016 14:55:06 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

CREATE TABLE [dbo].[Event](

[EventGuidDigest] [char](64) NOT NULL,

[PatientGuidDigest] [char](64) NOT NULL,

[EventType] [tinyint] NOT NULL,

[Term] [nvarchar](255) NULL,

[SnomedCTConceptId] [bigint] NULL,

[EmisCode] [varchar](50) NULL,

[ReadCode] [varchar](50) NULL,

[NumericValue] [decimal](19, 3) NULL,

[NumericUnits] [varchar](50) NULL,

[EffectiveDateTime] [datetime] NULL,

[ConsultationGuidDigest] [char](64) NULL,

[IsAbnormal] [bit] NOT NULL,

[AuthorisingUserRoleGuidDigest] [char](64) NOT NULL,

CONSTRAINT [PK_Event_PatientGuidDigest_EventGuidDigest] PRIMARY KEY CLUSTERED

(

[PatientGuidDigest] ASC,

[EventGuidDigest] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

SET ANSI_PADDING OFF

GO

ALTER TABLE [dbo].[Event] WITH CHECK ADD CONSTRAINT [FK_Event_AuthorisingUserRoleGuidDigest] FOREIGN KEY([AuthorisingUserRoleGuidDigest])

REFERENCES [dbo].[UserRole] ([UserRoleGuidDigest])

GO

ALTER TABLE [dbo].[Event] CHECK CONSTRAINT [FK_Event_AuthorisingUserRoleGuidDigest]

GO

ALTER TABLE [dbo].[Event] WITH CHECK ADD CONSTRAINT [FK_Event_PatientGuidDigest] FOREIGN KEY([PatientGuidDigest])

REFERENCES [dbo].[Patient] ([PatientGuidDigest])

GO

ALTER TABLE [dbo].[Event] CHECK CONSTRAINT [FK_Event_PatientGuidDigest]

GO

ALTER TABLE [dbo].[Event] WITH CHECK ADD CONSTRAINT [FK_Event_PatientGuidDigest_ConsultationGuidDigest] FOREIGN KEY([PatientGuidDigest], [ConsultationGuidDigest])

REFERENCES [dbo].[Consultation] ([PatientGuidDigest], [ConsultationGuidDigest])

GO

ALTER TABLE [dbo].[Event] CHECK CONSTRAINT [FK_Event_PatientGuidDigest_ConsultationGuidDigest]

GO

/****** Object: Table [dbo].[EmisStaff] Script Date: 01/20/2016 14:57:34 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

CREATE TABLE [dbo].[EmisStaff](

[StaffID] [int] IDENTITY(1,1) NOT NULL,

[StaffGuidDigest] [char](64) NOT NULL,

[JobCategory] [varchar](100) NOT NULL,

CONSTRAINT [PK_EmisStaff] PRIMARY KEY CLUSTERED

(

[StaffID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

SET ANSI_PADDING OFF

GO

/****** Object: Table [dbo].[EmisPatient] Script Date: 01/20/2016 14:58:15 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

CREATE TABLE [dbo].[EmisPatient](

[PatientID] [bigint] NOT NULL,

[PatientNo] [int] IDENTITY(1,1) NOT NULL,

[PracID] [int] NOT NULL,

[PersonID] [int] NULL,

[PatientGuidDigest] [char](64) NOT NULL,

[BirthYear] [date] NOT NULL,

[Gender] [char](10) NULL,

[DeathDate] [date] NULL,

[Ethnicity] [nvarchar](510) NULL,

[EthnicityCode] [varchar](200) NULL,

[PatientType] [varchar](50) NOT NULL,

[Registered] [bit] NOT NULL,

[RegistrationStartDate] [date] NULL,

[RegistrationEndDate] [datetime] NULL,

[PatientStatus] [tinyint] NULL,

[TruncatedPostcode] [varchar](6) NULL,

CONSTRAINT [PK_EmisPatient] PRIMARY KEY CLUSTERED

(

[PatientID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

SET ANSI_PADDING OFF

GO

/****** Object: Table [dbo].[EmisConsultation] Script Date: 01/20/2016 14:58:33 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

CREATE TABLE [dbo].[EmisConsultation](

[ConsultationID] [bigint] IDENTITY(1,1) NOT NULL,

[ConsultationGuidDigest] [char](64) NOT NULL,

[PatientID] [bigint] NOT NULL,

[StaffID] [int] NOT NULL,

[ConsultationDate] [datetime] NULL,

[ConsultationType] [varchar](200) NULL,

[Duration] [smallint] NULL,

CONSTRAINT [PK_EmisConsultation] PRIMARY KEY CLUSTERED

(

[ConsultationID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

SET ANSI_PADDING OFF

GO

/****** Object: Table [dbo].[CPRDLkupMedical] Script Date: 01/20/2016 14:59:01 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

CREATE TABLE [dbo].[CPRDLkupMedical](

[MedicalCodeID] [int] IDENTITY(1,1) NOT NULL,

[Snomed] [bigint] NULL,

[SnomedDescription] [bigint] NULL,

[Emiscode] [varchar](50) NULL,

[Readcode] [varchar](50) NULL,

[Term] [nvarchar](255) NULL,

CONSTRAINT [PK_CPRDLkupMedicalNew] PRIMARY KEY CLUSTERED

(

[MedicalCodeID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

SET ANSI_PADDING OFF

GO

3) dbo.Event = 1 Billions records

dbo.EmisPatient = 7 M

dbo.EmisConsultation = 500 M

dbo.EmisStaff = 10,000

dbo.CPRDLkupMedical = 200000

dbo.EmisEvent = 0 records prior to insertion

4 ) Yes you can change the query in the view to improve performance.

Thanks

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 7

Functions around columns in WHERE or JOIN clauses can be DEVASTATINGLY BAD!! CPU burn, bad estimates due to preventing good statistics acquisition (meaning bad plans and bad concurrency both, including the potential for MASSIVE HASHES/SORTs or bajillion-row NESTED LOOPS/INDEX SEEKS), no INDEX SEEKS when they should otherwise be used. Just HORRIBLE. I hope you can find a way to make them go away. Smells like questionable design though, although I didn't evaluate closely enough.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Hugo Kornelis SSC Guru Points: 64780 More actions · Answer 8

When I attempt to run the CREATE TABLE statements you posted, I get a ton of errors. Please create an empty database, run the code yourself, and fix all errors, then post the corrected code.

I am happy to spend my time to fix problems you are experiencing, but I refuse to waste my time on fixing simple things like foreign keys to other tables that you can easily check and remove yourself.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/