What is happening at END of an insert/delete?

  • I have been inserting large numbers of records from a “live” table into an archive table in a query window in SSMS. I’m doing this in batches and know in advance how many records will be inserted. Once the records are in the archive table, I’m deleting them from the live table. I’m dealing with roughly 50,000,000 records at a time. In another query window, I’m monitoring the live and archive tables to get a sense of where the process is based on the number of rows in the tables. I’m using:

    sp_spaceused 'MySchema.MyTable'

    What I noticed is that my sp_spaceused will return a “rows” value that indicates my insert or delete is done, however the query window where the insert or delete is occurring continues to indicate “Executing…” for several more minutes. In one example, when performing the insert, my “rows” value indicated all the rows in question were inserted in 25 minutes, however the query window continued to chug along and the final time on the insert was 58 minutes. If all the rows were in the table at the 25 minute mark, what exactly was going on under the hood from minute 26 to minute 58?? I’m not a DBA and can only think of a few things:

    - Is it completing writing to the transaction log? I would have thought it would be doing that all along?

    - Is it cleaning up from the internal “inserted” table?

    - Is it cleaning up tempdb?

    - Is it writing to disk? I would think it would have been doing that all along and the point at which I saw all the rows in the table, I would have expected the disk space to have been allocated.

    - Is it updating statistics? But when I run DBCC SHOW_STATISTICS (‘MySchema.MyTable’, PK_MyPKName) WITH STAT_HEADER, I see that the statistics are stale.

    Regardless of the methodology (this was just a quick, get the old data archived kind of thing) I would love to understand what’s actually happening here!

  • So your query windows runs either an INSERT command or a DELETE command? Else my answer would be that it's running the DELETE command once the INSERT is done.

    But if there are indexes on the table that could be the answer.

    It may also be a matter of that the value that sp_spaceused reads is updated ahead of time. You could run "SELECT COUNT(*) FROM tbl WITH (NOLOCK)" to see that gives.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Correct. The query window is running only one of those. And in another query window, I'm checking the table as it runs (using sp_spaceused because it is so much quicker than count(*) ) and gradually see the rowcount change. Then I see I've hit my target rowcount, but my other window continues executing for quite a while longer. Thank you for your reply.

  • I played with this and I was able to reproduce the behaviour. And in my test, the answer is clearly: populating the non-clustered indexes.

    I had this script:

    CREATE TABLE "Orders" (

    "OrderID" "int" NOT NULL ,

    "CustomerID" nchar (5) NULL ,

    "EmployeeID" "int" NULL ,

    "OrderDate" "datetime" NULL ,

    "RequiredDate" "datetime" NULL ,

    "ShippedDate" "datetime" NULL ,

    "ShipVia" "int" NULL ,

    "Freight" "money" NULL CONSTRAINT "DF_Orders_Freight" DEFAULT (0),

    "ShipName" nvarchar (40) NULL ,

    "ShipAddress" nvarchar (60) NULL ,

    "ShipCity" nvarchar (15) NULL ,

    "ShipRegion" nvarchar (15) NULL ,

    "ShipPostalCode" nvarchar (10) NULL ,

    "ShipCountry" nvarchar (15) NULL ,

    CONSTRAINT "PK_Orders" PRIMARY KEY CLUSTERED

    (

    "OrderID"

    )

    )

    GO

    CREATE INDEX "CustomerID" ON "dbo"."Orders"("CustomerID")

    CREATE INDEX "CustomersOrders" ON "dbo"."Orders"("CustomerID")

    CREATE INDEX "EmployeeID" ON "dbo"."Orders"("EmployeeID")

    CREATE INDEX "EmployeesOrders" ON "dbo"."Orders"("EmployeeID")

    CREATE INDEX "OrderDate" ON "dbo"."Orders"("OrderDate")

    CREATE INDEX "ShippedDate" ON "dbo"."Orders"("ShippedDate")

    CREATE INDEX "ShippersOrders" ON "dbo"."Orders"("ShipVia")

    CREATE INDEX "ShipPostalCode" ON "dbo"."Orders"("ShipPostalCode")

    GO

    INSERT Orders (OrderID,

    CustomerID,

    EmployeeID,

    OrderDate, RequiredDate, ShippedDate,

    ShipVia, Freight, ShipName,

    ShipAddress, ShipCity, ShipRegion, ShipPostalCode, ShipCountry)

    SELECT (a.OrderID - (SELECT MIN(OrderID) FROM Northwind..Orders)) * 10000 + b.OrderID,

    substring(a.CustomerID, 1, 3) + substring(b.CustomerID, 4, 2),

    ((a.EmployeeID - 1) * 100) + b.EmployeeID,

    a.OrderDate, b.RequiredDate, a.ShippedDate,

    (a.ShipVia - 1) * 100 + b.ShipVia, a.Freight, b.ShipName,

    a.ShipAddress, b.ShipCity, a.ShipRegion, b.ShipPostalCode, a.ShipCountry

    FROM Northwind..Orders a

    CROSS JOIN Northwind..Orders b

    WHERE a.OrderID < b.OrderID

    go

    drop table Orders

    And while this was running, I ran this from a second window:

    exec sp_spaceused Orders

    SELECT COUNT(*) FROM Orders WITH (NOLOCK)

    SELECT COUNT(*) FROM Orders WITH (NOLOCK, INDEX = 1)

    The first COUNT(*) will use one of the non-clustered indexes, because that is a more efficient way of counting rows. The second COUNT(*) query forces the clustered index.

    Typical output:

    name rows reserved data index_size unused

    -------------------- ----------- ------------------ ------------------ ------------------ ------------------

    Orders 344035 68944 KB 65864 KB 3008 KB 72 KB

    -----------

    0

    (1 row(s) affected)

    -----------

    344035

    (1 row(s) affected)

    344035 is the final number of rows in the table.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Wow!! Very cool! Thank you!

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply