Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

On the exorcism of Ghost Records

There was a question earlier on one of the SQL forums as to whether or not Ghost Cleanup overwrote the deleted rows when it ran so as to ensure no one could read that data again.

Now I could just reference Paul Randal‘s blog post where he described Ghost Cleanup in depth and leave it there, but where would the fun be in that? Smile

So, now that everyone’s read Paul’s blog post, time for some fun with undocumented commands to prove (again) what we already know (and what Paul’s already shown).

Setup code:

CREATE TABLE TestingCleanup (
ID INT IDENTITY PRIMARY KEY,
Description VARCHAR(20),
Filler CHAR(50) DEFAULT ''
);
GO

INSERT INTO TestingCleanup (Description)
VALUES ('One'), ('Two'), ('Three'), ('Four'), ('Five'), ('Six'), ('Seven'), ('Eight'), ('Nine'), ('Ten')

-- Find the page number that the table is on

SELECT OBJECT_ID('TestingCleanup') -- 1399012065
DBCC IND(11,1399012065,1)
-- The IAM is 1:309 and the data page is 1:308

Now I’m going to delete half the table and then go and look at the page. The ghost cleanup can’t run until the transaction commits, so running this within a transaction gives a chance to go and look at how things work.

BEGIN TRANSACTION
DELETE FROM TestingCleanup WHERE ID%2 = 0 -- delete the even rows
SELECT ID, Description FROM TestingCleanup -- 5 rows
DBCC TRACEON (3604)
DBCC PAGE(11,1,310,1)
DBCC TRACEOFF (3604)
COMMIT TRANSACTION

The delete removes all the rows with even identity values and the select returns only 5 rows, as expected

Dump type 1 for DBCC Page gives the header and then each row separately in binary. I’m editing out uninteresting parts of the output to keep things manageable.

Page header:

m_pageId = (1:308)                   m_headerVersion = 1                  m_type = 1
Metadata: ObjectId = 1399012065      m_prevPage = (0:0)                   m_nextPage = (0:0)
m_ghostRecCnt = 5
m_tornBits = 0

Five ghosted records (m_ghostRecCnt = 5), corresponding to the 5 deleted rows. Now for the rows (I’m only going to show the first two)

Slot 0, Offset 0x60, Length 68, DumpStyle BYTE

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 68
Memory Dump @0x000000000FD4A060

0000000000000000:   30003a00 01000000 20202020 20202020 †0.:.....
0000000000000010:   20202020 20202020 20202020 20202020 †
0000000000000020:   20202020 20202020 20202020 20202020 †
0000000000000030:   20202020 20202020 20200300 00010044 †          .....D
0000000000000040:   004f6e65 ††††††††††††††††††††††††††††.One

Slot 1, Offset 0xa4, Length 68, DumpStyle BYTE

Record Type = GHOST_DATA_RECORD      Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 68
Memory Dump @0x000000000FD4A0A4

0000000000000000:   3c003a00 02000000 20202020 20202020 †<.:.....
0000000000000010:   20202020 20202020 20202020 20202020 †
0000000000000020:   20202020 20202020 20202020 20202020 †
0000000000000030:   20202020 20202020 20200300 00010044 †          .....D
0000000000000040:   0054776f ††††††††††††††††††††††††††††.Two

The first row, with an ID of 1 and Description of ‘One’, is a Primary Record. The second row, with an ID of 2 and Description of ‘Two’ is a Ghost Data Record. Deleted but not removed from the rows on the page. The same goes for the rest of the rows, the ones that were deleted (even values) are Ghost Data Records, the other are Primary Records. (aside, that begs the question, if there are Primary Data Records, are there Secondary or Tertiary? If so, where?)

Then if we look at the bottom of the DBCC Page output there’s the slot array (the offset of the rows on the page) there’s only 5 entries in there, despite there still being 10 displayed in the record section of DBCC Page. That’s because 5 of the aren’t real rows any longer.

Now let’s exorcise those ghosts.

DBCC ForceGhostCleanup -- Undocumented. Do not use in production

DBCC TRACEON (3604)
DBCC PAGE(11,1,310,1)
DBCC TRACEOFF (3604)

Now when we look at the header, the ghostRecCnt is 0, the ghosts are gone and DBCC Page only returns the 5 remaining rows thusly

</code>Slot 0, Offset 0x60, Length 68, DumpStyle BYTE

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 68
Memory Dump @0x000000000B8DA060

0000000000000000:   30003a00 01000000 20202020 20202020 †0.:.....
0000000000000010:   20202020 20202020 20202020 20202020 †
0000000000000020:   20202020 20202020 20202020 20202020 †
0000000000000030:   20202020 20202020 20200300 00010044 †          .....D
0000000000000040:   004f6e65 ††††††††††††††††††††††††††††.One

Slot 1, Offset 0xe8, Length 70, DumpStyle BYTE

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 70
Memory Dump @0x000000000B8DA0E8

0000000000000000:   30003a00 03000000 20202020 20202020 †0.:.....
0000000000000010:   20202020 20202020 20202020 20202020 †
0000000000000020:   20202020 20202020 20202020 20202020 †
0000000000000030:   20202020 20202020 20200300 00010046 †          .....F
0000000000000040:   00546872 6565††††††††††††††††††††††††.Three          

So Two’s gone. Or has is?

DBCC Page with dump type 2 dumps the entire page out without any interpretation of rows (the header is still interpreted, but there’s nothing new there hence I’m omitting it). So…

DBCC TRACEON (3604)
DBCC PAGE(11,1,310,2)
DBCC TRACEOFF (3604)
</code>DATA:
Memory Dump @0x000000000D6EA000

<snip>
000000000D6EA070:   20202020 20202020 20202020 20202020 †
000000000D6EA080:   20202020 20202020 20202020 20202020 †
000000000D6EA090:   20202020 20202020 20200300 00010044 †          .....D
000000000D6EA0A0:   004f6e65 3c003a00 02000000 20202020 †.One<.:.....
000000000D6EA0B0:   20202020 20202020 20202020 20202020 †
000000000D6EA0C0:   20202020 20202020 20202020 20202020 †
000000000D6EA0D0:   20202020 20202020 20202020 20200300 †              ..
000000000D6EA0E0:   00010044 0054776f 30003a00 03000000 †...D.Two0.:.....
000000000D6EA0F0:   20202020 20202020 20202020 20202020 †
000000000D6EA100:   20202020 20202020 20202020 20202020 †
000000000D6EA110:   20202020 20202020 20202020 20202020 †
000000000D6EA120:   20200300 00010046 00546872 65653c00 †  .....F.Three<.
<snip>

The even-numbered data is still there. The rows are deleted, the page has only 5 rows on it and it requires either a raw binary dump of the page or some work with a hex editor to see them, but the data that was stored in the now deleted rows is still there and will be there until its overwritten by new rows from somewhere.

Does Ghost Cleanup overwrite old data to prevent people reading it later? No, most certainly not.  Now it’s not something any user can do. Reading the data file requires permissions to either stop SQL, detach the DB or take the DB offline (or a hex editor that ignores file locks) and, unless a hex editor is standard on servers, permission to install software (or a hex editor that doesn’t require installation) and DBCC Page requires sysadmin rights. It’s still something to bear in mind if you’re ever working with confidential or classified data that has to be irretrievable after being deleted. It’s far from trivial to ensure that in SQL, especially if working with variable-length columns (where an update might not overwrite the old data if a page split/forwarded record results).

Comments

Posted by Jason Brimhall on 8 June 2011

Excellent post Gail - thanks

Posted by Perry Whittle on 13 June 2011

Interesting article Gail. It's possibly a good argument for the use of cell encryption to protect sensitive data.

The biggest problem I have seen with ghost cleanup on the forums is when a user has a large table with very large columns, commonly of type image and ntext, etc. They delete a large number of rows but do not see the space released immediately which I thinks gets them confused.

Posted by matt.newman on 12 July 2011

I am curious if anyone has used dbcc forceghostcleanup and had any negative affects. Not looking to place liability on anyone, just looking to see if anyone has used it and it caused an issue. I can't seem to figure out why my MSDB database doesn't want to have ghosts cleaned out of a couple select tables. I was unsure on how to check the PFS to see if the ghost flag was on for the index to clean up but have rebuilt the index, to no fix of the issue. My two testing servers are fine. The production databases, 1 stray ghost on Master, and clean user databases, but MSDB just doesn't want to admit it's not halloween anymore.

Leave a Comment

Please register or log in to leave a comment.