MSDB growth on sysjobstepslogs due to whitespace

Question

MSDB growth on sysjobstepslogs due to whitespace

matt.newman

Hall of Fame

Points: 3910
More actions
June 16, 2011 at 12:28 pm

#238180

Has anyone encountered the issue where you; have a job, the job step advanced options are set to log to table, the job step runs, and in the step executes a select that logs returned data... however, in that select there is a varchar data type and what is then recorded into MSDB..sysjobstepslogs is padded as though it was selecting a char data type?
To reproduce, I create a job, create a step on the job put in the below code, ran the step, and checked the table for log data growth.
set ansi_padding off
set nocount on
declare @varcharmax varchar(max), @counter int
set @counter = 1
while @counter < 100
begin
set @varcharmax = 'moo'
select rtrim(@varcharmax)
set @counter = @counter + 1
end
Checking data amount that was stored:
select
log_size,
sl.date_created,
sl.date_modified,
log_id,
name,
step_name,
step_id,
database_name,
last_run_date,
last_run_time,
last_run_duration,
description
from
sysjobstepslogs sl
join sysjobsteps ss on sl.step_uid = ss.step_uid
join sysjobs sj on sj.job_id = ss.job_id
order by 1 desc
The log size or len(log) data returned a massive amount of whitespace. I have been unable to search and find answers on how to limit this padding amount. My MSDB database is growing very quickly due to many jobs logging to table and the space is all padded in as though I wanted to see it and used a char data type in the select. Putting settings for padding and explicit trimming do not seem to help.

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply

matt.newman Hall of Fame Points: 3910 More actions · Answer 1

I see a few references to http://social.msdn.microsoft.com/forums/en-US/sqldatabaseengine/thread/ac5e64f7-01ee-4412-8482-9468b1e99bf4/ but no solution. If I truncate the table that keeps growing, would that solve the problem?

I do not know how to hunt for ghosts. Curious if anyone has run into this before and hunted ghosts.

matt.newman Hall of Fame Points: 3910 More actions · Answer 2

In case this comes up for anyone else, truncate to the table msdb..sysjobstepslogs will clean up the space. Just was able to free up 36 GB after doing a quick select into to a logs temp table. Only had 113 rows, but a lot of "hidden" space that was not getting reclaimed properly. Have not found the root of the problem, other than mentioned by the post linked to, but the solution I did not need to rebuild the MSDB database as some other findings stated to do.

Bam. Dun. Now just need to find the root.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 3

matt.newman (6/22/2011)
In case this comes up for anyone else, truncate to the table msdb..sysjobstepslogs will clean up the space. Just was able to free up 36 GB after doing a quick select into to a logs temp table. Only had 113 rows, but a lot of "hidden" space that was not getting reclaimed properly. Have not found the root of the problem, other than mentioned by the post linked to, but the solution I did not need to rebuild the MSDB database as some other findings stated to do.
Bam. Dun. Now just need to find the root.

Can you check if that table has a clustered index?.. nothing else comming to mind atm.

matt.newman Hall of Fame Points: 3910 More actions · Answer 4

It does, and it was pretty trashed at the time. Though, it was on an int that supposedly only had 113 rows. I saw some other postings that rebuilding the clustered index did not have a successful result. Due to size held, I did not want to rebuild and lock jobs from executing, causing a pile up on resources. If I locked that table and has 23 jobs running simultaneously that wanted to log, more problems. The truncate after backup was the cleanest for this table from what the reported issue was.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 5

Did you consider doing this sequence?

select * from base into ##temp

truncate base

insert into base select * from ##temp

Might have solved it for you.

matt.newman Hall of Fame Points: 3910 More actions · Answer 6

Not sure what "base" refers to here, but I did

select * into [|TempStorageTableInCaseNeededLater|] from sysjobstepslogs

truncate sysjobstepslogs

jobstepslogs was then already refreshed soon after due to job processes running (within 10 seconds new records were present). I would not "correctly" be able to insert back into without some time and effort of which I need more time for other things. As was the issue explained in the above link regarding non-reclaimed space from the table, Snapshot Isolation, if this was the issue, was retaining records due to timestamp validation rules. It appears the rule was broken sometime late last month. I had done an audit to remove varhcar(max) data from being logged to the table as it appears that data is not stored well at all (whitespace). Then, I did an assement of growth in mail queue and attachements. Although there was growth from early April, not enough to merit the space used for msdb. Then, and audit of sysjobstepslogs where I did a len of log * 8 / 1024 to get data size from all records using sum. Then a sp_spaceused as well as a couple other audit checks because the numbers were not adding up, then Google the link listed above, then the decision was truncate and hope it works based on what the issue was. I have multiple jobs running at 1 minute intervals for various tasks. To complete the operation you suggessted I would need to stop SQL Agent serverices. Although feesable, It was best to just do a temp store of the data elsewhere, truncate, and let the job queue resume as needed.

There was a combination of problems leading to msdb growth that I needed resolved. New jobs doing new things, logging varchar(max) to sysjobstepslogs logged it with massive whitespace, then the log records when they were supposed to be deleted for the next log because append was not used did not reclaim space. Isolation on, I needed to flush the table. Everything is clean now and I have 34 GB padding. I will be putting an audit check in place nightly to verify log data size and compare to freespace to the table itself.

Long winded, I was never really good at online explainations. Sorry. I think we're on the same page, but I had a few more things going on. I just listed them all out though I believe you have an understanding in case someone else runs into any of the issues will be able to search and find what I had found and hopefully get a solution in place using the above.

Thanks!

matt.newman Hall of Fame Points: 3910 More actions · Answer 7

Looks like this may go into a recurring issue for me and did not resolve the final underlaying problem.

select sum(len(log) * 8 / 1024) from sysjobstepslogs

go

sp_spaceused sysjobstepslogs

A rough comparison on these two results should show a growth in line with one another, but one number has leveled off while the other is showing an increase. Not sure why that space would not release but will keep tabs and any findings for later.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 8

Switch to datalength instead of len so that it includes trailing spaces.

Also why the *8 / 1024? It's the 8 that I don't understand.

sp_spaceused returns kbs so to covert bytes to kb all you need is / 1024.

matt.newman Hall of Fame Points: 3910 More actions · Answer 9

Oops... len * 2... 2 bytes per character.

Thanks for the catch! Looks like the issue then is most definitely recurring! I will have to look into turning snapshot isolation off, though don't yet know the costs of that. Don't want to have to rebuild MSDB, may need to look at restarting the agent service.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 10

matt.newman (6/23/2011)
Oops... len * 2... 2 bytes per character.
Thanks for the catch! Looks like the issue then is most definitely recurring! I will have to look into turning snapshot isolation off, though don't yet know the costs of that. Don't want to have to rebuild MSDB, may need to look at restarting the agent service.

I don't see why snapshot isolation would affect this. The only that makes sense is a lob that gets constantly updated. Then IIIRC you have to clean the table with rebuild index or dbcc cleantable (not sure about that part).

If you say you only have 113 rows, then what's the space used by the temp table where you dumped the records to?

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 11

P.S. datalenght will give you the correct size with trailing spaces and datatype. This is what I'd use to make sure I have the correct value there.

matt.newman Hall of Fame Points: 3910 More actions · Answer 12

Temp Table: 113 rows

namerowsreserveddataindex_sizeunused

TMPsysJobStepsLogs062211113 219064 KB219000 KB16 KB48 KB

sum datalength = 214144

Prod Table: 86 rows

namerowsreserveddataindex_sizeunused

sysjobstepslogs86 917344 KB916936 KB24 KB384 KB

sum datalength = 179032

Snapshot isolation was mentioned in: http://social.msdn.microsoft.com/forums/en-US/sqldatabaseengine/thread/ac5e64f7-01ee-4412-8482-9468b1e99bf4/

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 13

Wow you learn something new everyday!

Do you still have the temp table laying around?

You could try all the usual tricks (drop / rebuild clustered index, lob compaction on, etc).

If that works you could create another job that does this little dance maybe once a week (select into, repair, truncate live table and reinsert).

If the final table only has a couplds mbs the whole thing "should" run in only a few seconds...

matt.newman Hall of Fame Points: 3910 More actions · Answer 14

If I can't find an online solution my plan is try and select top 0 * into sysjobstepslogs2 from sysjobstepslogs, mod the table to reflect correct constraint on pkey, create clustered on pkey, stop the agent service, rename old table filling up to old suffix, rename new table to drop 2 suffix, and restart the service. If the issue is still apparent after that, I will need to know what turning off snapshot isolation will provide for benefit. Then, need to later assess if having jobs running at 1 minute aprt is attributing, and if all else fails, schedule something to truncate the table nightly since it does reclaim the space. Sadly it's all shoot from the hip stuff with no real solution found.

Thanks for the pointers!