MSDB growth on sysjobstepslogs due to whitespace

  • Has anyone encountered the issue where you; have a job, the job step advanced options are set to log to table, the job step runs, and in the step executes a select that logs returned data... however, in that select there is a varchar data type and what is then recorded into MSDB..sysjobstepslogs is padded as though it was selecting a char data type?

    To reproduce, I create a job, create a step on the job put in the below code, ran the step, and checked the table for log data growth.

    set ansi_padding off

    set nocount on

    declare @varcharmax varchar(max), @counter int

    set @counter = 1

    while @counter < 100

    begin

    set @varcharmax = 'moo'

    select rtrim(@varcharmax)

    set @counter = @counter + 1

    end

    Checking data amount that was stored:

    select

    log_size,

    sl.date_created,

    sl.date_modified,

    log_id,

    name,

    step_name,

    step_id,

    database_name,

    last_run_date,

    last_run_time,

    last_run_duration,

    description

    from

    sysjobstepslogs sl

    join sysjobsteps ss on sl.step_uid = ss.step_uid

    join sysjobs sj on sj.job_id = ss.job_id

    order by 1 desc

    The log size or len(log) data returned a massive amount of whitespace. I have been unable to search and find answers on how to limit this padding amount. My MSDB database is growing very quickly due to many jobs logging to table and the space is all padded in as though I wanted to see it and used a char data type in the select. Putting settings for padding and explicit trimming do not seem to help.

  • I see a few references to http://social.msdn.microsoft.com/forums/en-US/sqldatabaseengine/thread/ac5e64f7-01ee-4412-8482-9468b1e99bf4/ but no solution. If I truncate the table that keeps growing, would that solve the problem?

    I do not know how to hunt for ghosts. Curious if anyone has run into this before and hunted ghosts.

  • In case this comes up for anyone else, truncate to the table msdb..sysjobstepslogs will clean up the space. Just was able to free up 36 GB after doing a quick select into to a logs temp table. Only had 113 rows, but a lot of "hidden" space that was not getting reclaimed properly. Have not found the root of the problem, other than mentioned by the post linked to, but the solution I did not need to rebuild the MSDB database as some other findings stated to do.

    Bam. Dun. Now just need to find the root.

  • matt.newman (6/22/2011)


    In case this comes up for anyone else, truncate to the table msdb..sysjobstepslogs will clean up the space. Just was able to free up 36 GB after doing a quick select into to a logs temp table. Only had 113 rows, but a lot of "hidden" space that was not getting reclaimed properly. Have not found the root of the problem, other than mentioned by the post linked to, but the solution I did not need to rebuild the MSDB database as some other findings stated to do.

    Bam. Dun. Now just need to find the root.

    Can you check if that table has a clustered index?.. nothing else comming to mind atm.

  • It does, and it was pretty trashed at the time. Though, it was on an int that supposedly only had 113 rows. I saw some other postings that rebuilding the clustered index did not have a successful result. Due to size held, I did not want to rebuild and lock jobs from executing, causing a pile up on resources. If I locked that table and has 23 jobs running simultaneously that wanted to log, more problems. The truncate after backup was the cleanest for this table from what the reported issue was.

  • Did you consider doing this sequence?

    select * from base into ##temp

    truncate base

    insert into base select * from ##temp

    Might have solved it for you.

  • Not sure what "base" refers to here, but I did

    select * into [|TempStorageTableInCaseNeededLater|] from sysjobstepslogs

    truncate sysjobstepslogs

    jobstepslogs was then already refreshed soon after due to job processes running (within 10 seconds new records were present). I would not "correctly" be able to insert back into without some time and effort of which I need more time for other things. As was the issue explained in the above link regarding non-reclaimed space from the table, Snapshot Isolation, if this was the issue, was retaining records due to timestamp validation rules. It appears the rule was broken sometime late last month. I had done an audit to remove varhcar(max) data from being logged to the table as it appears that data is not stored well at all (whitespace). Then, I did an assement of growth in mail queue and attachements. Although there was growth from early April, not enough to merit the space used for msdb. Then, and audit of sysjobstepslogs where I did a len of log * 8 / 1024 to get data size from all records using sum. Then a sp_spaceused as well as a couple other audit checks because the numbers were not adding up, then Google the link listed above, then the decision was truncate and hope it works based on what the issue was. I have multiple jobs running at 1 minute intervals for various tasks. To complete the operation you suggessted I would need to stop SQL Agent serverices. Although feesable, It was best to just do a temp store of the data elsewhere, truncate, and let the job queue resume as needed.

    There was a combination of problems leading to msdb growth that I needed resolved. New jobs doing new things, logging varchar(max) to sysjobstepslogs logged it with massive whitespace, then the log records when they were supposed to be deleted for the next log because append was not used did not reclaim space. Isolation on, I needed to flush the table. Everything is clean now and I have 34 GB padding. I will be putting an audit check in place nightly to verify log data size and compare to freespace to the table itself.

    Long winded, I was never really good at online explainations. Sorry. I think we're on the same page, but I had a few more things going on. I just listed them all out though I believe you have an understanding in case someone else runs into any of the issues will be able to search and find what I had found and hopefully get a solution in place using the above.

    Thanks!

  • Looks like this may go into a recurring issue for me and did not resolve the final underlaying problem.

    select sum(len(log) * 8 / 1024) from sysjobstepslogs

    go

    sp_spaceused sysjobstepslogs

    A rough comparison on these two results should show a growth in line with one another, but one number has leveled off while the other is showing an increase. Not sure why that space would not release but will keep tabs and any findings for later.

  • Switch to datalength instead of len so that it includes trailing spaces.

    Also why the *8 / 1024? It's the 8 that I don't understand.

    sp_spaceused returns kbs so to covert bytes to kb all you need is / 1024.

  • Oops... len * 2... 2 bytes per character.

    Thanks for the catch! Looks like the issue then is most definitely recurring! I will have to look into turning snapshot isolation off, though don't yet know the costs of that. Don't want to have to rebuild MSDB, may need to look at restarting the agent service.

  • matt.newman (6/23/2011)


    Oops... len * 2... 2 bytes per character.

    Thanks for the catch! Looks like the issue then is most definitely recurring! I will have to look into turning snapshot isolation off, though don't yet know the costs of that. Don't want to have to rebuild MSDB, may need to look at restarting the agent service.

    I don't see why snapshot isolation would affect this. The only that makes sense is a lob that gets constantly updated. Then IIIRC you have to clean the table with rebuild index or dbcc cleantable (not sure about that part).

    If you say you only have 113 rows, then what's the space used by the temp table where you dumped the records to?

  • P.S. datalenght will give you the correct size with trailing spaces and datatype. This is what I'd use to make sure I have the correct value there.

  • Temp Table: 113 rows

    namerowsreserveddataindex_sizeunused

    TMPsysJobStepsLogs062211113 219064 KB219000 KB16 KB48 KB

    sum datalength = 214144

    Prod Table: 86 rows

    namerowsreserveddataindex_sizeunused

    sysjobstepslogs86 917344 KB916936 KB24 KB384 KB

    sum datalength = 179032

    Snapshot isolation was mentioned in: http://social.msdn.microsoft.com/forums/en-US/sqldatabaseengine/thread/ac5e64f7-01ee-4412-8482-9468b1e99bf4/

  • Wow you learn something new everyday!

    Do you still have the temp table laying around?

    You could try all the usual tricks (drop / rebuild clustered index, lob compaction on, etc).

    If that works you could create another job that does this little dance maybe once a week (select into, repair, truncate live table and reinsert).

    If the final table only has a couplds mbs the whole thing "should" run in only a few seconds...

  • If I can't find an online solution my plan is try and select top 0 * into sysjobstepslogs2 from sysjobstepslogs, mod the table to reflect correct constraint on pkey, create clustered on pkey, stop the agent service, rename old table filling up to old suffix, rename new table to drop 2 suffix, and restart the service. If the issue is still apparent after that, I will need to know what turning off snapshot isolation will provide for benefit. Then, need to later assess if having jobs running at 1 minute aprt is attributing, and if all else fails, schedule something to truncate the table nightly since it does reclaim the space. Sadly it's all shoot from the hip stuff with no real solution found.

    Thanks for the pointers!

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply