• obtllc (3/15/2014)


    If I have 30 players, it will simply be 30/5 = 6. Simple.

    The teams are grouped as follows:

    Best player (largest db) - group 1
    Second best (second largest db) - group 2
    Third best (third largest db) - group 3
    Fourth best (fourth largest db) - group 4
    Fifth best (fifth largest db) - group 5
    Sixth best (sixth largest db) - group 6
    Seventh best (seventh largest db) - group 1
    Etc, etc.

    That means Group 1 will always take the longest, and you lose much of the benefit of having multiple backups running at the same time: Groups 2 thru 6 will finish before Group 1, Groups 3 thru 6 will finish before Group 2, and so on.

    As a simple example, let's say you had just 6 databases whose sizes each varied by just one unit. According to your algorithm, and with 3 simultaneous backups running, you would have the following (a quick code sketch of that round-robin assignment follows the table)...

    Size Group
    ---- -----
    6    1
    5    2
    4    3
    3    1
    2    2
    1    3
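
    Just to make that round-robin assignment concrete, here's a minimal sketch of how it could be computed. The @Sizes table variable and the @Backups value are purely illustrative; in real life the sizes would come from something like sys.master_files.

    --===== Purely illustrative sizes; the real ones would come from the system tables.
    DECLARE @Backups INT = 3;   --Number of simultaneous backups
    DECLARE @Sizes TABLE (GB INT);
     INSERT INTO @Sizes (GB) VALUES (6),(5),(4),(3),(2),(1);

    --===== Round-robin: the Nth largest database lands in group ((N-1) % @Backups) + 1.
     SELECT GB
           ,GroupNo = (ROW_NUMBER() OVER (ORDER BY GB DESC) - 1) % @Backups + 1
       FROM @Sizes
      ORDER BY GB DESC
    ;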

    If we aggregate the sizes for each group, we get...

    Size      Group
    --------- -----
    6+3 = 9   1
    5+2 = 7   2
    4+1 = 5   3

    If we say that each unit of size is worth 15 minutes of backup time, Group 1 would finish in 2:15. Group 2 would finish in 1:45 and sit idle for 0:30 compared to Group 1. Group 3 would finish in 1:15 and sit idle for 1:00 compared to Group 1.
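
    If you want to check that arithmetic, here's a throwaway calculation. The 15-minutes-per-unit figure is just the assumption from the example, and the hard-coded group totals come from the table above.

    --===== Convert each group's total size into a finish time at 15 minutes per unit.
    DECLARE @MinutesPerUnit INT = 15;
     SELECT v.GroupNo
           ,v.TotalSize
           ,FinishTime = CONVERT(CHAR(5), DATEADD(MINUTE, v.TotalSize * @MinutesPerUnit, 0), 108)
       FROM (VALUES (1,9),(2,7),(3,5)) v (GroupNo, TotalSize)
    ;
    --Returns 02:15, 01:45, and 01:15 respectively.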

    What you want to happen is the following, where all of the groups carry a balanced load and finish as quickly as possible (1:45 for all groups instead of 2:15 for the longest group):

    Size      Group
    --------- -----
    6+1 = 7   1
    5+2 = 7   2
    4+3 = 7   3

    That's also known as a "Load Balancing" or "Bin Packing" problem and it requires a loop of some sort to do its job.

    Here's a simple bit of code for the example above.

    --===== Conditionally drop the temp tables to make reruns easier.
         IF OBJECT_ID('tempdb..#MyHead')      IS NOT NULL DROP TABLE #MyHead;
         IF OBJECT_ID('tempdb..#Accumulator') IS NOT NULL DROP TABLE #Accumulator;

    DECLARE @DBCount INT
           ,@Backups INT
    ;
     SELECT @Backups = 3 --Number of simultaneous backups
    ;
    --===== Get the database names and sizes and remember them in descending size order.
         -- Obviously, this is just test data. You'd have to change this to look at
         -- something like sys.master_files.
     SELECT N      = IDENTITY(INT,1,1)
           ,GB     = Number
           ,DBName = 'Database'+RIGHT(Number+1000,3)
       INTO #MyHead
       FROM master.dbo.spt_Values t
      WHERE t.Number BETWEEN 1 AND 6
        AND t.Type = 'P'
      ORDER BY GB DESC --Sorting by descending size is critical for this to work.
    ;
    --===== Remember the number of databases to backup
     SELECT @DBCount = @@ROWCOUNT
    ;
    --===== Create our backup control bins
     SELECT TOP (@Backups)
            Bin     = IDENTITY(INT,1,1)
           ,GB      = 0
           ,DBNames = CAST(NULL AS VARCHAR(MAX))
       INTO #Accumulator
       FROM master.sys.all_columns ac1
    ;
    --===== Presets for the loop
    DECLARE @Counter INT;
     SELECT @Counter = 1;

    --===== Stack the bins so they "weigh" as close to the same as possible by
         -- always assigning the current database, which has been sorted by
         -- size in descending order, to the bin with the least amount of
         -- work to do.
      WHILE @Counter <= @DBCount
      BEGIN
            --===== This takes whatever the current database size is, finds the bin
                 -- with the least amount of work to do (bin number breaks ties),
                 -- and assigns the database name to that bin in a CSV.
             UPDATE a
                SET a.GB      = a.GB + mh.GB
                   ,a.DBNames = ISNULL(a.DBNames+',','') + mh.DBName
               FROM #MyHead mh
              CROSS JOIN #Accumulator a
              WHERE mh.N = @Counter
                AND a.Bin IN (SELECT TOP 1 a1.Bin FROM #Accumulator a1 ORDER BY a1.GB ASC, a1.Bin ASC)
            ;
            --===== I left this here just so you can see the bins fill in order.
                 -- Take this out for production.
             SELECT * FROM #Accumulator ORDER BY Bin
            ;
            --===== Bump the counter
                SET @Counter = @Counter+1
            ;
        END
    ;
    --===== This displays the worklist by bin and process order (ItemNumber) and could be converted to
         -- dynamic SQL backup commands to power just the number of jobs that you want to have running.
     SELECT a.Bin, TotalBinSize = a.GB, ca.ItemNumber, ca.Item
       FROM #Accumulator a
      CROSS APPLY dbo.DelimitedSplit8K(a.DBNames,',') ca
    ;

    Here's the output for the above. I manually added a blank line between backup jobs just for clarity's sake.

    Bin TotalBinSize ItemNumber Item
    --- ------------ ---------- -----------
    1   7            1          Database006
    1   7            2          Database001

    2   7            1          Database005
    2   7            2          Database002

    3   7            1          Database004
    3   7            2          Database003

    (6 row(s) affected)

    This system works especially well when you have a database or two that are quite a bit larger than the rest. For example, if Database001 had a size of 10, we'd get the following balancing act: because of its size, the "system" wouldn't give that bin anything more to do and would try to balance the remaining load as evenly as possible.

    Bin TotalBinSize ItemNumber Item
    --- ------------ ---------- -----------
    1   10           1          Database001

    2   11           1          Database006
    2   11           2          Database003
    2   11           3          Database002

    3   9            1          Database005
    3   9            2          Database004

    (6 row(s) affected)

    In this case, it does make a bit of a bad guess: the combinations 10, 6+4, and 5+3+2 would bring every bin out to exactly 10, and it would take another "pass" to resolve that. Still, it's a whole lot better than the round-robin result of 10+4, 6+3, 5+2, or 14, 9, 7.
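
    As the comments on the final SELECT suggest, the worklist could be converted into dynamic SQL backup commands. Here's a minimal sketch of one way to do that for a single bin. The @Bin value, the D:\Backups\ path, and the file naming are placeholder assumptions you'd replace with your own, and it relies on the #Accumulator table and the DelimitedSplit8K splitter from the code above.

    DECLARE @Bin INT = 1          --Placeholder: which bin this backup job will handle
           ,@SQL NVARCHAR(MAX);

    --===== Build one BACKUP DATABASE command per database assigned to the bin,
         -- in ItemNumber (process) order. The path and file name are examples only.
     SELECT @SQL =
            (
             SELECT N'BACKUP DATABASE ' + QUOTENAME(ca.Item)
                  + N' TO DISK = N''D:\Backups\' + ca.Item + N'.bak'';' + NCHAR(10)
               FROM #Accumulator a
              CROSS APPLY dbo.DelimitedSplit8K(a.DBNames,',') ca
              WHERE a.Bin = @Bin
              ORDER BY ca.ItemNumber
                FOR XML PATH(''), TYPE
            ).value('.','NVARCHAR(MAX)')
    ;
      PRINT @SQL;   --Review the generated commands first...
    --EXEC (@SQL);  --...then uncomment this line to actually run them.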

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)