Sanity Check of (possibly) over-complicated query.

  • Background:

    We have a Document Management System which currently has 10 file servers configured in the DMS Database table DOCSERVERS.

    Relevant Columns:

    CREATE TABLE [MHGROUP].[DOCSERVERS](
    [DOCSERVER] [NVARCHAR](32) NOT NULL, --internal DMS fileserver name reference
    [OS] [NVARCHAR](32) NOT NULL,
    [LOCATION] [NVARCHAR](254) NOT NULL, --\\hostname\share
    [ROOTPATH] [NVARCHAR](254) NULL --\root dir
    )

    The user table DOCUSERS stores all user data, including their default document server.

    Relevant columns:

    CREATE TABLE [MHGROUP].[DOCUSERS](
    [USERID] [NVARCHAR](64) NOT NULL,
    [DOCSERVER] [NVARCHAR](32) NULL
    )

    All users always have the same default server configured, but this is changed (currently manually) based on doc server with most free space. Any document accessed or saved is moved or saved to the users' default doc server.

    Documents are archived based on age, freeing space semi-randomly across all 10 servers. An admin manually sets all users' default server based on weekly filesystem capacity reports.

    As this admin has left, and not been replaced, and the capacity report system is now managed externally, I have developed a fully automated method, which runs in about a second to harvest the free space of all the docservers into a utility table using a powershell script, and a bulk insert, then update DOCUSERS.DOCSERVER, setting this to the server with most free space.

    Utility Table definition:

    CREATE TABLE [dbo].[iManageDocServer](
    [docServer] [NVARCHAR](32) NOT NULL,
    [hostName] [NVARCHAR](15) NOT NULL,
    [freeSpace] [DECIMAL](6, 2) NULL,
    [diskSize] [DECIMAL](6, 2) NULL
    )

    --Sample Data
    INSERT INTO dbo.iManageDocServer (
    docServer
    , hostName
    , freeSpace
    , diskSize
    )

    VALUES
    ( N'DEFSERVER', N'ServerV07', 476.95, 2000.00 ),
    ( N'DEFSERVER10', N'ServerF007', 270.26, 1843.00 ),
    ( N'DEFSERVER2', N'ServerV08', 482.84, 2000.00 ),
    ( N'DEFSERVER3', N'ServerV09', 398.64, 2048.00 ),
    ( N'DEFSERVER4', N'ServerV01', 378.68, 2048.00 ),
    ( N'DEFSERVER5', N'ServerV02', 381.85, 2048.00 ),
    ( N'DEFSERVER6', N'ServerV03', 364.99, 2150.00 ),
    ( N'DEFSERVER7', N'ServerV04', 358.69, 2048.00 ),
    ( N'DEFSERVER8', N'ServerV05', 385.38, 2048.00 ),
    ( N'DEFSERVER9', N'ServerV06', 358.90, 2047.00 )

    All good so far, works a treat. However, I wanted a more controlled way to manage the switch of the users default server. As the free space balances out, I do not want the server changing on every run of the job, just because one server has a few MBs more free space than the current one.

    I could reduce the frequency of the run from daily to weekly/monthly, but that's a bit arbitrary, but prefer instead to ALTER the update sProc to set a threshold, such that the server selected to switch to must have at least 2% more free space than the current one to make it worth the switch.

    I wanted to this as set-based as possible, even though performance is not really an issue, we are only talking 10 rows, one per server, in the calculation.

    That's my long winded way pre-amble. I came up with this to alter the query in the current sProc which UPDATEs DOCUSERS. I just feel it's too convoluted and I am missing a far easier technique. Juggling 1001 other tasks, I think I may have clouded judgement.

    Basically, I've used the LEAD and RANK windowing functions to be able to compare the currently used server with server with the next most free space. As the current server fills, it moves from rank 1 to rank 2, and the next candidate server takes up rank 1, even if only a few kB more free space.

    @CandidateServer is NULL until the candidate server at rank 1 has more than 2% more free space than rank 2, so will update 0 rows.

    When the candidate server at rank 1 has more than 2% more free space than rank2, @CandidateServer is populated and DOCUSERS updates.

    Can anyone see glaring errors, or a stupidly simple technique I have missed for comparing ColA, RowA to ColA, RowB please?

    DECLARE @CandidateServer AS NVARCHAR(32)

    WITH cte AS(
    SELECT
    x.docServer
    ,x.PercentFree
    ,x.lg
    ,x.rnk
    FROM
    (
    SELECT
    imds.docServer
    ,imds.hostName
    ,imds.freeSpace
    ,imds.diskSize
    ,imds.freeSpace/imds.diskSize * 100 AS PercentFree
    ,LEAD(imds.freeSpace/imds.diskSize * 100, 1) OVER (PARTITION BY NULL ORDER BY (imds.freeSpace/imds.diskSize) DESC) AS lg
    ,DENSE_RANK() OVER (PARTITION BY NULL ORDER BY (imds.freeSpace/imds.diskSize) DESC) AS rnk
    FROM
    UtilityDB.dbo.iManageDocServer AS imds
    ) AS x
    )

    SELECT
    @CandidateServer = cte.docServer
    FROM cte
    WHERE cte.rnk = 1
    AND cte.PercentFree > (cte.lg + cte.lg * 0.02)


    UPDATE MHGROUP.DOCUSERS
    SET
    DOCSERVER = @candidateServer
    WHERE
    (
    DOCSERVER <> @candidateServer
    OR
    DOCSERVER IS NULL
    )
    AND @CandidateServer IS NOT NULL

    Many thanks, if you read this far!

    (Edited, I got my LEAD and LAG the wrong way round!)

    "Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. When we enquire into any subject, the first thing we have to do is to know what books have treated of it. This leads us to look at catalogues, and at the backs of books in libraries."
    — Samuel Johnson
    I wonder, would the great Samuel Johnson have replaced that with "GIYF" now?

  • David, how about just having your powershell script round the freespace to the nearest 10 (or whatever you want as a reasonable switching threshold)?

    For example if there were two servers with this free space:

    ServerV07, 476.95

    ServerV08, 482.84

    With the Default currently pointed to ServerV07 If your script rounds them to the nearest 10 like this:

    ServerV07, 480.00

    ServerV08, 480.00

    Then a switch to ServerVo8 would not occur until free space on ServerV07 got down to 474.99 (470.00)

  • kcecil,

    Thank you for taking the time to wade through that and reply.

    That nice simple approach to the problem as a whole has merit, if I'd thought of that at an earlier stage, I would probably have gone down that route, or similar 🙂

    The precise data in the utility table is useful however, there is a scheduled SSRS report sent containing disk space which also uses this table, but I guess a rounding in the query on the free space percent-free space might could work.

    I guess I was posting more to check my SQL method more than anything, having since seen a similar post and replies on this forum, from a few days earlier, about comparing row data, I'm a little happier that what I have done is not too awful a kludge 🙂

    "Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. When we enquire into any subject, the first thing we have to do is to know what books have treated of it. This leads us to look at catalogues, and at the backs of books in libraries."
    — Samuel Johnson
    I wonder, would the great Samuel Johnson have replaced that with "GIYF" now?

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply