Set-Based Solution to this Problem?

  • Hello there! I've got an interesting situation I'm handling at present: I've developed a means of completing the task at hand, but it requires a WHILE loop, and I was wondering whether I could remove the loop. Performance is good; for the use cases the procedure will run in, it completes in a few seconds, which is acceptable. However, should the use case scale up, I'd like to keep it from bogging down if I can.

    Basically, I've got a lot of databases (100+), all with identical tables and table structures, and a PHP front end where users select projects from a checklist and pass the project names to SQL Server, where the corresponding database name for each project is picked up and a short UPDATE is run. So, it ends up like this:

    CREATE TABLE #Test(ProjectName varchar(75), DBName varchar(75))

    INSERT INTO #Test(ProjectName, DBName)
    VALUES
        ('Project1','DB1'),
        ('Project2','DB2'),
        ('Project3','DB3')

    The ProjectName is used to JOIN #Test to another table that holds each project name and its associated database name, which is how the real DBName for each project is retrieved; I'm just supplying dummy DBNames here. The lookup is something like the sketch below.
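    Roughly this, where ProjectList is a stand-in name for the real mapping table (the actual table and column names differ):

    INSERT INTO #Test(ProjectName, DBName)
    SELECT pl.ProjectName, pl.DBName
    FROM dbo.ProjectList pl  -- stand-in for the project-to-database mapping table
    WHERE pl.ProjectName IN ('Project1','Project2','Project3')  -- names checked off in the PHP front end

    From there, the update goes like so: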

    DECLARE @DBName varchar(75)
    DECLARE @sqlstatement varchar(1000)
    DECLARE @Counter int

    SELECT @Counter = COUNT(DBName) FROM #Test

    WHILE @Counter > 0
    BEGIN
        -- Take the next database name, build its UPDATE, and run it
        SELECT @DBName = (SELECT TOP 1 DBName FROM #Test ORDER BY DBName ASC)
        SELECT @sqlstatement = 'UPDATE [' + @DBName + '].dbo.tablename SET ...'
        EXEC(@sqlstatement)

        -- Remove the processed row and move on
        DELETE FROM #Test WHERE DBName = @DBName
        SELECT @Counter = @Counter - 1
    END

    As stated, this works out nicely and does what it needs to do quickly enough; however, is there any way to abolish the WHILE loop? I'm inclined to think there isn't, since this has to run across a subset of databases, and doing so almost always requires a loop, but I could very well be wrong. Thanks in advance for any help provided here!

    - 😀

  • The first thing I would look into is moving the EXEC out of the loop. Make @sqlstatement the complete update string: don't execute on each iteration, just keep building the string, and then execute @sqlstatement once at the end, outside the loop (once the string carries every statement, @sqlstatement should be declared varchar(max) rather than varchar(1000)). So, something like this:

    SET @sqlstatement = ''  -- start empty; concatenating onto NULL would yield NULL

    WHILE @Counter > 0
    BEGIN
        SELECT @DBName = (SELECT TOP 1 DBName FROM #Test ORDER BY DBName ASC)
        -- Append this database's UPDATE rather than overwriting the string
        SELECT @sqlstatement = @sqlstatement + 'UPDATE [' + @DBName + '].dbo.tablename SET ...' + CHAR(10)
        --PRINT(@sqlstatement)
        DELETE FROM #Test WHERE DBName = @DBName
        SELECT @Counter = @Counter - 1
    END

    EXEC(@sqlstatement)

    Better still, I would get rid of the WHILE loop entirely and use something like this, or another form of row concatenation:

    SET @sqlstatement = ''

    SELECT @sqlstatement = @sqlstatement + 'UPDATE [' + T.DBName + '].dbo.tablename SET ...' + CHAR(10)
    FROM #Test T

    EXEC(@sqlstatement)

    Verify whether this approach really is better than your WHILE loop, though. One advantage of the WHILE loop I can think of is that you can trigger your UPDATE execution every N iterations, which helps keep your transaction sizes manageable. Let's say you are running 150 UPDATE statements: you could trigger execution every 25 updates instead of all 150 at once, which might be a bad thing in certain cases. A sketch of that idea follows.
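    Something like this, untested, and the batch size of 25 is arbitrary:

    DECLARE @BatchCount int
    SET @BatchCount = 0

    SET @sqlstatement = ''
    WHILE EXISTS (SELECT 1 FROM #Test)
    BEGIN
        SELECT @DBName = (SELECT TOP 1 DBName FROM #Test ORDER BY DBName ASC)
        SELECT @sqlstatement = @sqlstatement + 'UPDATE [' + @DBName + '].dbo.tablename SET ...' + CHAR(10)
        DELETE FROM #Test WHERE DBName = @DBName

        SET @BatchCount = @BatchCount + 1
        IF @BatchCount = 25  -- flush every 25 statements
        BEGIN
            EXEC(@sqlstatement)
            SET @sqlstatement = ''
            SET @BatchCount = 0
        END
    END

    -- Run whatever is left over from the final partial batch
    IF @sqlstatement <> '' EXEC(@sqlstatement)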

    Finally, I make a habit of using EXEC sp_executesql instead of EXEC. Keep in mind, though, that sp_executesql requires an NCHAR or NVARCHAR datatype as its input.
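    For example (note the nvarchar declaration; this is just the shape of the call, with the statement itself elided):

    DECLARE @sqlstatement nvarchar(max)

    SELECT @sqlstatement = N'UPDATE [' + @DBName + N'].dbo.tablename SET ...'
    EXEC sp_executesql @sqlstatement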

    Other than this, I am at a loss to see how to make this "more" set-based. Hopefully, this might alleviate some problems.

  • Aha, I knew there had to be some way to abolish the loop :w00t:. I'm offsite from where the procedure needs to be run, but next week I'll be able to do some comparison testing to see whether this method is better. I don't believe there will be too many databases to run this on (perhaps a max of 20, with a total row count for the updates of around 2,000 to 3,000), but I don't know how far this procedure might need to scale, so it's better to build for that sooner rather than later.

    Also, I'll change the EXEC to EXEC sp_executesql; I think my brain just decided to stop working when I originally wrote this. I know sp_executesql leaves fewer avenues for potential issues (though the parameters used in this procedure are all generated and retrieved from the database, with no user alteration possible), but I used a plain EXEC anyway for some silly reason.

    Thanks for the suggestions! I'll be sure to do some testing and see how both approaches play out.

    - 😀

  • One thing which I can see holding this solution back as volumes increase is that the multiple EXECs are essentially single-threaded - they run sequentially.

    A better-performing solution may be to fire off all the queries in parallel - or at least to allow some degree of parallelism.

    Unfortunately, I can think of no easy way of achieving this! sp_start_job came to mind initially, but since it's impossible to run multiple instances of the same SQL Agent job concurrently, this is not easy. Maybe you could have a few very similar SQL Agent jobs (Update1, Update2, etc.) and fire these off in parallel.

    If your SSIS skills are up to scratch, this would be another possible solution.


  • My question is this:

    Why would you use, or even have, over 100 databases all with exactly the same objects? This sounds like you have a separate database storing information for each distinct project. Now that would be totally insane.

    Please share with us why one would have multiple databases with the exact same objects.

    Andrew SQLDBA

  • Haha, it's a vendor-based solution :-D. The company I'll shortly be working for uses it for its core operations, and it gets the job done well enough. However, there are all manner of issues with it that could cause problems down the road, like improper indexing on each database, and problems on the front end, such as the one that necessitated the creation of this procedure (it deactivates the parts of a project that are unusable; without it, a front-end user has to deactivate those parts one at a time, and there are usually anywhere between 100 and 1,000 parts to deactivate, so it can get time-consuming!).

    The company is looking to move away from this solution at some point, or to engineer a means of making things more manageable; granted, I'm just starting with them, so I don't know the full extent of how things are running there, but the general consensus is that it meets current demand passably, though improvements would be much appreciated.

    - 😀

  • Phil Parkin (4/24/2013)

    sp_start_job came to mind initially, but since it's impossible to run multiple instances of the same SQL Agent job concurrently, this is not easy. Maybe you could have a few very similar SQL Agent jobs (Update1, Update2, etc.) and fire these off in parallel.

    Actually, you could create unique jobs dynamically, start them, and have each job delete itself if it completes successfully. A rough sketch:
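    Here the job and step names are placeholders and the UPDATE itself is elided; @delete_level = 1 tells SQL Agent to drop the job after a successful run, and sp_start_job returns immediately, so jobs started this way run in parallel:

    DECLARE @JobName sysname
    SET @JobName = N'OneOffUpdate_' + CONVERT(nvarchar(36), NEWID())

    -- Create a throwaway job that deletes itself on success
    EXEC msdb.dbo.sp_add_job
        @job_name = @JobName,
        @delete_level = 1

    -- One T-SQL step holding the UPDATE for a single database
    EXEC msdb.dbo.sp_add_jobstep
        @job_name = @JobName,
        @step_name = N'RunUpdate',
        @subsystem = N'TSQL',
        @command = N'UPDATE dbo.tablename SET ...',
        @database_name = N'DB1'

    -- Jobs must be targeted at a server before they can run
    EXEC msdb.dbo.sp_add_jobserver
        @job_name = @JobName,
        @server_name = N'(local)'

    -- Asynchronous: control returns as soon as the job is queued
    EXEC msdb.dbo.sp_start_job @job_name = @JobName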
