Making Process Changes

  • Dropping this here as I have no peer where I work who really understands SQL, so I'm looking for some feedback on making some changes to a procedure that "fixes" a TableOfContents table in our application, because things happen and it can get squirrely.

    Quick background (do not judge me by this database; it was designed by developers in an actual war zone, with missiles and mortars going off at random times, and no real database professional to assist).
    Not going into a lot of detail, so ask questions if needed.  This is for an UPDATE process where the reports exist in both the report tables and the TableOfContents.

    The TableOfContents table has key data from each report stored in the database, the reports themselves being stored in tables designed for each report type.  The primary key for each report is a GUID stored as a varchar(36) string.
    The TableOfContents table has its own primary key, also a GUID stored as a varchar(36) string, as well as the ReportKey of each published report (draft reports are not listed in the table of contents).
    You may already see the issue (and I don't mean the issue of using GUIDs): if I am batching the reports for processing, does it make sense to sort each batch, when selecting it with a TOP clause, by the ReportKey of the report or by the TableOfContentsKey for that report?  As I think about it, I find myself leaning toward the latter, to keep the batches as close together as possible in logical order, since the primary keys also happen to be the clustered indexes (yes, another problem I am fully aware of).
    I am already looking at some changes to how the data is queried that should allow me to accomplish the latter.
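
    Roughly the shape I have in mind is below.  Table and key names aside, everything here (batch size, delay, the column being corrected, the source table) is illustrative, not our real schema:

      DECLARE @BatchSize int = 1000;
      DECLARE @LastKey   varchar(36) = '';   -- empty string sorts before any GUID string
      DECLARE @Batch     TABLE (TableOfContentsKey varchar(36) PRIMARY KEY);

      WHILE 1 = 1
      BEGIN
          DELETE FROM @Batch;

          -- Grab the next batch in clustered-key order so each batch touches
          -- rows that sit next to each other in the table.
          INSERT INTO @Batch (TableOfContentsKey)
          SELECT TOP (@BatchSize) TableOfContentsKey
          FROM   dbo.TableOfContents
          WHERE  TableOfContentsKey > @LastKey
          ORDER BY TableOfContentsKey;

          IF @@ROWCOUNT = 0 BREAK;

          UPDATE toc
          SET    toc.SomeColumn = src.SomeColumn   -- illustrative column being corrected
          FROM   dbo.TableOfContents AS toc
          JOIN   @Batch              AS b   ON b.TableOfContentsKey = toc.TableOfContentsKey
          JOIN   dbo.ReportMain      AS src ON src.ReportKey = toc.ReportKey;  -- illustrative source table

          SELECT @LastKey = MAX(TableOfContentsKey) FROM @Batch;

          WAITFOR DELAY '00:00:02';   -- short pause so other sessions can get in
      END;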

    If none of this makes sense, that is okay.  I am just trying to make sure I am thinking straight, and that things process as efficiently as possible.  This is where it would be nice to have another SQL Server professional around to discuss these types of things.

  • Lynn, why do you need to order the reports at all? Can't you just process them in whatever order they are returned from the query?

    (Perhaps this is a weird question based on your full process, but after reading your message I do not feel I fully understand the bigger picture.)

    Also, could you post DDL and sample data (real or mocked up) for the tables involved - limiting yourself to just two or three reports for illustration's sake is probably okay - so that I can get a better picture of what you're trying to do?


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • The actual reports are stored in these tables? Or the report data? Or both?

    Can the TableOfContents have more than one report key to the same TableOfContents key (one-to-many)? Or is it a one-to-one relationship?

    I assume the reports are being batched to keep processing within a specific time window and to prevent the CPUs or printers from getting overloaded. Is there a way to tell report size from the data you have? I would almost (personal preference here) want to batch the smaller reports together so they do all their stuff quickly, then separate out the larger reports so they can have their own personal timeslot without interfering with each other.
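
    For example, if there's a size measure you can compute, something like this could split them (the table and column names here are completely made up, since I don't know your schema):

      SELECT ReportKey,
             DATALENGTH(ReportBody) AS ApproxBytes,     -- made-up size measure
             CASE WHEN DATALENGTH(ReportBody) < 1048576
                  THEN 'small'   -- pack these together in shared batches
                  ELSE 'large'   -- give each of these its own timeslot
             END AS SizeClass
      FROM   dbo.ReportMain;                            -- made-up table name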

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • To answer the questions, in no particular order.
    TableOfContentsKey to ReportKey is one-to-one.  Yes, it doesn't quite make sense why there is a separate TableOfContentsKey, but there is.  This is one reason it is possible to get a duplicate row of data for the same ReportKey.  Cleaning those up is part of what this stored procedure is implementing.
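
    The duplicate cleanup boils down to this kind of pattern (which duplicate to keep is a judgment call; keeping the lowest TableOfContentsKey here is purely for illustration):

      ;WITH Dupes AS
      (
          SELECT TableOfContentsKey,
                 ROW_NUMBER() OVER (PARTITION BY ReportKey
                                    ORDER BY TableOfContentsKey) AS rn
          FROM   dbo.TableOfContents
      )
      DELETE FROM Dupes   -- deleting through the CTE removes the extra ToC rows
      WHERE  rn > 1;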

    Currently the batches are basically random for a specific report type, which means each batch of updates could potentially touch reports in every part of the table each time a new batch is processed.
    We are batching the updates to the TableOfContents table to minimize the effects of blocking on users.  I even have a short delay coded between batches to ensure other processes have access to the tables that may be used by this process.
    I have also set the deadlock priority of this process to -10 to ensure that if it is part of a deadlock it is the victim, not a user process, and have coded it to retry the current transaction after a slight delay.
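
    The retry logic is basically this shape (the retry count and delay are illustrative, and the parameterless THROW assumes SQL Server 2012 or later):

      SET DEADLOCK_PRIORITY -10;   -- if a deadlock happens, this session is the victim

      DECLARE @Tries int = 0;
      WHILE @Tries < 3
      BEGIN
          BEGIN TRY
              BEGIN TRANSACTION;
              -- ... the batch update against dbo.TableOfContents goes here ...
              COMMIT TRANSACTION;
              BREAK;   -- batch succeeded, leave the retry loop
          END TRY
          BEGIN CATCH
              IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

              IF ERROR_NUMBER() <> 1205 OR @Tries >= 2   -- 1205 = chosen as deadlock victim
                  THROW;   -- not a deadlock, or out of retries: surface the error

              SET @Tries += 1;
              WAITFOR DELAY '00:00:05';   -- back off, then retry the batch
          END CATCH
      END;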

    What I was thinking is that if I ordered the data I select each batch from by the TableOfContentsKey, I would be updating data that is logically close together, thereby minimizing the time it takes to update that batch of report data in the TableOfContents table, rather than touching data potentially spread throughout the table due to the nature of the indexing used in this database.
    As an aside, I understand the reason for using GUIDs, as this data is replicated between numerous sites on the same network as well as other networks (think different levels of classification and access, and making use of Guard systems between networks).
    It will take a bit to get the schema, and making up data will be a bit more difficult, as I can't provide the real data.

  • Given your reply, it makes sense to batch reports that are logically close to each other, but I would still look at (if you can) report size and use that as a factor in deciding your batches as well.

    Am I hearing correctly that these reports update data? Or are the reports themselves getting updated with data every time new data is entered? I can't quite tell from the way you explained it.

    If the latter, do they have to run each time the data gets updated, or can there be delays on some reports so that you can batch your processing in a more controlled manner?

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Not sure what you mean by report size.  The data for each report is stored in either the main report table or one of numerous sub-tables for a specific report type.  For the TableOfContents table the only report tables used are the main table and the location sub-table, and only a small set of the columns of each.  Each of these tables is fairly wide, but pulling the data from the report tables isn't the issue.  The issue is minimizing the time it takes to update the TableOfContents when data has changed in the source report but, for whatever reason, the TableOfContents does not reflect those changes; that is what this stored procedure will be correcting.

    It looks like my approach is working, but now I am finding another issue: there appear to be some data change issues causing an infinite loop, because data that has just been updated still reports that it needs to be updated.  I have a fix in place to handle that, but I need to identify the underlying root cause.
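
    One suspect I'm checking is whether the difference test is NULL-safe: any comparison involving NULL evaluates to UNKNOWN, which, depending on how the "needs update" predicate is written, can leave rows permanently flagged (or never flagged).  An EXCEPT-based check like this (column names illustrative) treats NULLs as equal:

      SELECT toc.TableOfContentsKey
      FROM   dbo.TableOfContents AS toc
      JOIN   dbo.ReportMain      AS src ON src.ReportKey = toc.ReportKey
      WHERE  EXISTS (SELECT toc.Title, toc.ReportDate   -- ToC copy of the values
                     EXCEPT
                     SELECT src.Title, src.ReportDate); -- source-of-truth values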

    I wish we could move to SQL Server 2016 but I don't see that happening due to cost.
