Administrating Analysts

  • bob.pearman

    SSC-Addicted

    Points: 414

    Hi Guys,

    I was just wondering if any of you guys would be willing to share your experience or have any advice on how to manage a situation we are seeing.

    So we have a back end data warehouse, fairly standard setup with an overnight load and SSRS reports etc. We also have a bunch of analysts around the building who write queries against it and often save the query into Excel and then send the spreadsheet around for people to refresh the data.

    The problem comes in that the t-SQL that is written sometimes is not the best and can occasionally lead to slowing down the server and impacting the live reports. Especially if the load is still running etc.

    Obviously there is an education and training side to this. But in my mind there is also a "service protection" side to it to. I don't think it's an option to completely block this behavior, people are running these things because they need the information. But on the other hand it's not acceptable for poorly written code to impact service like this.

    I am curious what other people have done to protect their service as i'm sure this isn't a unique situation.

    Ideas could range from having a system that kills long running queries, blocking queries coming from Excel during critical times or even intercepting queries to estimate the cost and blocking based on high estimated cost?

    Ideas, tips, experiences would be appreciated.

    Thanks

     

     

     

  • Site Owners

    SSC Guru

    Points: 80373

    Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • Mr. Brian Gale

    SSC-Insane

    Points: 23075

    At my workplace, end users are not allowed to connect to the LIVE systems via excel.  We don't have any specific rules restricting this, but we advise the analysts not to do it.

    If they NEED data, we build up a report in SSRS and let them use that.  If they need to do analytics on data, they can pull the data from the SSRS report that was created forthem to use.  It is a bit more restrictive, but it allows the development and administration teams help build efficient queries for the reports.

    In the event that our reports don't provide the proper data, we will work with the analysts to determine what they need and work with them to build a report (or multiple reports) that are efficient and suits their needs.

    We strongly encourage everyone to NOT send spreadsheets around to each other (especially ones that can pull "fresh" data) to each other and should have a centralized location for the excel files.  Why you ask?  Lets say you send the spreadsheet to 3 people.  Those 3 people send it to 3 0thers who in turn send it to 3 others each.  You now have 40 people with that document who could all be hitting the database at the same time to get their report data.  It brings up 2 problems.  First, the performance hit.  If that document spreads around too much and a lot of people refresh it at the same time, they are going to be hammering the servers.  The second problem, lets say that the document has been  shared via email to 1000 people total.  You are aware of 10 out of those 1000 people who use it.  You make a change to the document to fix a bad calculation and now there are 2 versions floating around.  The other fun thing that can happen is Analyst A makes the Excel file by pulling in a lot of data sources and ends up having a lot of advanced TSQL and Excel magic in there.  They understand the file inside and out, but nobody else wants to make changes because it is very complicated.  Analyst A quits and Analyst B gets tasked with fixing a new found bug but has no idea how it is working.  They need to spend a bunch of time debugging and decoding the document and Excel can hide a lot of things from you (calculated cells/columns, VBA, hidden cells/worksheets, etc) so troubleshooting it may be non-trivial.

    Now take a different approach where you have a single SSRS report which is verified and validated.  You export the results on a schedule to an Excel file on a network share.  Having multiple people open it at once is fine (first copy will be read-write, the rest will be read-only), there is no "data refresh" so no hammering of the database.  And  if changes are needed to be made to the excel file, they are changed at the SSRS report level and next export will be accurate and valid data.

    Having a data analyst build something up in Excel as a prototype against a test system we have no issues with.  But anything that is going to be going out to be used by the company needs to be controlled and supportable.

    And if it is in SSRS and someone complains that the Excel file is wrong after they do some calculations, they can re-export the data from the SSRS server.

     

    Another approach would be that if the analysts keep hammering the server and bringing it to  its knees, they could get a read-only replica of the data warehouse.  Then while they are doing the analysis, they are only hurting other analysts and not the entire company.

     

    To summarize the above - Our company does it with simple company wide policies.  We don't put systems in place to prevent certain things from happening, we put rules in place and employees follow them to avoid getting the dreaded call from the DBA team asking what they are doing that is bringing the system down.  We have monitoring in place for long running transactions and for blocking/deadlocks so if someone starts blocking for a long period of time, they may get a phone call, or (depending on what is being done and how bad it is hurting things) they may just get their SPID killed followed up by an email.

  • Jeff Moden

    SSC Guru

    Points: 996843

    There's several ways to skin this cat... Write good stored procedures and distribute "refreshable" spreadsheets that call them or pre-aggregate the data as tab separated information and let the spreadsheet users import that or write some good reports that create the spreadsheets that get distributed.

    You should also setup some monitoring to capture people's queries so that you can see the mistakes they're making in code and give them a hand in fixing them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply