A scenario where SQL Server doesn't seem to help

  • Hi

    We have a scenario where business users are presented with a screen with 20+ fields. They can choose any number of fields to build a dynamic SQL query in a proc. The query that is formed always has a SELECT, WHERE, GROUP BY, and ORDER BY clause. They can pick any number of columns for any of these clauses. The table from which this data is fetched has some 3.5 million records. Now a lot of queries time out, as I am in no position to create indexes that can support all possible queries. I don't see a solution to this problem in SQL Server.

    If anyone has some inputs, please do share.

    Thanks

    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

  • Is this against a production database? Sounds like you need a Data Warehouse solution (OLAP).


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • If your only problem is timeouts:

    1. Preheat your data cache by doing a SELECT * FROM whatever WHERE checksum(*) = 1

    2. Create an index before doing your SELECT and then drop it afterwards (a rough sketch of options 1 and 2 follows this list).

    3. Change the default timeout for queries.

    4. Only support the WHERE clause in the query, and do the COLUMNS, GROUP BY, and ORDER BY in the application instead of the database.

    5. Require one column in the WHERE clause that is indexed such as a date column.
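
    A minimal T-SQL sketch of options 1 and 2, assuming a hypothetical table dbo.BigTable and a hypothetical filter column SaleDate:

    -- 1. Preheat the data cache: CHECKSUM(*) = 1 forces a full scan that
    --    returns almost no rows but pulls the table's pages into memory.
    SELECT * FROM dbo.BigTable WHERE CHECKSUM(*) = 1;

    -- 2. Build a throwaway index for the query at hand, then drop it.
    CREATE NONCLUSTERED INDEX IX_BigTable_Temp ON dbo.BigTable (SaleDate);

    -- ... run the generated query here ...

    DROP INDEX IX_BigTable_Temp ON dbo.BigTable;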

  • Or SSAS and Excel?

  • S_Kumar_S (11/28/2016)


    Hi

    We have a scenario where business users are presented with a screen with 20+ fields. They can choose any number of fields to build a dynamic SQL query in a proc. The query that is formed always has a SELECT, WHERE, GROUP BY, and ORDER BY clause. They can pick any number of columns for any of these clauses. The table from which this data is fetched has some 3.5 million records. Now a lot of queries time out, as I am in no position to create indexes that can support all possible queries. I don't see a solution to this problem in SQL Server.

    If anyone has some inputs, please do share.

    Thanks

    Sounds to me like one of those situations where "user defined" code generation becomes a problem; I've seen this far too many times.

    😎

    Questions: Is the total number of columns 20? Do these columns come from a well-defined relational schema? Can you change the underlying code generation process? Can you post an example of the code generated? Does the "thing" support ad-hoc filters and sort order?

    Without more details, it's hard to know what the right tool is for the job. The flexible part sounds a bit like it could be done in PowerPivot, but it's still impossible to say for sure. How about some more details? What kinds of things are the end users supposed to be able to do with the data?

    Maybe you should look at Rob Collie's book? See http://www.powerpivotpro.com

  • Hi Eirikur

    Let me try to answer your questions:

    Is the total number of columns 20?

    No, it is somewhere between 20 and 30. But the user has the flexibility to choose any combination of columns for any of the clauses.

    Do these columns come from a well-defined relational schema?

    Yes, all these columns are in one single SQL Server table.

    Can you change the underlying code generation process?

    No, it is as per business need. The code ultimately prepares the clauses, which are passed to a proc; they are then combined to form a dynamic SQL statement and executed using sp_executesql (a rough sketch of this pattern follows my answers below).

    Does the "thing" support ad-hoc filters and sort order?

    Yes, the user may select any column(s) out of the 20+ columns for the WHERE or ORDER BY clause.
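
    For illustration, the pattern is roughly as below. All table, column, and parameter names are made up; the real proc receives the pre-built clause fragments as parameters:

    -- Hypothetical sketch of the dynamic SQL pattern described above.
    DECLARE @SelectList  nvarchar(MAX) = N'Region, SUM(Amount) AS Total',
            @WhereClause nvarchar(MAX) = N'SaleDate >= ''20160101''',
            @GroupByList nvarchar(MAX) = N'Region',
            @OrderByList nvarchar(MAX) = N'Total DESC',
            @sql         nvarchar(MAX);

    SET @sql = N'SELECT '    + @SelectList
             + N' FROM dbo.BigTable'
             + N' WHERE '    + @WhereClause
             + N' GROUP BY ' + @GroupByList
             + N' ORDER BY ' + @OrderByList + N';';

    EXEC sys.sp_executesql @sql;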

    I hope I am clear. Please ask if something else needs to be explained.

    Thanks

    Eirikur Eiriksson (11/28/2016)


    S_Kumar_S (11/28/2016)


    Hi

    We have a scenario where business users are presented with a screen with 20+ fields. They can choose any number of fields to build a dynamic SQL query in a proc. The query that is formed always has a SELECT, WHERE, GROUP BY, and ORDER BY clause. They can pick any number of columns for any of these clauses. The table from which this data is fetched has some 3.5 million records. Now a lot of queries time out, as I am in no position to create indexes that can support all possible queries. I don't see a solution to this problem in SQL Server.

    If anyone has some inputs, please do share.

    Thanks

    Sounds to me like one of those situations where "user defined" code generation becomes a problem; I've seen this far too many times.

    😎

    Questions: Is the total number of columns 20? Do these columns come from a well-defined relational schema? Can you change the underlying code generation process? Can you post an example of the code generated? Does the "thing" support ad-hoc filters and sort order?

    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

    Thanks, Bill, for giving some really nice alternatives. For point 3, we have already increased the timeout to some 60 seconds. I will try points 1, 2, and 4.

    Point 5 probably won't be possible.

    Thanks again

    Bill Talada (11/28/2016)


    If your only problem is timeouts:

    1. Preheat your data cache by doing a SELECT * FROM whatever WHERE checksum(*) = 1

    2. Create an index before doing your SELECT and then drop it afterwards.

    3. Change the default timeout for queries.

    4. Only support the WHERE clause in the query, and do the COLUMNS, GROUP BY, and ORDER BY in the application instead of the database.

    5. Require one column in the WHERE clause that is indexed such as a date column.

    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

  • S_Kumar_S (11/28/2016)


    Thanks, Bill, for giving some really nice alternatives. For point 3, we have already increased the timeout to some 60 seconds. I will try points 1, 2, and 4.

    Point 5 probably won't be possible.

    Thanks again

    Bill Talada (11/28/2016)


    If your only problem is timeouts:

    1. Preheat your data cache by doing a SELECT * FROM whatever WHERE checksum(*) = 1

    2. Create an index before doing your SELECT and then drop it afterwards.

    3. Change the default timeout for queries.

    4. Only support the WHERE clause in the query, and do the COLUMNS, GROUP BY, and ORDER BY in the application instead of the database.

    5. Require one column in the WHERE clause that is indexed such as a date column.

    A) Item 1 is not helpful and will just put load on the system for nothing. You are already running queries that do table scans, so whatever fits in RAM will already be there.

    B) Item 2 is FAR worse, because you have to do a table scan to get the data to create the index, then BUILD IT (sort in tempdb, IO/RAM/CPU BURN and a TABLE LOCK during all of this!!), THEN run the query, then DROP IT. Sorry, but that is just silly.

    C) Item 3 should have been done long ago.

    D) Item 4 is silly too. Do you really want to bundle up potentially 3.5 MILLION ROWS of 20 columns of data into TDS packets, send them over the network, and then have some now-CRUSHED system do grouping, aggregations, sorting, etc.?

    Things that WILL HELP:

    E) BIGGER HARDWARE. 3.5M rows is NOT MUCH DATA unless the 20 columns are a bunch of blobs or massive character fields. RAM and CPUs need to be chock full all the time. If the entire table doesn't fit in RAM (and RAM is 96GB or so, meaning Standard Edition SQL Server 2012 gets all 64GB it can use), then you will need fast IO too. Tempdb will need to be ROBUST too, since you may well be spilling work tables there regularly.

    F) Upgrade to SQL Server 2014 or, even better, 2016 with SP1. With both you can access 128GB of RAM in Standard Edition. With 2016 SP1 you can now have COMPRESSED TABLES or, even better since this seems to be a reporting system, COLUMNSTORE INDEXES (in Standard Edition!!).

    G) Use the READ UNCOMMITTED isolation level so that no blocking occurs if data writes are happening at the same time as selects (a minimal example follows this list). Do note that this can lead to bad data.

    H) Hire a performance tuning professional to review your hardware/virtualization/SQL Server/database/application/etc config/settings/mx/indexing/code/etc.
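
    A minimal sketch of item G, set at the top of the session or proc that runs the generated query (table name hypothetical):

    -- Session-level setting: subsequent SELECTs take no shared locks,
    -- so they are not blocked by concurrent writes (dirty reads possible).
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    -- Equivalent per-table hint inside a single query:
    -- SELECT ... FROM dbo.BigTable WITH (NOLOCK) WHERE ...;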

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    You seem to have missed the poster's single stated problem, which is query timeouts. All my suggestions address that one problem. If a pre-select is done in a separate query, then the actual query likely won't time out.

    I do create indexes to use just before monthly reports. Afterwards the monthly indexes may be dropped.

    It all depends on which resources are stressed as to what he should do. I'm just giving him some options.

    Suggest some new things instead of criticizing others without knowing what you're talking about.

  • Bill Talada (11/28/2016)


    You seem to have missed the poster's single stated problem, which is query timeouts. All my suggestions address that one problem. If a pre-select is done in a separate query, then the actual query likely won't time out.

    I do create indexes to use just before monthly reports. Afterwards the monthly indexes may be dropped.

    It all depends on which resources are stressed as to what he should do. I'm just giving him some options.

    Suggest some new things instead of criticizing others without knowing what you're talking about.

    I did suggest other things. And I most certainly do know what I am talking about.

    Plural users doing plural reports of unknown construction. There is a high probability of concurrent queries given the wording of the post, and no reason to suspect that the data isn't in cache already under those circumstances. Even if they aren't concurrent, it is highly likely.

    You create indexes before monthly reportS (plural). In that case the benefit of doing all that work could be worthwhile, if the created index was hit multiple times or if it allowed the optimizer a substantial benefit such as a merge join on a huge table. Otherwise it would still be a waste (as it is on the single-table scenario the OP certainly has). You also clearly have the ability to lock the table for the duration of the index builds and to accept the overhead on the server for the same. The OP's situation is not a one-off like the one you mention.

    Do note the OP said he is "in no position to create indexes that can support all possible queries". That's another reason I said your suggestion was not appropriate. Also, even if they WERE to do that, they would have to interrogate the query and then create ... exactly which indexes would you have them create for any given query? And what if 3 queries in a row needed indexes on 3 particular columns? You would need to code something to check for the index already existing (see the sketch below). Sorry, but you gave a completely untenable solution, for quite a few reasons as you can see.
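
    For instance, any such automation would at minimum need a guard like this (table, column, and index names are hypothetical):

    -- Only build the throwaway index if an equivalent one isn't already there.
    IF NOT EXISTS (SELECT 1 FROM sys.indexes
                   WHERE object_id = OBJECT_ID(N'dbo.BigTable')
                     AND name = N'IX_BigTable_SaleDate')
    BEGIN
        CREATE NONCLUSTERED INDEX IX_BigTable_SaleDate
            ON dbo.BigTable (SaleDate);
    END;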

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    It would be helpful if you could post some of the dynamically generated SQL, ideally a query with all 20+ columns included. In my experience, procedural developers don't write very good SQL, and when they try to write it dynamically, it tends to be worse. Some DDL would be nice as well, so we know more about the structure of the table this is running against.

    Does the dynamic SQL wildcard every column? (e.g. ColA LIKE '%' + @ColA + '%')

    Have you done any profiling to look for specific patterns? More than likely, there are just a few columns that are used in the majority of the queries. You may be able to gain some performance by creating an index to cover the most frequently used columns (see the sketch at the end of this post). Speeding up some of the queries may help the others by reducing system load.

    Does the table consist of more than the 20+ fields they can search by?

    Is this the only screen that uses this approach or is this problem spread throughout your application?
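
    For instance, if profiling showed that most queries filter on two particular columns, a covering index along these lines might help (all names hypothetical):

    -- Keys on the two most frequently filtered columns, INCLUDEs on a few
    -- commonly selected ones, so those queries avoid scanning the base table.
    CREATE NONCLUSTERED INDEX IX_BigTable_Common
        ON dbo.BigTable (SaleDate, Region)
        INCLUDE (Amount, CustomerId);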

    Wes
    (A solid design is always preferable to a creative workaround)

  • whenriksen (11/30/2016)


    It would be helpful if you could post some of the dynamically generated SQL, ideally a query with all 20+ columns included. In my experience, procedural developers don't write very good SQL, and when they try to write it dynamically, it tends to be worse. Some DDL would be nice as well, so we know more about the structure of the table this is running against.

    Does the dynamic SQL wildcard every column? (e.g. ColA LIKE '%' + @ColA + '%')

    Have you done any profiling to look for specific patterns? More than likely, there are just a few columns that are used in the majority of the queries. You may be able to gain some performance by creating an index to cover the most frequently used columns. Speeding up some of the queries may help the others by reducing system load.

    Does the table consist of more than the 20+ fields they can search by?

    Is this the only screen that uses this approach or is this problem spread throughout your application?

    +1000 to this. It's the likely problem. Without knowing what some of the formed queries actually look like, taking any other action, including beefing up hardware, is likely to be a futile effort. Even adding indexes to support every query might not solve the problem, because many or most of the queries could be non-SARGable (see the example below).

    Generally speaking, performance is in the code and can make up for a wealth of hardware issues and poor database design. Concentrate on fixing the code and, possibly, tweaking the underlying table design. If folks can't change either of those, they won't be able to change performance much.
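
    As a quick illustration of the SARGability point (table, column, and parameter names made up):

    DECLARE @Name nvarchar(50) = N'Smith';

    -- Non-SARGable: the leading wildcard prevents an index seek,
    -- so the whole table or index must be scanned.
    SELECT OrderId FROM dbo.BigTable WHERE CustomerName LIKE '%' + @Name + '%';

    -- SARGable: with only a trailing wildcard, the optimizer can seek
    -- on an index over CustomerName.
    SELECT OrderId FROM dbo.BigTable WHERE CustomerName LIKE @Name + '%';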

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • A couple more questions:

    You said lots of queries time out, but not all. Have you looked to see if there are specific columns that experience the timeouts? I expect there are some [n]varchar(MAX) columns that cause timeouts when included.

    Does the timeout happen when you run the generated SQL manually in SSMS, or does it only happen when submitted through the application? You may have different timeout settings for the query vs the connection.

    If it happens in SSMS, does the timeout happen when you remove the Order By and/or Group By, leaving just the Where clause?

    Wes
    (A solid design is always preferable to a creative workaround)

    What you've described is a terrible fit for a traditional row-store / B-tree indexed table, but a columnstore index can help mitigate this type of dynamic access pattern. If this is a relatively static table that is only periodically inserted into and updated, then consider creating a columnstore index that covers any columns that can potentially be referenced in the SELECT, WHERE, or GROUP BY clause of a dynamic query; a sketch follows.
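
    A minimal sketch, assuming a hypothetical dbo.BigTable on SQL Server 2016 SP1+ (where a nonclustered columnstore is available even in Standard Edition):

    -- One nonclustered columnstore covering every column the dynamic queries
    -- can reference; batch-mode scans then serve arbitrary WHERE / GROUP BY
    -- combinations without a per-query B-tree index.
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_BigTable
        ON dbo.BigTable (SaleDate, Region, CustomerId, Amount /* ...remaining columns */);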

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
