performance of the query and index

  • hi friends,

I need some advice on where to start with performance tuning. I'm sure it is a huge topic and there are many, many things to look into. I started with a simple query on a huge table. The first thing I need to decide on is the index. We already have a clustered index, but that field (service_id) is varchar(65) and the key is built from a combination of different fields. It is unique, so it is eligible for clustering, but I am not convinced, as it takes a lot of space.

The table has 143 million records. It doesn't have an auto-generated ID.

First of all, is this the right index?

One idea I am considering is to create a separate lookup table with an autoid (int) and service_id (varchar(62)), and then use the autoid in the main table.
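That lookup-table idea could be sketched roughly like this (table and constraint names here are just illustrative, not from the actual schema):

```sql
-- Hypothetical sketch: replace the wide varchar key with an int surrogate.
CREATE TABLE dbo.Service_Lookup (
    service_autoid  int IDENTITY(1,1) NOT NULL,
    service_id      varchar(62)       NOT NULL,
    CONSTRAINT PK_Service_Lookup PRIMARY KEY CLUSTERED (service_autoid),
    CONSTRAINT UQ_Service_Lookup_service_id UNIQUE (service_id)
);

-- Populate once from the existing data.
INSERT INTO dbo.Service_Lookup (service_id)
SELECT DISTINCT service_id
FROM dbo.Health_Table;

-- The main table would then store the 4-byte service_autoid instead of
-- the wide varchar, joining back to the lookup table only when the
-- original service_id string is actually needed.
```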

Also, this table is joined many times with other tables, but this index does not seem to be used.

    It takes 6 minutes to just retrieve the data.

    Thanks.

The query below, on this table, takes 6 minutes just to compute the row number.

select con_fyear, v_finyear, prov_spell_id, patient_id, admidate, disdate,
    row_number() over (partition by patient_id order by admidate, disdate, pat_epiorder21) as admiorder
from dbo.Health_Table

  • 6 minutes to retrieve, sort and transmit 143 million rows doesn't sound too bad. That's a huge amount of data. What are you doing with 143 million rows?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
That's a bit of a relief then. I wasn't sure how long retrieval should take, as I hadn't worked on a huge dataset before, so I was just making sure.

Once the row number is done, it does a self join on the same table to find the difference between two dates (discharge date and admission date) for the same patient, which takes 3 minutes.

select t1.v_finyear as finyear_1, t1.prov_spell_id as prov_spell_id_1,
    t2.v_finyear as finyear_2, t2.prov_spell_id as prov_spell_id_2,
    DATEDIFF(day, t1.disdate, t2.admidate) as timediff
from dbo.Table1 t1
inner join dbo.Table1 t2
    on t1.patient_id = t2.patient_id
    and t1.admiorder = t2.admiorder - 1
    and t2.admidate >= t1.disdate
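If the server is on SQL Server 2012 or later (an assumption; LEAD/LAG do not exist in 2008), the "row_number then self join" pattern can often be collapsed into a single window-function pass, sketched roughly as:

```sql
-- Sketch only: LEAD reads the next row within the same patient
-- partition, so the t1/t2 self join is no longer needed.
select  patient_id,
        v_finyear     as finyear_1,
        prov_spell_id as prov_spell_id_1,
        lead(v_finyear)     over (partition by patient_id
                                  order by admidate, disdate, pat_epiorder21) as finyear_2,
        lead(prov_spell_id) over (partition by patient_id
                                  order by admidate, disdate, pat_epiorder21) as prov_spell_id_2,
        datediff(day, disdate,
                 lead(admidate) over (partition by patient_id
                                      order by admidate, disdate, pat_epiorder21)) as timediff
from dbo.Table1;
-- The t2.admidate >= t1.disdate condition would become a WHERE filter
-- in an outer query wrapped around this one.
```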

The main step comes after this, and it takes half an hour, but it is a big query. I can post it here if you like and we can go through it.

    Thanks.

Also, I've noticed that the clustered index on this table has two columns:

service_id (varchar(62)) and year (201011, 201112, etc.). That year is also varchar(6) and is combined with service_id.

My impression is that we could have the clustered index on service_id alone, as it is unique for each row, and a non-clustered index on year, as it is used in many joins.
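An alternative worth considering, in line with the usual advice to keep clustered keys narrow, might look like this (table and constraint names are illustrative only):

```sql
-- Hypothetical sketch: a narrow, ever-increasing clustered key keeps
-- every non-clustered index small; the wide natural key is still
-- enforced by a separate unique non-clustered index.
CREATE TABLE dbo.Health_Table_New (
    row_id      bigint IDENTITY(1,1) NOT NULL,
    service_id  varchar(62)          NOT NULL,
    [year]      varchar(6)           NOT NULL,
    -- ...remaining columns from the existing table...
    CONSTRAINT PK_Health_Table_New
        PRIMARY KEY CLUSTERED (row_id),
    CONSTRAINT UQ_Health_Table_New_service_year
        UNIQUE NONCLUSTERED (service_id, [year])
);
```

Every non-clustered index carries the clustered key in each row, so an 8-byte bigint key is much cheaper than a 68-byte varchar combination.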

How about you give us the entire query? Can't work on dribs and drabs. Also the execution plan, table and index definitions. http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Gail Shaw
  • dva2007 (10/21/2011)


    My impression is we can have clustered index on service_id as it is unique for each row

Clustered indexes don't have to be unique. It's generally recommended that unique columns are chosen for the cluster, unless something else makes more sense.

Gail Shaw
  • Hi Gail,

Thanks for looking into this. I can provide all the information, but it will take some time as the query runs for 40 minutes and it is Friday evening, so I will do it tomorrow or Monday. I apologise for sometimes asking basic questions, but I think once I deal with one or two queries it will give me a basic understanding of where and how to investigate.

    Many thanks for your help.

    Vijay

  • Hi Gail,

    I've tried to include all the information in the attachment.

    SQL Query, Table definition, index detail, constraint, number of records, timing of the query.

    It takes around 45-47 minutes at the moment. I've summarised some of the timings in the query.

    Please let me know if you need other information.

    Thanks,

    Vijay

  • I just took a quick look, but to be blunt that procedure probably needs a rewrite.

Permanent tables instead of temp tables? That's gonna cause all sorts of fun if this is ever run twice at the same time (not to mention the logging overhead).

    It reads very procedural. Insert data into temp table 1. Do something and insert into table 2. Do something and insert into table 3. Most of that can probably be done in one statement without needing temp storage in the middle.

    Do you really need all the rows? Are you actually processing all of them or can they be filtered right from the beginning?

    This isn't a quick fix, the fundamentals are inefficient, can't just patch on top.

Gail Shaw
  • 1. Tables are completely not normalized.

2. It's better not to have a clustered index at all than that big and ugly one. It will add 68 bytes to each and every non-clustered index row, plus an additional 8 bytes if your clustered index is not unique. And if the values are random rather than incrementing, it will cause a lot of fragmentation in the index.

3. Creating this index will help a little bit with your query:

CREATE NONCLUSTERED INDEX IDX_PS_HED_ALLYEAR_EPISODES_Frozen_provsp_seq
ON dbo.PS_HED_ALLYEAR_EPISODES_Frozen (provsp_seq)
INCLUDE (prov_spell_id, SecondaryDiag02, mainspef, tretspef)

4. But the first thing you need to do is stop moving 40 GB of data here and there. No index or any other trick will help if you are reading and writing whole tables. Find a proper filter for your queries to limit the number of records. I don't believe that 140M records change every day and have to be fully recalculated each time.
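As one possible illustration of that last point (the column name and year value are assumptions, not from the actual procedure), a filter on the period that actually changed would shrink the working set before any later step runs:

```sql
-- Hypothetical sketch: process only the latest period instead of the
-- full 143M-row table; earlier periods keep their previous results.
SELECT  prov_spell_id,
        provsp_seq,
        mainspef,
        tretspef
FROM dbo.PS_HED_ALLYEAR_EPISODES_Frozen
WHERE [year] = '201112';   -- assumed filter column and value
```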


    Alex Suprun

  • I just took a quick look, but to be blunt that procedure probably needs a rewrite.

Permanent tables instead of temp tables? That's gonna cause all sorts of fun if this is ever run twice at the same time (not to mention the logging overhead).

    It reads very procedural. Insert data into temp table 1. Do something and insert into table 2. Do something and insert into table 3. Most of that can probably be done in one statement without needing temp storage in the middle.

    Do you really need all the rows? Are you actually processing all of them or can they be filtered right from the beginning?

    This isn't a quick fix, the fundamentals are inefficient, can't just patch on top.

    ---------------------

    Hi Gail,

Yeah, the first thing: I never write a SELECT * INTO statement unless it is a few hundred records and a one-off transfer of the table. Even with a temporary table, it is best to write CREATE TABLE and then drop it at the end. That way it won't lock tempdb, as someone else may need to use it for a one-off while I would be locking it for a few minutes to process millions of records.

I also think we may not need to process all of these rows, but I just need to confirm with the team. I think we may be able to reduce the number of records.

I will look into combining the scripts to reduce the transfer of data from one table to another.

    Many thanks for your help. Please let me know if you have any further suggestions.

    Thanks.

  • dva2007 (10/22/2011)


Even with a temporary table, it is best to write CREATE TABLE and then drop it at the end. That way it won't lock tempdb, as someone else may need to use it for a one-off while I would be locking it for a few minutes to process millions of records.

If you're talking about the myth that SELECT ... INTO locks the whole of TempDB, there used to be a problem with locked system tables (not the entire DB), but that was fixed in either SQL 7 or 2000.

Create Table #... or SELECT INTO #..., I really don't care which, but you are using a permanent table (a table in a user database) here, not a temp table. That has a higher logging overhead, and if two people run this at the same time you have a concurrency nightmare (they will both use the same table, one will get errors, and both will get incorrect results).
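The difference can be sketched in a few lines (names here are illustrative):

```sql
-- Hypothetical sketch: a #-prefixed temp table is scoped to the
-- session, so two concurrent runs each get their own private copy and
-- cannot collide the way a shared permanent staging table can.
SELECT patient_id, admidate, disdate
INTO #staging                 -- created in tempdb, visible only to this session
FROM dbo.Health_Table;

-- ...intermediate processing against #staging...

DROP TABLE #staging;          -- optional; dropped automatically at session end
```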

Gail Shaw
  • GilaMonster (10/22/2011)


If you're talking about the myth that SELECT ... INTO locks the whole of TempDB, there used to be a problem with locked system tables (not the entire DB), but that was fixed in either SQL 7 or 2000.

    Well done! I can't believe how many people still believe that SELECT/INTO paralyzes TempDB.

    Just to add some more information about this subject... The fix was incorporated into SQL Server 7 and a Trace Flag "fix" was made available in SQL Server 6.5 SP1. Please see the following for MS documentation on that subject...

    http://support.microsoft.com/kb/153441/EN-US/

    Notice at the bottom of that document where it says "NOTE: This problem does not apply to SQL Server 7.0 and later."

    Now, I'll also say that the myth is still partially true but only under some particular circumstances.

    1. If someone is doing a large SELECT/INTO, then trying to bring up the "Properties" page on TempDB from SSMS will wait for the SELECT/INTO to complete. This is because of some of the more minor locks (schema locks, IIRC) that SELECT/INTO imparts on (IIRC) sys.objects in TempDB.

    2. This one is a whole lot more nasty... I don't know if it's still true in 2008 and above but, in 2005 and below, if you do a SELECT/INTO using a linked server as the source of information, it will exclusively lock the source until the SELECT/INTO completes. It's not a documented behavior that I can find anywhere so you'll have to try it to prove it to yourself. I found out the hard way when my Systems DBA came to my desk and asked me what the hell I was doing. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Thanks, Jeff and Gail.

I had a quick look at this article a couple of years ago, which gave me the wrong impression about SELECT INTO.

    http://searchsqlserver.techtarget.com/tip/Tricks-to-increase-SQL-Server-query-performance

    Many thanks for the clarification.
