Find Appropriate Event After An Incident

  • I have a table of about 1.8 million violations that I need to associate with a sanction. The sanctions table has about 48000 sanctions on it. Each order in my violation table may have many (and I mean many) violations associated with it and each order may have many sanctions. The violations do not always have an end date hence the use of the start date as a date parameter. The sanction is triggered when the violations hit certain totals and there is no pattern to when these totals are reached. When a sanction is triggered the violations are reset and do not count towards any following sanctions. On the sample tables, violations 10052, 10053 and 18572 are dealt with on sanction 45222, violations 19624, 19755, 19854 and 25052 are dealt with on sanction 45224 and finally the remaining three violations are handled by sanction 45328.

    What I need to find is the next sanction that was raised after each set of violations. I'm using the code below to do this and it does exactly what I need. The problem is it's horribly slow over the full table. Does anybody have any better ideas? I think a Windowing Function is the way to go but I'm not sure how.

    create table #violation

    (

    OrderID int ----Many orders in the full table

    ,StartDate datetime2

    ,EndDate datetime2 null

    ,ViolationID int primary key --- Not consecutive for individual orders

    )

    create table #sanction

    (

    OrderID int ---Many orders in the full table

    ,IdentifiedDate datetime2

    ,SanctionID int primary key --- Not consecutive for individual orders

    )

    insert into #violation

    select 111111,'2014-09-27 19:00:00','2014-09-27 19:11:00',10049 union all

    select 222222,'2014-09-27 20:00:00','2014-09-28 03:16:00',10051 union all

    select 222222,'2014-09-28 03:17:00','2014-09-28 08:00:00',10052 union all

    select 111111,'2014-09-28 23:05:00','2014-09-28 23:25:00',10053 union all

    select 111111,'2014-09-29 20:05:00',null,18572 union all

    select 111111,'2014-10-01 19:25:00','2014-10-01 21:18:00',19624 union all

    select 111111,'2014-10-04 20:45:00','2014-10-04 21:01:00',19755 union all

    select 111111,'2014-10-06 19:00:00','2014-10-06 19:01:00',19854 union all

    select 111111,'2014-10-15 20:09:00',null,25052 union all

    select 111111,'2014-11-05 00:58:00','2014-11-06 06:56:00',26350 union all

    select 111111,'2014-12-14 01:00:00','2014-12-14 05:15:00',27852 union all

    select 111111,'2014-12-16 19:00:00','2014-12-16 23:11:00',28536

    insert into #sanction

    select 222222,'2014-09-29 11:16:00',44566 union all

    select 111111,'2014-09-30 14:00:00',45222 union all

    select 111111,'2014-10-16 09:00:00',45224 union all

    select 100001,'2014-12-17 10:55:00',45328

    select

    s.orderid

    ,v.startdate

    ,v.enddate

    ,v.violationid

    ,s.identifieddate

    ,s.sanctionID

    from #violation v

    join #sanction s on s.orderid = v.orderid

    where

    s.identifieddate = (select min(s2.identifieddate) from #sanction s2

    where s2.orderid = s.orderid

    and s2.identifieddate > v.startdate)

    drop table #violation, #sanction

    This is the edited highlights of the full code but it's the SELECT MIN(date) in the sub-query that's slowing things down.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • Instead of MIN why not try a TOP 1 with an ORDER BY to get the dates in the correct order. You should even be able to have an index set in that order as well.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks Grant, I've tried the TOP 1 approach and replaced the CTE's I was using with indexed temp tables but it's still not returned anything after half an hour.

    Is there any way to lose that sub-query completely I wonder? I've a suspicion this isn't going to be quick no matter what I do.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • Not seeing the execution plan it's hard to make suggestions as to where exactly it's running slow. Are you sure you have indexes in place that support the query you're running.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I've put a clustered index on the Violations table that uses the StartDate and one on the Sanctions table using the dateidentified. I'll be honest though and say what I know about indexes and execution plans can be written on the back of cigarette packet. Will the execution plan for one order be the same as that for the whole set? I've not yet got the query to run to completion so as far as I know I can't produce the execution plan. If a sample set will produce the same plan I'll post one ASAP.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • You can get the estimated execution plan without ever running the query at all. That's the place to start.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Of course I can :doze:

    It makes repeated reference to creating an index on the TimeViolation table but that's read-only. The rest of it is so much hieroglyphics to me so if you could have a look over it please I'd appreciate it.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • First, you're estimated to be moving 2.9 million rows into temp tables and that's coming from four scans against your tables. That's just to load #temp_violation. That's going to be a giant performance hit and time suck, right there. You have a WHERE clause on the query, but it doesn't seem to do any filtering of the data at all. Based on the cardinality of the tables, you're not dealing with 2.9 million rows from each table, but you're multiplying them through this set of JOINs to arrive at that. This is the first thing I'd check. Also, a very common code smell is the DISTINCT operator. That usually indicates there is a problem with data, your structure, or your code putting together the data and the structure (or on really fun days, all three).

    The next insert estimate is that it's moving one row, but it's still going from a couple of scans. There are suggested indexes on both these that are worth exploring. Hard to say how much this is hurting you. The estimate is only for a single row, but I don't trust that.

    You might want to try building and loading the temp tables, then adding the clustered indexes to them. Right now, by having the indexes defined, you're having to maintain the index and the statistics while the data loads. That's also going to be a hefty performance hit.

    You then have a whole series of functions. Looks like scalar functions for the most part. Depending on what these do, it's all processing time in the main query.

    Query 24 is the SELECT statement. The WHERE clause is not leading to any filtering because most of the data access is scans. The worse one is the scan against the FilteredOrders table that is part of a nested loops join where it's going to scan 39k rows for every row that comes out of the rest of the query. That's not good. The temp tables are all being scanned too.

    Pretty much, you need to work on the indexing. That's the worst thing going on at the moment. Your code isn't filtering in a way that the indexes support (or, your indexes don't support your filters).

    First index I'd add, OrderID on the FilteredOrders table. But, that's the same as the suggestion for the missing indexes.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • First thing is to get the best clustering:

    Cluster the violation table on ( OrderID, StartDate )

    Cluster the sanction table on ( OrderID, IdentifiedDate )

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Stupid f"£$ing RDP connection has shut itself down and I've lost everything I did previously!!!!!! I'll now have to pretty much start again but that may not be a bad thing. Thanks for your input gents, at least the starting again will have the improvements in there.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • Here's a possible re-write of the first temp table INSERT:

    insert into #temp_violation

    select

    tv.OrderID

    ,tv.StartDate

    ,tv.EndDate

    ,ca1.HourID

    from EMSTBLSN.dbo.TimeViolation tv

    cross apply (

    select distinct fh.HourID

    from G4SFEP1.dbo.Service fs

    join G4SFEP1.dbo.Hours fhon fh.ServiceID= fs.ServiceID

    where

    fs.LocationID = tv.SiteID and

    exists(

    select 1

    from test.dbo.FilteredOrders fo

    where fo.OrderId = fs.OrderID

    )

    ) as ca1

    where

    tv.ExcludeFromATV= 0

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I've finally got back to this after the Christmas break and a brief sojourn in proper work.

    Thanks for your advice gents. I learned something very useful about execution plans yesterday, in that you can edit the code used to create them from a saved plan. That meant I could resurrect the original code I'd used after losing it in the RDP incident. I've also picked up a fair bit about indexing too, I put the indexing in place and it did speed things up a little. Then I had a flash of inspiration and I finally realised why a particular bit of code was used by others. It struck me that I didn't need to join to the Violations table to populate my Breach table. I needed to find the breaches for the order numbers that were in the violations table but I didn't need to link each breach to each individual violation. I used

    where so.eai_orderno_int in (select orderno from #temp_violation)

    rather than the join and it now runs in fifty or so seconds rather than fifty or so minutes. That construct always seemed a little pointless to me (a complete noob) but now it makes total sense.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • Can you post the execution plan?

    Have you considered using EXISTS

    WHERE EXISTS (SELECT 1 FROM #temp_violation t WHERE t.orderno = so.eai_orderno_int)

    rather than IN

    where so.eai_orderno_int in (select orderno from #temp_violation)

    - performance is unlikely to be significantly different

    - EXISTS allows you to use more than one correlating column between the inner and outer query

    - extrapolating a little, you're less likely to get bitten by the NOT IN / NULL issue[/url]

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (12/31/2014)


    Can you post the execution plan? Done

    Have you considered using EXISTS No I hadn't, see my earlier comment about being a noob 🙂 The fact it ran at all opened up another can of worms which needed work.

    WHERE EXISTS (SELECT 1 FROM #temp_violation t WHERE t.orderno = so.eai_orderno_int)

    rather than IN

    where so.eai_orderno_int in (select orderno from #temp_violation)

    - performance is unlikely to be significantly different Anything that returns a result in less than an hour will be considered a success!!

    - EXISTS allows you to use more than one correlating column between the inner and outer query Not likely to be necessary for this but I'll bear it in mind

    - extrapolating a little, you're less likely to get bitten by the NOT IN / NULL issue[/url] This is for ad-hoc runs on static data so it's unlikely to be a problem in this case but again I'll bear it mind


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • Thanks.

    Query 1 (insert into #temp_violation etc.) has an odd filter:

    and

    (

    cast(fh.startdate as date)<= cast(tv.startdate as date)

    and

    cast(fh.startdate as time)<= cast(tv.startdate as time)

    )

    This will capture datetimes from fh where they are less than or equal to tv.startdate (as date), but for any fh rows which match - which could be different days - then times after cast(tv.startdate as time) will be eliminated. I'm not sure this is what you want. Shouldn't you just be comparing the two datetimes directly?

    [/fh.startdate <= tv.startdate code]

    Can you create any indexes on the tables in query 1?

    Query 2 (insert into #temp_breach) has an expensive key lookup to table IncidentExtensionBase. Can you add the columns which the key lookup collects to index IX_VI_OUTRES, or alternatively, create a new index which contains just the columns you need from the table? Better still, since you're only using a small number of columns from the view

    (eai_orderno_int, vf_subjectorderid and vf_subjectid_lk from Vf_subjectorderExtensionBase)

    do you actually need to reference the view at all?

    - convert the scalar function EM_Exit.dbo.G4STime() to an iTVF.

    Query 3 would be helped with an additional index on #temp_breach: orderid and DateIdentified

    Edit:

    Here's the missing index hint for Query 3. Table TimeViolation isn't referenced in the query.

    /*

    Missing Index Details from Ex_Plan.sqlplan

    The Query Processor estimates that implementing the following index could improve the query cost by 59.6517%.

    */

    /*

    USE [EMSTBLSN]

    GO

    CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]

    ON [dbo].[TimeViolation] ([ExcludeFromATV])

    INCLUDE ([SiteID],[StartDate],[EndDate])

    GO

    */

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply