Conditional Aggregation

  • To build the sequence can we order by wellnum,eval,topdepth or is there more to it?


  • Re. Cursor being slow.

    I stopped the process after 36 minutes. I then ran the query which finds the N records to be aggregated. It returned about 46,000 at the beginning of the 36 minutes it was 55,000 that means that in 36 minutes it had only processed about 9,000 records.

    I think I can speed it up quite a bit by doing one query in the update statement. That code that I scavenged is some very early code I wrote about 4 years ago. I'd do it much differently now.

  • Re. Building the Sequence number.

    You've got it right.

  • ok here is the winner. 12 minutes for 65K rows.

    here are the tables

    CREATE TABLE [wells] (

    [HOLEID] [varchar] (25) NOT NULL ,

    [EvalNum] [int] NOT NULL ,

    [Seq] [int] NULL ,

    [TopDepth] [numeric](9, 2) NOT NULL ,

    [BaseDepth] [numeric](9, 2) NULL ,

    [MBit] [numeric](9, 2) NULL ,

    [PHIE] [numeric](9, 2) NULL ,

    [Swe] [numeric](9, 2) NULL ,

    [Vsh] [numeric](9, 2) NULL ,

    [Length] [numeric](9, 2) NOT NULL ,

    [OreFlag] [varchar] (5) NOT NULL

    ) ON [PRIMARY]

    GO

    CREATE TABLE [wellsagg] (

    [HOLEID] [varchar] (25) NOT NULL ,

    [EvalNum] [int] NOT NULL ,

    [Seq] [int] NULL ,

    [TopDepth] [numeric](9, 2) NOT NULL ,

    [BaseDepth] [numeric](9, 2) NULL ,

    [MBit] [numeric](9, 2) NULL ,

    [PHIE] [numeric](9, 2) NULL ,

    [Swe] [numeric](9, 2) NULL ,

    [Vsh] [numeric](9, 2) NULL ,

    [Length] [numeric](9, 2) NOT NULL ,

    [OreFlag] [varchar] (5) NOT NULL ,

    [pseq] [int] NOT NULL ,

    [nseq] [int] NOT NULL

    ) ON [PRIMARY]

    GO

    CREATE TABLE [WELLSA] (

    [HOLEID] [varchar] (13) NOT NULL ,

    [EvalNum] [int] NOT NULL ,

    [Seq] [int] IDENTITY (1, 1) NOT NULL ,

    [TopDepth] [numeric](6, 2) NOT NULL ,

    [BaseDepth] [numeric](6, 2) NOT NULL ,

    [MBit] [numeric](6, 2) NULL ,

    [PHIE] [numeric](6, 2) NULL ,

    [SwE] [numeric](6, 2) NULL ,

    [Vsh] [numeric](6, 2) NULL ,

    [Length] [numeric](6, 2) NULL ,

    [OreFlag] [varchar] (1) NULL

    ) ON [PRIMARY]

    GO

    CREATE TABLE [WELLSB] (

    [HOLEID] [varchar] (13) NOT NULL ,

    [EvalNum] [int] NOT NULL ,

    [Seq] [int] IDENTITY (1, 1) NOT NULL ,

    [TopDepth] [numeric](6, 2) NOT NULL ,

    [BaseDepth] [numeric](6, 2) NOT NULL ,

    [MBit] [numeric](6, 2) NULL ,

    [PHIE] [numeric](6, 2) NULL ,

    [SwE] [numeric](6, 2) NULL ,

    [Vsh] [numeric](6, 2) NULL ,

    [Length] [numeric](6, 2) NULL ,

    [OreFlag] [varchar] (1) NULL

    ) ON [PRIMARY]

    GO

    here is the code

    truncate table wellsa

    insert into wellsa (holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag )

    select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag from wells

    order by holeid,evalnum,topdepth

    declare @wellsacnt int, @wellsbcnt int, @x int,@starttime datetime,@secs int

    set @x = 1

    set @starttime = getdate()

    print 'Iteration ' + convert(varchar,@x) + ' : ' + convert(varchar,getdate())

    select @wellsacnt = 0

    select @wellsbcnt = 1

    while @wellsacnt <> @wellsbcnt

    begin

    --move aggreagated batch from wellsa to wellsagg

    truncate table wellsagg

    insert into wellsagg (holeid,evalnum,seq,basedepth,topdepth,mbit,phie,swe,vsh,length,oreflag,pseq,nseq)

    select holeid,evalnum,seq,basedepth,topdepth,mbit,phie,swe,vsh,length,oreflag,pseq,nseq from (

    select

    a.holeid,a.evalnum,c.seq

    ,case when case when a.basedepth > b.basedepth then a.basedepth else b.basedepth end > c.basedepth then case when a.basedepth > b.basedepth then a.basedepth else b.basedepth end else c.basedepth end basedepth

    ,case when case when a.topdepth < b.topdepth then a.topdepth else b.topdepth end < c.topdepth then case when a.topdepth < b.topdepth then a.topdepth else b.topdepth end else c.topdepth end topdepth

    ,((a.mbit*a.length) +(b.mbit*b.length) + (c.mbit*c.length)) /(a.length+b.length + c.length) mbit

    ,((a.phie*a.length) +(b.phie*b.length) + (c.phie*c.length)) /(a.length+b.length + c.length) phie

    ,((a.swe*a.length) +(b.swe*b.length) + (c.swe*c.length)) /(a.length+b.length + c.length) swe

    ,((a.vsh*a.length) +(b.vsh*b.length) + (c.vsh*c.length)) /(a.length+b.length + c.length) vsh

    , (a.length+b.length + c.length) length

    ,case when ((a.mbit*a.length) +(b.mbit*b.length) + (c.mbit*c.length)) /(a.length+b.length + c.length) > .06 then 'Y' else 'N' end oreflag,

    a.seq pseq,b.seq nseq

    from wellsa a

    join wellsa b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'

    join wellsa c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'

    join

    (select a.holeid,a.evalnum,min(a.seq) seq

    from wellsa a

    join wellsa b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'

    join wellsa c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'

    where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length

    group by a.holeid,a.evalnum) fs on fs.seq = a.seq

    where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length

    ) n

    order by seq asc

    --move aggs and remaining to wellsb for seqencing

    truncate table wellsb

    insert into wellsb (holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag )

    select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag

    from (

    select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag

    from wellsagg

    union

    select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag

    from wellsa w

    left join (

    select seq from wellsagg

    union select pseq from wellsagg

    union select nseq from wellsagg

    )a on w.seq = a.seq

    where a.seq is null

    )n order by holeid,evalnum,topdepth

    --move aggreagated batch from wellsb to wellsagg

    truncate table wellsagg

    insert into wellsagg (holeid,evalnum,seq,basedepth,topdepth,mbit,phie,swe,vsh,length,oreflag,pseq,nseq)

    select holeid,evalnum,seq,basedepth,topdepth,mbit,phie,swe,vsh,length,oreflag,pseq,nseq from (

    select

    a.holeid,a.evalnum,c.seq

    ,case when case when a.basedepth > b.basedepth then a.basedepth else b.basedepth end > c.basedepth then case when a.basedepth > b.basedepth then a.basedepth else b.basedepth end else c.basedepth end basedepth

    ,case when case when a.topdepth < b.topdepth then a.topdepth else b.topdepth end < c.topdepth then case when a.topdepth < b.topdepth then a.topdepth else b.topdepth end else c.topdepth end topdepth

    ,((a.mbit*a.length) +(b.mbit*b.length) + (c.mbit*c.length)) /(a.length+b.length + c.length) mbit

    ,((a.phie*a.length) +(b.phie*b.length) + (c.phie*c.length)) /(a.length+b.length + c.length) phie

    ,((a.swe*a.length) +(b.swe*b.length) + (c.swe*c.length)) /(a.length+b.length + c.length) swe

    ,((a.vsh*a.length) +(b.vsh*b.length) + (c.vsh*c.length)) /(a.length+b.length + c.length) vsh

    , (a.length+b.length + c.length) length

    ,case when ((a.mbit*a.length) +(b.mbit*b.length) + (c.mbit*c.length)) /(a.length+b.length + c.length) > .06 then 'Y' else 'N' end oreflag,

    a.seq pseq,b.seq nseq

    from wellsb a

    join wellsb b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'

    join wellsb c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'

    join

    (select a.holeid,a.evalnum,min(a.seq) seq

    from wellsb a

    join wellsb b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'

    join wellsb c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'

    where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length

    group by a.holeid,a.evalnum) fs on fs.seq = a.seq

    where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length

    ) n

    order by seq asc

    --move aggs and remaining to wellsa for seqencing

    truncate table wellsa

    insert into wellsa (holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag )

    select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag

    from (

    select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag

    from wellsagg

    union

    select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag

    from wellsb w

    left join (

    select seq from wellsagg

    union select pseq from wellsagg

    union select nseq from wellsagg

    )a on w.seq = a.seq

    where a.seq is null

    )n order by holeid,evalnum,topdepth

    select @wellsacnt = count(*) from wellsa

    print 'wellsa rowcount = ' + convert(varchar,(@wellsacnt))

    select @wellsbcnt = count(*) from wellsb

    print 'wellsb rowcount = ' + convert(varchar,(@wellsbcnt))

    select @x=@x+1

    print 'Iteration ' + convert(varchar,@x) + ' : ' + convert(varchar,getdate())

    end


  • I'll run this against the whole dataset and see how long it takes.

    I revised that cursor-based code that I put up yesterday to use one query in the update statement. For some reason when I did it that way it messed with the cursor and so I was getting a @@fetch_status of -2. Not sure why that was happening so I built the cursor off of a copy of the table (not the one I was deleting records out of) and it seems to work.

    It took 1:54:30 to run the whole thing. Still not very impressive. I don't know how long the old code would have taken but I'm sure it would have been longer.

  • I forgot to mention that I played around with a few indexing schemes and found that no indexes worked the fastest. Likely because they slow the inserts more than they speed up the selects. Not sure what indexes you were using on the tables with your cursor version but that may help as well.


  • I haven't run your code yet but I think I've found my solution - I had an epiphany this morning (I have to check specific wells to make sure that the math is done right so I'm not quite sold yet. It completes in about 45 seconds!

    It works by;

    - marking the rows which need to be aggregated

    - figuring out a grouping number (using a VERY slick method which I learned on this forum!)

    - grouping the records

    Voila!

    As long as the math product of aggregating all contiguous rows at once is the same as aggregating one group of three and then the next group of three then I think that this solution works.

    IF OBJECT_ID(N'TempDb..##TEST',N'U') IS NOT NULL DROP TABLE ##TEST

    SELECT *, MBit * Length AS MBit_Length_Product, PHIE * Length AS PHIE_Length_Product, SwE * Length AS SwE_Length_Product, Vsh * Length AS Vsh_Length_Product, ' ' AS Assemble, 0 AS GroupNum INTO ##TEST FROM ##Aggregated_Intervals

    CREATE UNIQUE CLUSTERED INDEX TEST_PK ON ##TEST(HoleId, EvalNum, Seq)

    UPDATE PREV

    SET Assemble = 'Y'

    FROM ##TEST CUR

    LEFT JOIN ##TEST PREV

    ON PREV.HoleId = CUR.HoleId

    AND PREV.EvalNum = CUR.EvalNum

    AND PREV.Seq = CUR.Seq - 1

    LEFT JOIN ##TEST NEX

    ON NEX.HoleId = CUR.HoleId

    AND NEX.EvalNum = CUR.EvalNum

    AND NEX.Seq = CUR.Seq + 1

    WHERE

    CUR.OreFlag = 'N'

    AND CUR.LENGTH <= 1

    AND CUR.LENGTH <= PREV.LENGTH

    AND CUR.LENGTH <= NEX.LENGTH

    AND (((PREV.MBit * PREV.Length) + (CUR.MBit * CUR.Length) + (NEX.MBit * NEX.Length)) / (PREV.Length + CUR.Length + NEX.Length)) >= 0.06

    UPDATE CUR

    SET Assemble = 'Y'

    FROM ##TEST CUR

    LEFT JOIN ##TEST PREV

    ON PREV.HoleId = CUR.HoleId

    AND PREV.EvalNum = CUR.EvalNum

    AND PREV.Seq = CUR.Seq - 1

    LEFT JOIN ##TEST NEX

    ON NEX.HoleId = CUR.HoleId

    AND NEX.EvalNum = CUR.EvalNum

    AND NEX.Seq = CUR.Seq + 1

    WHERE

    CUR.OreFlag = 'N'

    AND CUR.LENGTH <= 1

    AND CUR.LENGTH <= PREV.LENGTH

    AND CUR.LENGTH <= NEX.LENGTH

    AND (((PREV.MBit * PREV.Length) + (CUR.MBit * CUR.Length) + (NEX.MBit * NEX.Length)) / (PREV.Length + CUR.Length + NEX.Length)) >= 0.06

    UPDATE NEX

    SET Assemble = 'Y'

    FROM ##TEST CUR

    LEFT JOIN ##TEST PREV

    ON PREV.HoleId = CUR.HoleId

    AND PREV.EvalNum = CUR.EvalNum

    AND PREV.Seq = CUR.Seq - 1

    LEFT JOIN ##TEST NEX

    ON NEX.HoleId = CUR.HoleId

    AND NEX.EvalNum = CUR.EvalNum

    AND NEX.Seq = CUR.Seq + 1

    WHERE

    CUR.OreFlag = 'N'

    AND CUR.LENGTH <= 1

    AND CUR.LENGTH <= PREV.LENGTH

    AND CUR.LENGTH <= NEX.LENGTH

    AND (((PREV.MBit * PREV.Length) + (CUR.MBit * CUR.Length) + (NEX.MBit * NEX.Length)) / (PREV.Length + CUR.Length + NEX.Length)) >= 0.06

    DECLARE

    @PrevHoleId AS VARCHAR(13)

    ,@PrevEvalNum AS INT

    ,@PrevAssemble AS VARCHAR(1)

    ,@GroupNum AS INT

    SET @GroupNum = 0

    SET @PrevHoleId = ' '

    SET @PrevEvalNum = 0

    SET @PrevAssemble = 'N'

    -- Beginning of cool code from the forum

    UPDATE ##TEST

    SET

    @GroupNum = GroupNum = CASE

    WHEN @PrevHoleId <> HoleId OR @PrevEvalNum <> EvalNum THEN 1

    WHEN Assemble = ' ' THEN @GroupNum + 1

    WHEN @PrevHoleId = HoleId AND @PrevEvalNum = EvalNum AND @PrevAssemble = Assemble THEN @GroupNum

    ELSE @GroupNum + 1

    END

    ,@PrevHoleId = HoleId

    ,@PrevEvalNum = EvalNum

    ,@PrevAssemble = Assemble

    FROM ##TEST WITH (tablock, INDEX(TEST_PK))

    -- End of the cool code

    SELECT

    HoleId

    ,EvalNum

    ,MIN(TopDepth) AS TopDepth

    ,MAX(BaseDepth) AS BaseDepth

    ,SUM(MBit_Length_Product) / SUM(Length) AS MBit

    ,SUM(PHIE_Length_Product) / SUM(Length) AS PHIE

    ,SUM(SwE_Length_Product) / SUM(Length) AS SwE

    ,SUM(Vsh_Length_Product) / SUM(Length) AS Vsh

    ,MAX(OreFlag) AS OreFlag

    ,SUM(Length) AS Length

    FROM ##TEST

    GROUP BY

    HoleId

    ,EvalNum

    ,GroupNum

    ORDER BY

    HoleId

    ,EvalNum

    ,TopDepth

  • Your new method comes up with less rows in the result than mine on t my test data. I am running a compare now to see where the differences lie. Also I noticed another error in my code where I left Vsh out of a few selects and it gets lost. I compared some of my result rows to yours and the math seemed to be the same.

    Out of 65550 rows in the test data my method results in 50079 and yours comes up with 48253. We match on 46927 rows.

    here is one of the differences

    wellnum1001 evalnum 3

    mine

    WellNum1001376369.45369.75.07.26.40NULL.30Y

    WellNum1001377369.75370.35.05.19.48NULL.60N

    WellNum1001378370.35372.15.13.35.23NULL1.80Y

    WellNum1001379372.15373.05.01.08.74NULL.90N

    WellNum1001380373.05373.95.07.26.39NULL.90Y

    WellNum1001381373.95375.15.02.13.68NULL1.20N

    yours

    WellNum10013369.45369.75.070000.260000.400000.190000Y.30

    WellNum10013369.75370.35.050000.190000.480000.350000N.60

    WellNum10013370.35373.95.085000.260000.397500.227500Y3.60

    WellNum10013373.95375.15.020000.130000.680000.490000N1.20


  • I think that the difference is attributable to the where clause where we are comparing the length value of the N row to the length of the bounding Y rows.

    I'm using <= and you're using <

  • ok, I am changing it and running it again. All of the other calculations look like they match and it is definitely much faster 1m45s to 12m44s on my server with the test data. Also a lot less overhead having 1 temp table instead of 3.


  • I ran mine with >= and and < and ended up with 17 passes resulting in 45351 rows again I see a difference in wellnum1001 evalnum3 which resulted in 69 rows vs 71 rows from the new method.

    mine

    WellNum100134353.8554.45.02.13.61.50.60N

    WellNum100134454.4555.95.06.22.43.271.50N

    WellNum100134555.9556.85.04.16.51.42.90N

    WellNum100134656.8557.15.06.24.42.23.30Y

    WellNum100134757.1558.65.04.17.50.391.50N

    yours

    WellNum1001353.8554.45.020000.130000.610000.500000N.60

    WellNum1001354.4555.35.060000.220000.430000.270000Y.90

    WellNum1001355.3555.65.050000.210000.440000.300000N.30

    WellNum1001355.6555.95.060000.240000.420000.240000Y.30

    WellNum1001355.9556.85.040000.160000.510000.420000N.90

    WellNum1001356.8557.15.060000.240000.420000.230000Y.30


  • I am guessing that the differnce may be rounding (my columns are decimal(9,2)) or averaging multiple rows at once vs averaging 3 rows and then using the resulting average to average another 3 rows.


Viewing 12 posts - 31 through 41 (of 41 total)

You must be logged in to reply to this topic. Login to reply