enumerating gaps between islands, ideas?

Question

enumerating gaps between islands, ideas?

caffeinated

Ten Centuries

Points: 1271
More actions
August 31, 2014 at 10:14 pm

#289564

Hi all,
I have a system log with NULL gaps between a sequence of numbers...see "BEFORE" sample below.
The number of gaps between the Sequence_ID's are arbitrary, but generally less then 50 records.
I'd like enumerate the gaps to produce the "AFTER" result, but do it with a single query or view, not through procedures.
I've been playing with windowed functions and groupings with no success. I'm guessing it'll need some recursive CTE logic, but I haven't been able to figure it out the correct loop.
Anyone have any ideas?
Thanks!
BEFORE:
PK_IDSequence_ID
1035586935587
3035586234 NULL
8355585 NULL
1235584 NULL
4675583 35583
4035582 NULL
6035382 NULL
1435581 NULL
2035580 NULL
3435553 35563
6603589 NULL
9475559 35552
AFTER:
PK_IDSequence_ID
1035586935587
3035586234 3
8355585 2
1235584 1
4675583 35583
4035582 4
6035382 3
1435581 2
2035580 1
3435553 35563
6603589 1
9475559 35552

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182881 More actions · Answer 1

Quick thought, no need to use recursion or loops, rather use the LAG window function. Here is a quick sample code with some test data.

😎

USE tempdb;

GO

DECLARE @TEST_DATA TABLE

(

TD_ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED

,TD_NUM INT NOT NULL

);

DECLARE @SAMPLE_SIZE INT = 10;

;WITH T(N) AS (SELECT N FROM ( VALUES (NULL),(NULL),(NULL),(NULL),(NULL)

,(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

,NUMS(N) AS(SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N

FROM T T1, T T2, T T3, T T4, T T5, T T6, T T7)

INSERT INTO @TEST_DATA (TD_NUM)

SELECT

ABS(FLOOR(CHECKSUM(NEWID()) / (RAND() * 10)))

FROM NUMS NM;

SELECT

TD.TD_ID

,TD.TD_NUM

,TD.TD_NUM - LAG(TD.TD_NUM) OVER (ORDER BY TD.TD_ID) AS GAP

FROM @TEST_DATA TD;

Results

TD_ID TD_NUM GAP

----------- ----------- ----------

1 467441025 NULL

2 336642282 -130798743

3 336717627 75345

4 355014646 18297019

5 52905649 -302108997

6 303559379 250653730

7 316539004 12979625

8 181767044 -134771960

9 521113123 339346079

10 392579077 -128534046

caffeinated Ten Centuries Points: 1271 More actions · Answer 2

Thanks, I'll take a look.

Realized my sample data was bad. The PK will be in increasing order, which might help, but it'll have same gaps, like the Sequence_ID.

Here's better sample code:

DECLARE @system_log TABLE(

PK_ID int PRIMARY KEY

,Sequence_ID int null

)

INSERT @system_log(

PK_ID

,Sequence_ID

)VALUES

(1035590, 35587),

(1035589, NULL),

(1035586, NULL),

(1035585, NULL),

(1035584, NULL),

(1035583, 35583),

(1035582, NULL),

(1035581, NULL),

(1035579, NULL),

(1035578, 35553),

(1035554, NULL),

(1035550, 35550)

SELECT

*

FROM @system_log

ORDER by PK_ID desc

Result:

PK_IDSequence_ID

103559035587

1035589NULL

1035586NULL

1035585NULL

1035584NULL

103558335583

1035582NULL

1035581NULL

1035579NULL

103557835553

1035554NULL

103555035550

Expected Result:

PK_IDSequence_ID

103559035587

10355894

10355863

10355852

10355841

103558335583

10355823

10355812

10355791

103557835553

10355541

103555035550

Eirikur Eiriksson SSC Guru Points: 182881 More actions · Answer 3

Here is another code example using your data sample

😎

USE tempdb;

GO

;WITH system_log(PK_ID,Sequence_ID) AS

( SELECT PK_ID,Sequence_ID FROM

(VALUES

(1035590, 35587),

(1035589, NULL),

(1035586, NULL),

(1035585, NULL),

(1035584, NULL),

(1035583, 35583),

(1035582, NULL),

(1035581, NULL),

(1035579, NULL),

(1035578, 35553),

(1035554, NULL),

(1035550, 35550)) AS X(PK_ID,Sequence_ID)

)

SELECT

SL.PK_ID

,LEAD(SL.PK_ID,1,SL.PK_ID) OVER

(

ORDER BY SL.PK_ID

) AS LEAD_PK_ID

,LAG(SL.PK_ID,1,SL.PK_ID) OVER

(

ORDER BY SL.PK_ID

) AS LAG_PK_ID

,ISNULL(SL.Sequence_ID,LEAD(SL.PK_ID,1,SL.PK_ID) OVER

(

ORDER BY SL.PK_ID

) - SL.PK_ID) AS FORWARD_GAP

,ISNULL(SL.Sequence_ID,SL.PK_ID - LAG(SL.PK_ID,1,SL.PK_ID) OVER

(

ORDER BY SL.PK_ID

) ) AS REVERSE_GAP

FROM system_log SL;

Results

PK_ID LEAD_PK_ID LAG_PK_ID FORWARD_GAP REVERSE_GAP

----------- ----------- ----------- ----------- -----------

1035550 1035554 1035550 35550 35550

1035554 1035578 1035550 24 4

1035578 1035579 1035554 35553 35553

1035579 1035581 1035578 2 1

1035581 1035582 1035579 1 2

1035582 1035583 1035581 1 1

1035583 1035584 1035582 35583 35583

1035584 1035585 1035583 1 1

1035585 1035586 1035584 1 1

1035586 1035589 1035585 3 1

1035589 1035590 1035586 1 3

1035590 1035590 1035589 35587 35587

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 4

If you Binoogle Itzik Ben-Gan gaps and islands TSQL all of the top hits are relevant to your need.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

caffeinated Ten Centuries Points: 1271 More actions · Answer 5

Thanks for the tip. I've actually recently purchased "Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions" written by Itzik Ben-Gan.

I think the main problem is the data set doesn't have a predictable Window to partition against, so I can't use Window Functions w/o have a way to traverse the data.

I'm working on a query using recursive CTE and LEAD/LAG. Looking at using a CROSS APPLY or FOR XML subqueries to generate the "Window"

I might break down and write a procedural solution. Still really want this as a single query.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 6

Here is a potential solution. It works with the sample data provided.

DECLARE @system_log TABLE(

PK_ID int PRIMARY KEY

,Sequence_ID int null

)

INSERT @system_log(

PK_ID

,Sequence_ID

)VALUES

(1035590, 35587),

(1035589, NULL),

(1035586, NULL),

(1035585, NULL),

(1035584, NULL),

(1035583, 35583),

(1035582, NULL),

(1035581, NULL),

(1035579, NULL),

(1035578, 35553),

(1035554, NULL),

(1035550, 35550);

WITH BaseData as (

SELECT

PK_ID,

Sequence_ID,

rn3 = row_number() over (order by PK_ID) - row_number() over (partition by case when Sequence_ID is null then 0 else 1 end order by PK_ID)

FROM

@system_log

), InertimData as (

select

bd1.PK_ID,

bd1.Sequence_ID,

cn = row_number() over (partition by case when bd1.Sequence_ID is null then 0 else 1 end, bd1.rn3 order by bd1.PK_ID)

from

BaseData bd1

)

select

PK_ID,

coalesce(Sequence_ID, cn) Sequence_ID

from

InertimData

order by

PK_ID desc;

Dwain Camps SSC Guru Points: 86908 More actions · Answer 7

Not exactly a traditional gaps and islands problem but fun nevertheless. Here's a relatively simple alternative.

DECLARE @system_log TABLE(

PK_ID int PRIMARY KEY

,Sequence_ID int null

)

INSERT @system_log(

PK_ID

,Sequence_ID

)VALUES

(1035590, 35587),

(1035589, NULL),

(1035586, NULL),

(1035585, NULL),

(1035584, NULL),

(1035583, 35583),

(1035582, NULL),

(1035581, NULL),

(1035579, NULL),

(1035578, 35553),

(1035554, NULL),

(1035552, 35551),

(1035550, 35550)

SELECT PK_ID, Sequence_ID=CASE WHEN Sequence_ID IS NOT NULL THEN Sequence_ID ELSE

ROW_NUMBER() OVER (PARTITION BY rn2 ORDER BY PK_ID)-1 END

FROM

(

SELECT PK_ID, Sequence_ID

,rn2=MAX(Sequence_ID) OVER (ORDER BY PK_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

FROM @system_log

) a

ORDER by PK_ID desc;

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

caffeinated Ten Centuries Points: 1271 More actions · Answer 8

Awesome, thanks dwain! Very elegant solution. I should be able to transfer this logic to the actual data quite easily.

caffeinated Ten Centuries Points: 1271 More actions · Answer 9

Thanks Lynn as well. Love having alternatives in the toolbelt.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 10

dwain.c (9/1/2014)
Not exactly a traditional gaps and islands problem but fun nevertheless. Here's a relatively simple alternative.
DECLARE @system_log TABLE(
PK_ID int PRIMARY KEY
,Sequence_ID int null
)
INSERT @system_log(
PK_ID
,Sequence_ID
)VALUES
(1035590, 35587),
(1035589, NULL),
(1035586, NULL),
(1035585, NULL),
(1035584, NULL),
(1035583, 35583),
(1035582, NULL),
(1035581, NULL),
(1035579, NULL),
(1035578, 35553),
(1035554, NULL),
(1035552, 35551),
(1035550, 35550)
SELECT PK_ID, Sequence_ID=CASE WHEN Sequence_ID IS NOT NULL THEN Sequence_ID ELSE
ROW_NUMBER() OVER (PARTITION BY rn2 ORDER BY PK_ID)-1 END
FROM
(
SELECT PK_ID, Sequence_ID
,rn2=MAX(Sequence_ID) OVER (ORDER BY PK_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM @system_log
) a
ORDER by PK_ID desc;

Dwain, I like yours as it appears to be more efficient than mine. I rewrote it slightly to make it cleaner in my opinion.

DECLARE @system_log TABLE(

PK_ID int PRIMARY KEY

,Sequence_ID int null

)

INSERT @system_log(

PK_ID

,Sequence_ID

)VALUES

(1035590, 35587),

(1035589, NULL),

(1035586, NULL),

(1035585, NULL),

(1035584, NULL),

(1035583, 35583),

(1035582, NULL),

(1035581, NULL),

(1035579, NULL),

(1035578, 35553),

(1035554, NULL),

(1035550, 35550);

WITH BaseData as (

SELECT

PK_ID,

Sequence_ID,

rn2 = MAX(Sequence_ID) OVER (ORDER BY PK_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

FROM

@system_log

)

SELECT

PK_ID,

Sequence_ID = CASE WHEN Sequence_ID IS NOT NULL

THEN Sequence_ID

ELSE ROW_NUMBER() OVER (PARTITION BY rn2 ORDER BY PK_ID) - 1

END

FROM

BaseData

ORDER BY

PK_ID DESC;

Dwain Camps SSC Guru Points: 86908 More actions · Answer 11

Lynn Pettis (9/1/2014)
dwain.c (9/1/2014)
Not exactly a traditional gaps and islands problem but fun nevertheless. Here's a relatively simple alternative.
DECLARE @system_log TABLE(
PK_ID int PRIMARY KEY
,Sequence_ID int null
)
INSERT @system_log(
PK_ID
,Sequence_ID
)VALUES
(1035590, 35587),
(1035589, NULL),
(1035586, NULL),
(1035585, NULL),
(1035584, NULL),
(1035583, 35583),
(1035582, NULL),
(1035581, NULL),
(1035579, NULL),
(1035578, 35553),
(1035554, NULL),
(1035552, 35551),
(1035550, 35550)
SELECT PK_ID, Sequence_ID=CASE WHEN Sequence_ID IS NOT NULL THEN Sequence_ID ELSE
ROW_NUMBER() OVER (PARTITION BY rn2 ORDER BY PK_ID)-1 END
FROM
(
SELECT PK_ID, Sequence_ID
,rn2=MAX(Sequence_ID) OVER (ORDER BY PK_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM @system_log
) a
ORDER by PK_ID desc;
Dwain, I like yours as it appears to be more efficient than mine. I rewrote it slightly to make it cleaner in my opinion.
DECLARE @system_log TABLE(
PK_ID int PRIMARY KEY
,Sequence_ID int null
)
INSERT @system_log(
PK_ID
,Sequence_ID
)VALUES
(1035590, 35587),
(1035589, NULL),
(1035586, NULL),
(1035585, NULL),
(1035584, NULL),
(1035583, 35583),
(1035582, NULL),
(1035581, NULL),
(1035579, NULL),
(1035578, 35553),
(1035554, NULL),
(1035550, 35550);
WITH BaseData as (
SELECT
PK_ID,
Sequence_ID,
rn2 = MAX(Sequence_ID) OVER (ORDER BY PK_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM
@system_log
)
SELECT
PK_ID,
Sequence_ID = CASE WHEN Sequence_ID IS NOT NULL
THEN Sequence_ID
ELSE ROW_NUMBER() OVER (PARTITION BY rn2 ORDER BY PK_ID) - 1
END
FROM
BaseData
ORDER BY
PK_ID DESC;

Thanks Lynn. Guess I skipped the last part of Make it Work, Make it Fast, Make it Pretty[/url]

🙂

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 12

Dwain,

Actually, I just wanted to understand the pieces. I really need to learn the extensions to the windowing functions in 2012.