Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Finding Gaps in a Sequential Number Sequence Expand / Collapse
Author
Message
Posted Monday, March 20, 2006 1:58 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, August 1, 2011 2:05 PM
Points: 32, Visits: 47
Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/slasham/findinggapsinasequentialnumbersequence.asp


Post #267061
Posted Monday, April 3, 2006 3:15 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 7:09 AM
Points: 1,945, Visits: 3,009
Might want to look at Chapter 24 of SQL FOR SMARTIES,and ection 24.5.1. in particular which deals with finding the lowest available number in a sequence. Because optimizers tend to keep min and max data in their stats and indexes make them easy to find, I liked this version:

SELECT CASE WHEN MAX(seq_nbr) = COUNT(*)
THEN CAST(NULL AS INTEGER)
-- THEN MAX(seq_nbr) + 1 as other option
WHEN MIN(seq_nbr) > 1
THEN 1
WHEN MAX(seq_nbr) <> COUNT(*)
THEN (SELECT MIN(seq_nbr)+1
FROM List
WHERE (seq_nbr + 1)
NOT IN (SELECT seq_nbr FROM List))
ELSE NULL END
FROM List;

The first WHEN clause sees if the table is already full and returns a NULL; the NULL has to be cast as an INTEGER to become an expression that can then be used in the THEN clause. However, you might want to increment the list by the next value.

The second WHEN clause looks to see if the minimum sequence number is 1 or not. If so, it uses 1 as the next value

The third WHEN clause handles the situation when there is a gap in the middle of the sequence. It picks the lowest missing number. The ELSE clause is in case of errors and should not be executed. I am trusitng the optimizer to handle the IN() predicate.

The order of execution in the CASE expression is important. It is a way of forcing an inspection from front to back of the table's values.



Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #270441
Posted Monday, April 3, 2006 3:35 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 7:09 AM
Points: 1,945, Visits: 3,009
Put in a zero as a sentinel value and the code is much easier:

CREATE TABLE Numbers (seq INTEGER NOT NULL PRIMARY KEY);

INSERT INTO Numbers VALUES (0); --sentinel
INSERT INTO Numbers VALUES (2);
INSERT INTO Numbers VALUES (3);
INSERT INTO Numbers VALUES (5);
INSERT INTO Numbers VALUES (7);
INSERT INTO Numbers VALUES (8);
INSERT INTO Numbers VALUES (14);
INSERT INTO Numbers VALUES (20);

SELECT Num1.seq+1 AS gap_start, Num2.seq-1 AS gap_end
FROM Numbers AS Num1, Numbers AS Num2
WHERE Num1.seq +1 < Num2.seq
AND (SELECT SUM(seq)
FROM Numbers AS Num3
WHERE Num3.seq BETWEEN Num1.seq AND Num2.seq)
= (Num1.seq + Num2.seq);


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #270444
Posted Monday, April 3, 2006 8:32 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Thursday, May 15, 2014 6:04 AM
Points: 91, Visits: 108

If you just need to identify IF a list of numbers is in sequence, but not where the gaps are, I believe you can do it in a single pass of the table.  (Warning -- to understand this algorithm, some high school math is required!)

-- Mathematical way of identifying if a list of numbers is sequential

-- Based on formula:

-- M + (M+1) + (M+2) + ... + (M+n-1) = n*M + n*(n-1)/2

-- Calling the left side S (for Sum), and rearranging yields:

-- n2 + (2*M - 1)*n - 2*S = 0

-- Then the quadratic formula produces:

-- n = ( (1-2*M) +- sqrt( (2*M -1)2 + 8*S) )/2

-- I believe it can be shown that a list of numbers is a sequence if

-- and only if the value in the sqrt() above is a perfect square.

-- The code below is based on that assumption

--

-- Mike Arney, 4/3/2006

set nocount on

create table Numbers (id int not null)

insert Numbers values (6)

insert Numbers values (7)

insert Numbers values (8)

insert Numbers values (9)

insert Numbers values (10)

insert Numbers values (11)

insert Numbers values (12)

-- insert Numbers values (14) -- uncomment for testing

declare @Sum bigint, @Count bigint, @Min bigint

select @Sum = sum(convert(bigint, id)), @Count = count(*), @Min = min(id) from Numbers

declare @sqrt float

select @sqrt = sqrt( power((2*@Min - 1), 2) + 8*@Sum)

if @sqrt = floor(@sqrt) select 'Sequence' else select 'Not Sequence'

go

drop table Numbers

go

 

Post #270556
Posted Monday, April 3, 2006 9:07 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, February 7, 2014 12:07 PM
Points: 15, Visits: 50

A self join solution:

select a.SeqNumber, max(b.SeqNumber)
from #SequenceTable a join #SequenceTable b on a.SeqNumber > b.SeqNumber
group by a.SeqNumber
having a.SeqNumber - max(b.SeqNumber) > 1

Post #270570
Posted Monday, April 3, 2006 11:19 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, November 28, 2007 5:38 AM
Points: 1, Visits: 7

This way is a few hundred times faster (on large datasets)...NOTE:the details are missing but it still identifies the key holes in the sequence, except beiging and end which are obvious...?

SELECT s1.SeqNumber
FROM #SequenceTable s1
LEFT JOIN #SequenceTable s2
ON s1.SeqNumber = s2.SeqNumber -1
WHERE s2.SeqNumber IS NULL

 

 

 

Post #270622
Posted Monday, April 3, 2006 2:34 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, September 11, 2014 8:07 AM
Points: 2,844, Visits: 1,154

I agree, the join on "where Seq2.SeqNumber < Seq1.SeqNumber" will be a killer with large tables.

This version uses a simple a = b-1 join to find the gaps, then a subquery to find the last value before the gap.  On 1 million rows it ran in 1/4 the elapsed time and 1/13 the CPU time of the original in the article.

select LastSeqNumber, NextSeqNumber
    , FirstAvailable = LastSeqNumber + 1
    , LastAvailable = NextSeqNumber - 1
    , NumbersAvailable = NextSeqNumber - (LastSeqNumber + 1)
from (
    select (SELECT TOP 1 SeqNumber FROM #SequenceTable WHERE SeqNumber < a.SeqNumber ORDER BY SeqNumber DESC) as LastSeqNumber,
    a.SeqNumber as NextSeqNumber
    from #SequenceTable a
    left join #SequenceTable b on a.SeqNumber = b.SeqNumber + 1
    where b.SeqNumber IS NULL
)
a
order by LastSeqNumber

 




Post #270685
Posted Wednesday, April 5, 2006 8:52 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, September 5, 2013 7:04 AM
Points: 1,197, Visits: 294
Nice solution!
I would like to contribute too. If we only want to find IF there is a gap in the sequence we only need to verify that COUNT(*) + MIN(ID) = MAX(ID), right?
In that case, if we have an index on the ID columns we only need a quick scan on it with very little additional CPU+Memory usage with:

create table Numbers (id int not null)
insert Numbers values (6)
insert Numbers values (7)
insert Numbers values (8)

-- insert Numbers values (14) -- uncomment for testing
-- DELETE Numbers WHERE id = 14
SELECT CASE WHEN COUNT(*) + MIN(ID) - 1 = MAX(ID) THEN 'Sequence' ELSE 'Not Sequence' END FROM NUMBERS


go

drop table Numbers



Post #271289
Posted Thursday, April 27, 2006 3:51 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, August 1, 2011 2:05 PM
Points: 32, Visits: 47

Thanks all for the great examples of other ways to get the same results.  It is interesting to see these different methods and the understandings of how SQL works under the hood so to speak, with different timings on the various methods.  For me timing was not a big issue, and my 31 second run over a table of 3.5 million rows seemed fair.  Timing becomes more significant if a process is being repetatively and frequently run. 

 

Over my dataset, Scott Coleman’s method ran in 25 seconds, being significantly slower than his own reported difference, indicating other factors come into play.

 

Brendan Smith’s method ran in 10 seconds, and although I accept this is faster, it does not do the computes for the additional rows in my output set, so is not a reliable comparison, but looks promising.

 

Mike Gress’s example did not complete, I bombed it off at over 14 minutes with no results (sorry Mike).

 

I didn’t try Hanslindgren’s method as it is not my database, so I was unable to add indexes to it. 

 

To Mike Arney, my high school maths was never that good, but I am sure there will be some that can work this one out from your example.

 

Joe Celko’s second method also did not complete, and I bombed this off at over 14 minutes with no results, and Joe’s first method took less than a second and produced no results (sorry Joe).

 

Looking deeper at my source dataset, the timing differences I get to your own results may be due to the extreme number of gaps.  My data set produced 1.1 million rows (of identified gaps).  Also within my data set are sequence number duplications, which may have contributed to the failure of some of the scripts.

 

Thanks you all for your feedback and suggestions.

 

Stephen




Post #276056
Posted Friday, April 28, 2006 4:02 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, September 5, 2013 7:04 AM
Points: 1,197, Visits: 294

Hi Stephen!

Thank you, it is not always you get such an exhausting feedback of how things turn out when you post replies and hope you can help.

Hanslindgren




Post #276127
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse