Recent PostsRecent Posts Popular TopicsPopular Topics
 Home Search Members Calendar Who's On

 Finding Gaps in a Sequential Number Sequence Rate Topic Display Mode Topic Options
Author
 Message
 Posted Monday, March 20, 2006 1:58 PM
 SSC Rookie Group: General Forum Members Last Login: Thursday, February 26, 2015 12:25 PM Points: 32, Visits: 48
 Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/slasham/findinggapsinasequentialnumbersequence.asp
Post #267061
 Posted Monday, April 3, 2006 8:32 AM
 SSC Journeyman Group: General Forum Members Last Login: Monday, April 27, 2015 10:25 AM Points: 91, Visits: 109
 If you just need to identify IF a list of numbers is in sequence, but not where the gaps are, I believe you can do it in a single pass of the table.  (Warning -- to understand this algorithm, some high school math is required!)-- Mathematical way of identifying if a list of numbers is sequential-- Based on formula:-- M + (M+1) + (M+2) + ... + (M+n-1) = n*M + n*(n-1)/2-- Calling the left side S (for Sum), and rearranging yields:-- n2 + (2*M - 1)*n - 2*S = 0-- Then the quadratic formula produces:-- n = ( (1-2*M) +- sqrt( (2*M -1)2 + 8*S) )/2-- I believe it can be shown that a list of numbers is a sequence if-- and only if the value in the sqrt() above is a perfect square.-- The code below is based on that assumption---- Mike Arney, 4/3/2006set nocount oncreate table Numbers (id int not null)insert Numbers values (6)insert Numbers values (7)insert Numbers values (8)insert Numbers values (9)insert Numbers values (10)insert Numbers values (11)insert Numbers values (12)-- insert Numbers values (14) -- uncomment for testingdeclare @Sum bigint, @Count bigint, @Min bigintselect @Sum = sum(convert(bigint, id)), @Count = count(*), @Min = min(id) from Numbersdeclare @sqrt floatselect @sqrt = sqrt( power((2*@Min - 1), 2) + 8*@Sum)if @sqrt = floor(@sqrt) select 'Sequence' else select 'Not Sequence'godrop table Numbersgo
Post #270556
 Posted Monday, April 3, 2006 9:07 AM
 Grasshopper Group: General Forum Members Last Login: Monday, September 14, 2015 5:44 PM Points: 15, Visits: 52
 A self join solution:select a.SeqNumber, max(b.SeqNumber)from #SequenceTable a join #SequenceTable b on a.SeqNumber > b.SeqNumbergroup by a.SeqNumberhaving a.SeqNumber - max(b.SeqNumber) > 1
Post #270570
 Posted Monday, April 3, 2006 11:19 AM
 Forum Newbie Group: General Forum Members Last Login: Wednesday, November 28, 2007 5:38 AM Points: 1, Visits: 7
 This way is a few hundred times faster (on large datasets)...NOTE:the details are missing but it still identifies the key holes in the sequence, except beiging and end which are obvious...?SELECT s1.SeqNumberFROM #SequenceTable s1LEFT JOIN #SequenceTable s2ON s1.SeqNumber = s2.SeqNumber -1WHERE s2.SeqNumber IS NULL
Post #270622
 Posted Monday, April 3, 2006 2:34 PM
 SSCrazy Group: General Forum Members Last Login: Tuesday, November 29, 2016 11:30 AM Points: 2,936, Visits: 1,410
 I agree, the join on "where Seq2.SeqNumber < Seq1.SeqNumber" will be a killer with large tables.This version uses a simple a = b-1 join to find the gaps, then a subquery to find the last value before the gap.  On 1 million rows it ran in 1/4 the elapsed time and 1/13 the CPU time of the original in the article.select LastSeqNumber, NextSeqNumber    , FirstAvailable = LastSeqNumber + 1    , LastAvailable = NextSeqNumber - 1    , NumbersAvailable = NextSeqNumber - (LastSeqNumber + 1) from (    select (SELECT TOP 1 SeqNumber FROM #SequenceTable WHERE SeqNumber < a.SeqNumber ORDER BY SeqNumber DESC) as LastSeqNumber,    a.SeqNumber as NextSeqNumber    from #SequenceTable a    left join #SequenceTable b on a.SeqNumber = b.SeqNumber + 1    where b.SeqNumber IS NULL) aorder by LastSeqNumber
Post #270685
 Posted Wednesday, April 5, 2006 8:52 AM
 Ten Centuries Group: General Forum Members Last Login: Tuesday, March 29, 2016 12:53 AM Points: 1,204, Visits: 365
 Nice solution!I would like to contribute too. If we only want to find IF there is a gap in the sequence we only need to verify that COUNT(*) + MIN(ID) = MAX(ID), right?In that case, if we have an index on the ID columns we only need a quick scan on it with very little additional CPU+Memory usage with:create table Numbers (id int not null)insert Numbers values (6)insert Numbers values (7)insert Numbers values (8)-- insert Numbers values (14) -- uncomment for testing-- DELETE Numbers WHERE id = 14SELECT CASE WHEN COUNT(*) + MIN(ID) - 1 = MAX(ID) THEN 'Sequence' ELSE 'Not Sequence' END FROM NUMBERSgodrop table Numbers
Post #271289
 Posted Thursday, April 27, 2006 3:51 PM
 SSC Rookie Group: General Forum Members Last Login: Thursday, February 26, 2015 12:25 PM Points: 32, Visits: 48
Post #276056
 Posted Friday, April 28, 2006 4:02 AM
 Ten Centuries Group: General Forum Members Last Login: Tuesday, March 29, 2016 12:53 AM Points: 1,204, Visits: 365
 Hi Stephen!Thank you, it is not always you get such an exhausting feedback of how things turn out when you post replies and hope you can help.Hanslindgren
Post #276127
 Posted Tuesday, April 3, 2007 2:59 AM
 Ten Centuries Group: General Forum Members Last Login: Thursday, September 25, 2014 12:38 PM Points: 1,385, Visits: 1,249
 Hi,I'm a latecomer to the party and didn't realize there had already been healthy debate, but here's what I came up with.Looking through the previous submissions above, it's the same concept as Brendan's, but extended to support the full functionality of the original post.For the original example (working on an unoptimized temp table) the time taken seems to be approx 0.25 secs/1000 records, so with 10000 records I'm getting something like 3 seconds run time. With the code below there seems to be no increase in time taken at all (around 0.25 secs for anything from 0 to 10000 records), but this is probably because for small datasets the main delay below is not due to the work itself, but rather the creation of the temp table. I'd be interested to see if anyone can make something faster! (ignoring the fact that I could probably have made it a table variable...)Thanks,Tao CREATE TABLE #StartsAndEnds (EventID Int, EventType NVarChar(20), IdentityTracker INT IDENTITY (1,1))INSERT INTO #StartsAndEnds (EventID, EventType)SELECT SeqNumber, TypeFROM ( SELECT Seq1.SeqNumber, 'Start' AS Type FROM #SequenceTable Seq1 LEFT JOIN #SequenceTable Seq2 ON Seq2.SeqNumber = Seq1.SeqNumber + 1 WHERE Seq2.SeqNumber Is Null  UNION ALL  SELECT Seq1.SeqNumber, 'End' AS Type FROM #SequenceTable Seq1 LEFT JOIN #SequenceTable Seq2 ON Seq2.SeqNumber = Seq1.SeqNumber - 1 WHERE Seq2.SeqNumber Is Null) AS PingPongORDER BY SeqNumberSELECT IsNull(FL1.EventID, 0) AS StartGap,  FL2.EventID As EndGap,  IsNull(FL1.EventID, 0) + 1 AS FirstAvailable,  FL2.EventID - 1 AS LastAvailable,  FL2.EventID - IsNull(FL1.EventID, 0) - 1 As AvailableCountFROM #StartsAndEnds FL2 LEFT JOIN #StartsAndEnds FL1 ON FL2.IdentityTracker = FL1.IdentityTracker + 1WHERE FL2.IdentityTracker % 2 = 1DROP TABLE #StartsAndEnds http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
Post #355549
 Posted Tuesday, April 3, 2007 2:11 PM
 SSC Rookie Group: General Forum Members Last Login: Thursday, February 26, 2015 12:25 PM Points: 32, Visits: 48

Hi Tao, did you post the correct version of your code?

I copied your code above and did a find/replace on #SequenceTable and SeqNumber swapping these for my table name and sequence field respectively, then ran it.  It was certainly quick, very impressive, 16 seconds to produce 142874 identified breaks in the sequence from a table containing 3,827,625 rows.

First few rows of output are as below

 StartGap EndGap FirstAvailable LastAvailable AvailableCount 0 NULL 1 NULL NULL 0 NULL 1 NULL NULL 0 NULL 1 NULL NULL 0 NULL 1 NULL NULL 0 NULL 1 NULL NULL 0 NULL 1 NULL NULL 58224 62107 58225 62106 3882 78025 80086 78026 80085 2060

My original query over the same table takes 44 seconds and produces the following first few rows.

 StartGap EndGap FirstAvailable LastAvailable AvailableCount 0 2 1 1 1 20624 20626 20625 20625 1 52091 52093 52092 52092 1 55518 55520 55519 55519 1 58222 58224 58223 58223 1 62107 78025 62108 78024 15917 80086 80088 80087 80087 1 80160 80162 80161 80161 1

I've checked the results and my query is showing the correct gaps.  Your result line showing 58224 to 62107 represents my end gap result line 5 above to my start gap result line 6 above.

Love to see your revision as certainly looks like it will be very fast.

Stephen

Post #355744

 Permissions