

SSC Rookie
Group: General Forum Members
Last Login: Thursday, February 26, 2015 12:25 PM
Points: 32,
Visits: 48





SSC Journeyman
Group: General Forum Members
Last Login: Monday, April 27, 2015 10:25 AM
Points: 91,
Visits: 109


If you just need to identify IF a list of numbers is in sequence, but not where the gaps are, I believe you can do it in a single pass of the table. (Warning  to understand this algorithm, some high school math is required!)  Mathematical way of identifying if a list of numbers is sequential  Based on formula:  M + (M+1) + (M+2) + ... + (M+n1) = n*M + n*(n1)/2  Calling the left side S (for Sum), and rearranging yields:  n^{2} + (2*M  1)*n  2*S = 0  Then the quadratic formula produces:  n = ( (12*M) + sqrt( (2*M 1)^{2} + 8*S) )/2  I believe it can be shown that a list of numbers is a sequence if  and only if the value in the sqrt() above is a perfect square.  The code below is based on that assumption   Mike Arney, 4/3/2006 set nocount on create table Numbers (id int not null) insert Numbers values (6) insert Numbers values (7) insert Numbers values (8) insert Numbers values (9) insert Numbers values (10) insert Numbers values (11) insert Numbers values (12)  insert Numbers values (14)  uncomment for testing declare @Sum bigint, @Count bigint, @Min bigint select @Sum = sum(convert(bigint, id)), @Count = count(*), @Min = min(id) from Numbers declare @sqrt float select @sqrt = sqrt( power((2*@Min  1), 2) + 8*@Sum) if @sqrt = floor(@sqrt) select 'Sequence' else select 'Not Sequence' go drop table Numbers go




Grasshopper
Group: General Forum Members
Last Login: Monday, September 14, 2015 5:44 PM
Points: 15,
Visits: 52


A self join solution: select a.SeqNumber, max(b.SeqNumber) from #SequenceTable a join #SequenceTable b on a.SeqNumber > b.SeqNumber group by a.SeqNumber having a.SeqNumber  max(b.SeqNumber) > 1




Forum Newbie
Group: General Forum Members
Last Login: Wednesday, November 28, 2007 5:38 AM
Points: 1,
Visits: 7


This way is a few hundred times faster (on large datasets)...NOTE:the details are missing but it still identifies the key holes in the sequence, except beiging and end which are obvious...? SELECT s1.SeqNumber FROM #SequenceTable s1 LEFT JOIN #SequenceTable s2 ON s1.SeqNumber = s2.SeqNumber 1 WHERE s2.SeqNumber IS NULL




SSCrazy
Group: General Forum Members
Last Login: Today @ 12:18 PM
Points: 2,910,
Visits: 1,337


I agree, the join on "where Seq2.SeqNumber < Seq1.SeqNumber" will be a killer with large tables. This version uses a simple a = b1 join to find the gaps, then a subquery to find the last value before the gap. On 1 million rows it ran in 1/4 the elapsed time and 1/13 the CPU time of the original in the article. select LastSeqNumber, NextSeqNumber , FirstAvailable = LastSeqNumber + 1 , LastAvailable = NextSeqNumber  1 , NumbersAvailable = NextSeqNumber  (LastSeqNumber + 1) from ( select (SELECT TOP 1 SeqNumber FROM #SequenceTable WHERE SeqNumber < a.SeqNumber ORDER BY SeqNumber DESC) as LastSeqNumber, a.SeqNumber as NextSeqNumber from #SequenceTable a left join #SequenceTable b on a.SeqNumber = b.SeqNumber + 1 where b.SeqNumber IS NULL ) a order by LastSeqNumber




Ten Centuries
Group: General Forum Members
Last Login: Yesterday @ 6:55 AM
Points: 1,203,
Visits: 355


Nice solution! I would like to contribute too. If we only want to find IF there is a gap in the sequence we only need to verify that COUNT(*) + MIN(ID) = MAX(ID), right? In that case, if we have an index on the ID columns we only need a quick scan on it with very little additional CPU+Memory usage with:
create table Numbers (id int not null) insert Numbers values (6) insert Numbers values (7) insert Numbers values (8)
 insert Numbers values (14)  uncomment for testing  DELETE Numbers WHERE id = 14 SELECT CASE WHEN COUNT(*) + MIN(ID)  1 = MAX(ID) THEN 'Sequence' ELSE 'Not Sequence' END FROM NUMBERS
go
drop table Numbers




SSC Rookie
Group: General Forum Members
Last Login: Thursday, February 26, 2015 12:25 PM
Points: 32,
Visits: 48


Thanks all for the great examples of other ways to get the same results. It is interesting to see these different methods and the understandings of how SQL works under the hood so to speak, with different timings on the various methods. For me timing was not a big issue, and my 31 second run over a table of 3.5 million rows seemed fair. Timing becomes more significant if a process is being repetatively and frequently run. Over my dataset, Scott Coleman’s method ran in 25 seconds, being significantly slower than his own reported difference, indicating other factors come into play. Brendan Smith’s method ran in 10 seconds, and although I accept this is faster, it does not do the computes for the additional rows in my output set, so is not a reliable comparison, but looks promising. Mike Gress’s example did not complete, I bombed it off at over 14 minutes with no results (sorry Mike). I didn’t try Hanslindgren’s method as it is not my database, so I was unable to add indexes to it. To Mike Arney, my high school maths was never that good, but I am sure there will be some that can work this one out from your example. Joe Celko’s second method also did not complete, and I bombed this off at over 14 minutes with no results, and Joe’s first method took less than a second and produced no results (sorry Joe). Looking deeper at my source dataset, the timing differences I get to your own results may be due to the extreme number of gaps. My data set produced 1.1 million rows (of identified gaps). Also within my data set are sequence number duplications, which may have contributed to the failure of some of the scripts. Thanks you all for your feedback and suggestions. Stephen




Ten Centuries
Group: General Forum Members
Last Login: Yesterday @ 6:55 AM
Points: 1,203,
Visits: 355


Hi Stephen! Thank you, it is not always you get such an exhausting feedback of how things turn out when you post replies and hope you can help. Hanslindgren




Ten Centuries
Group: General Forum Members
Last Login: Thursday, September 25, 2014 12:38 PM
Points: 1,385,
Visits: 1,249


Hi, I'm a latecomer to the party and didn't realize there had already been healthy debate, but here's what I came up with. Looking through the previous submissions above, it's the same concept as Brendan's, but extended to support the full functionality of the original post. For the original example (working on an unoptimized temp table) the time taken seems to be approx 0.25 secs/1000 records, so with 10000 records I'm getting something like 3 seconds run time. With the code below there seems to be no increase in time taken at all (around 0.25 secs for anything from 0 to 10000 records), but this is probably because for small datasets the main delay below is not due to the work itself, but rather the creation of the temp table. I'd be interested to see if anyone can make something faster! (ignoring the fact that I could probably have made it a table variable...) Thanks, Tao CREATE TABLE #StartsAndEnds (EventID Int, EventType NVarChar(20), IdentityTracker INT IDENTITY (1,1)) INSERT INTO #StartsAndEnds (EventID, EventType) SELECT SeqNumber, Type FROM ( SELECT Seq1.SeqNumber, 'Start' AS Type FROM #SequenceTable Seq1 LEFT JOIN #SequenceTable Seq2 ON Seq2.SeqNumber = Seq1.SeqNumber + 1 WHERE Seq2.SeqNumber Is Null UNION ALL SELECT Seq1.SeqNumber, 'End' AS Type FROM #SequenceTable Seq1 LEFT JOIN #SequenceTable Seq2 ON Seq2.SeqNumber = Seq1.SeqNumber  1 WHERE Seq2.SeqNumber Is Null ) AS PingPong ORDER BY SeqNumber SELECT IsNull(FL1.EventID, 0) AS StartGap, FL2.EventID As EndGap, IsNull(FL1.EventID, 0) + 1 AS FirstAvailable, FL2.EventID  1 AS LastAvailable, FL2.EventID  IsNull(FL1.EventID, 0)  1 As AvailableCount FROM #StartsAndEnds FL2 LEFT JOIN #StartsAndEnds FL1 ON FL2.IdentityTracker = FL1.IdentityTracker + 1 WHERE FL2.IdentityTracker % 2 = 1 DROP TABLE #StartsAndEnds
http://poorsql.com for TSQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server  free however you want it.




SSC Rookie
Group: General Forum Members
Last Login: Thursday, February 26, 2015 12:25 PM
Points: 32,
Visits: 48


Hi Tao, did you post the correct version of your code? I copied your code above and did a find/replace on #SequenceTable and SeqNumber swapping these for my table name and sequence field respectively, then ran it. It was certainly quick, very impressive, 16 seconds to produce 142874 identified breaks in the sequence from a table containing 3,827,625 rows. First few rows of output are as below StartGap   EndGap   FirstAvailable   LastAvailable   AvailableCount  0   NULL   1   NULL   NULL  0   NULL   1   NULL   NULL  0   NULL   1   NULL   NULL  0   NULL   1   NULL   NULL  0   NULL   1   NULL   NULL  0   NULL   1   NULL   NULL  58224   62107   58225   62106   3882  78025   80086   78026   80085   2060 
My original query over the same table takes 44 seconds and produces the following first few rows. StartGap   EndGap   FirstAvailable  LastAvailable  AvailableCount  0   2   1   1   1  20624   20626   20625   20625   1  52091   52093   52092   52092   1  55518   55520   55519   55519   1  58222   58224   58223   58223   1  62107   78025   62108   78024   15917  80086   80088   80087   80087   1  80160   80162   80161   80161   1 
I've checked the results and my query is showing the correct gaps. Your result line showing 58224 to 62107 represents my end gap result line 5 above to my start gap result line 6 above. Love to see your revision as certainly looks like it will be very fast. Stephen



