Find gaps in string values within a table

  • I have a table with ~500K rows in it. Here's a sample of the data

    BegVal EndVal Volume Prefix Padding begnum endnum
    ABC000063738 ABC000063738 VOL001 ABC 9 000063738 000063738
    ABC000063739 ABC000063740 VOL001 ABC 9 000063739 000063740
    ABC000063744 ABC000063746 VOL001 ABC 9 000063744 000063746
    DEF00000009 DEF00000011 VOL002 DEF 8 00000009 00000011
    DEF00000012 DEF00000014 VOL002 DEF 8 00000012 00000014
    DEF00000018 DEF00000026 VOL002 DEF 8 00000018 00000026

    Basically, I want to go through this table, and identify any gaps in the ranges. Ideally, the output would be something like:

    BegGap EndGap

    ABC000063741 ABC000063743

    DEF00000015 DEF00000017

    etc..

    Tried using LEAD and LAG but it didn't seem to produce the results I was looking for, and I felt like I was just spinning my wheels. Thoughts?

    • This topic was modified 4 days, 5 hours ago by  zackattack05.
  • Would you give us actual table data -- i.e. CREATE TABLE and INSERT statement with VALUES -- rather than just a picture?  We can't write SQL against a picture :-).

    This is a variant of Itzik Ben-Gan's famous "Gaps & Islands" queries, so it shouldn't take long to come up with a solution.

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

  • I think I would approach this by using endnum+1 = LEAD(begnum,1).  This will help you determine if it is a gap or not; lets call this bitGAP.  You can use this to get a bit calculated field to see what is a GAP and what is not.  That would be my first step.

    Once you have your above data with 0's and 1's at the end, then your BegGap value is endnum+1 and EndGap is LEAD((begnum-1),1) with a WHERE clause on bitGAP = 1.  Will need to do some CASTING or CONVERTING to get those endnum and begnum into INTs

    The above is completely untested though as I have no data to go off of.  Will need to do some grouping based on the padding value (likely?) to ensure you don't have problems with gaps popping up with different padding values and you will need to order your data by begnum (likely?).

    If you could post some consumable code for getting the sample data, people on the forum are going to be more likely to help provide scripts to get your answer.

  • Ahh. Sorry about that. Here ya go:

    CREATE TABLE temp_docs (
    begval VARCHAR(100)
    ,endval VARCHAR(100)
    ,volume VARCHAR(20)
    ,prefix varchar (50)
    ,padding INT
    ,begnum VARCHAR(50)
    ,endnum VARCHAR(50)
    )
    GO


    insert into temp_docs (BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum)
    values( 'ABC000063738','ABC000063738', 'VOL001', 'ABC', 9,'000063738','000063738')
    GO

    insert into temp_docs (BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum)
    values( 'ABC000063739','ABC000063740', 'VOL001', 'ABC', 9,'000063739','000063740')
    GO



    insert into temp_docs (BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum)
    values( 'ABC000063744','ABC000063746', 'VOL001', 'ABC', 9,'000063744','000063746')
    GO


    insert into temp_docs (BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum)
    values( 'DEF00000009','DEF00000011', 'VOL002', 'DEF', 8,'00000009','00000011')
    GO


    insert into temp_docs (BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum)
    values( 'DEF00000012','DEF00000014', 'VOL002', 'DEF', 8,'00000012','00000014')
    GO

    insert into temp_docs (BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum)
    values( 'DEF00000018','DEF00000026', 'VOL002', 'DEF', 8,'00000018','00000026')
    GO
  • As pointed out - for these types of questions it is preferable that you post sample data.  I have put something together - assuming you really do have a BegNum and EndNum columns:

    Declare @testData Table (BegVal varchar(15), EndVal varchar(15), Volume char(6), Prefix char(3), Padding int, BegNum varchar(10), EndNum varchar(10));
    Insert Into @testData (BegVal, EndVal, Volume, Prefix, Padding, BegNum, EndNum)
    Values ('ABC000063738', 'ABC000063738', 'VOL001', 'ABC', '9', '000063738', '000063738')
    , ('ABC000063739', 'ABC000063740', 'VOL001', 'ABC', '9', '000063739', '000063740')
    , ('ABC000063744', 'ABC000063746', 'VOL001', 'ABC', '9', '000063744', '000063746')
    , ('DEF00000009' , 'DEF00000011' , 'VOL002', 'DEF', '8', '00000009' , '00000011')
    , ('DEF00000012' , 'DEF00000014' , 'VOL002', 'DEF', '8', '00000012' , '00000014')
    , ('DEF00000018' , 'DEF00000026' , 'VOL002', 'DEF', '8', '00000018' , '00000026');

    With gapData
    As (
    Select *
    , Gap = iif(lead(BegNum - 1, 1, EndNum) over(Partition By Volume Order By EndNum) <> EndNum, 1, 0)
    , NextBegNum = EndNum + 1
    , NextEndNum = lead(BegNum - 1, 1) over(Partition By Volume Order By EndNum)
    From @testData
    )
    Select BegGap = concat(Prefix, right(concat(replicate('0', Padding), NextBegNum), Padding))
    , EndGap = concat(Prefix, right(concat(replicate('0', Padding), NextEndNum), Padding))
    From gapData
    Where Gap = 1;

    The CTE calculates the rows where a gap exists - and generates the next beginning number and next ending number for every row.  To build your gap values - we can use the Prefix, Padding and NextBegNum and NextEndNum.

     

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Similar to Jeffrey's answer

    with lead_cte as (
    select *, lead(begnum) over (partition by Prefix order by begnum) lead_bn
    from #temp_docs)
    select concat(lc.Prefix, right(concat(replicate('0', lc.Padding), v.int_en+1), lc.Padding)) calcBegGap,
    concat(lc.Prefix, right(concat(replicate('0', lc.Padding), v.int_bn-1), lc.Padding)) calcEndGap
    from lead_cte lc
    cross apply (values (cast(lc.endnum as int),
    cast(lc.lead_bn as int))) v(int_en, int_bn)
    where (v.int_bn-v.int_en)>1;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • This can be simplified to:

       With gapData
    As (
    Select *
    , NextBegNum = EndNum + 1
    , NextEndNum = lead(BegNum - 1, 1) over(Partition By Volume Order By EndNum)
    From @testData
    )
    Select BegGap = concat(Prefix, right(concat(replicate('0', Padding), NextBegNum), Padding))
    , EndGap = concat(Prefix, right(concat(replicate('0', Padding), NextEndNum), Padding))
    From gapData
    Where EndNum <> NextEndNum;

    This removes the extra LEAD to determine the gaps.  There is no need to convert to integer - SQL will implicitly convert EndNum to an integer for the comparison to NextEndNum which is implicitly converted by adding 1.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • All,

    Thanks a bunch for the help. Looks like everyone's replies got me what I was looking for. I did have one follow up, and i'll edit the original post if necessary. But in addition to the gaps, would there be a way to combine and provide an output of the consecutive ranges, with row count. For example, in the sample data, for the first consecutive range, i'd like to output

    BegRange   EndRange  Row Count

    ABC000063738  ABC000063740  2

    DEF00000009  DEF00000014   2

    etc. Note, any singular ranges without a consecutive one following it, would just be output as itself. So the third row would be output as

    ABC000063744 ABC000063746   1

    if that makes sense

     

  • It doesn't really make sense - there is no third row from your sample data and there could never be a gap on the first or last row in a sequence.  This code is basically creating the missing rows and a row number would naturally increase for each row in the set.

    What is the final result you really need?

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Maybe something like this.  I added some additional example rows.  It's similar to this other post imo.

    insert #temp_docs(BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum) values
    ('ABC000063738', 'ABC000063738', 'VOL001', 'ABC', 9, '000063738', '000063738'),
    ('ABC000063739', 'ABC000063740', 'VOL001', 'ABC', 9, '000063739', '000063740'),
    ('ABC000063741', 'ABC000063742', 'VOL001', 'ABC', 9, '000063741', '000063742'),
    ('ABC000063744', 'ABC000063746', 'VOL001', 'ABC', 9, '000063744', '000063746'),
    ('ABC000063754', 'ABC000063766', 'VOL001', 'ABC', 9, '000063754', '000063766'),
    ('ABC000063770', 'ABC000063771', 'VOL001', 'ABC', 9, '000063770', '000063771'),
    ('ABC000063772', 'ABC000063776', 'VOL001', 'ABC', 9, '000063772', '000063776'),
    ('DEF00000009', 'DEF00000011', 'VOL002', 'DEF', 8, '00000009', '00000011'),
    ('DEF00000012', 'DEF00000014', 'VOL002', 'DEF', 8, '00000012', '00000014'),
    ('DEF00000018', 'DEF00000026', 'VOL002', 'DEF', 8, '00000018', '00000026');

    with
    grp_cte as (
    select *,
    case when (row_number() over (partition by Prefix order by begnum))=1 then 1
    when cast((lag(endnum) over (partition by Prefix order by begnum)) as int)=cast(begnum as int)-1 then 0
    else 1 end grp
    from #temp_docs),
    ranges_cte as (
    select *, sum(grp) over (partition by Prefix order by begnum) sum_grp
    from grp_cte)
    select Prefix, min(BegVal) BegVal, max(EndVal) EndVal, count(*) grp_count
    from ranges_cte
    group by Prefix, sum_grp
    order by Prefix, BegVal;
    Prefix	BegVal		EndVal		grp_count
    ABC ABC000063738 ABC000063742 3
    ABC ABC000063744 ABC000063746 1
    ABC ABC000063754 ABC000063766 1
    ABC ABC000063770 ABC000063776 2
    DEF DEF00000009 DEF00000014 2
    DEF DEF00000018 DEF00000026 1

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Thanks Steve and everyone for the replies. These queries were extremely helpful!

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply