Find gaps in string values within a table

  • I have a table with ~500K rows in it. Here's a sample of the data

    BegVal EndVal Volume Prefix Padding begnum endnum
    ABC000063738 ABC000063738 VOL001 ABC 9 000063738 000063738
    ABC000063739 ABC000063740 VOL001 ABC 9 000063739 000063740
    ABC000063744 ABC000063746 VOL001 ABC 9 000063744 000063746
    DEF00000009 DEF00000011 VOL002 DEF 8 00000009 00000011
    DEF00000012 DEF00000014 VOL002 DEF 8 00000012 00000014
    DEF00000018 DEF00000026 VOL002 DEF 8 00000018 00000026

    Basically, I want to go through this table, and identify any gaps in the ranges. Ideally, the output would be something like:

    BegGap EndGap

    ABC000063741 ABC000063743

    DEF00000015 DEF00000017

    etc..

    Tried using LEAD and LAG but it didn't seem to produce the results I was looking for, and I felt like I was just spinning my wheels. Thoughts?

    • This topic was modified 2 years, 9 months ago by  zackattack05.
  • Would you give us actual table data -- i.e. CREATE TABLE and INSERT statement with VALUES -- rather than just a picture?  We can't write SQL against a picture :-).

    This is a variant of Itzik Ben-Gan's famous "Gaps & Islands" queries, so it shouldn't take long to come up with a solution.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I think I would approach this by using endnum+1 = LEAD(begnum,1).  This will help you determine if it is a gap or not; lets call this bitGAP.  You can use this to get a bit calculated field to see what is a GAP and what is not.  That would be my first step.

    Once you have your above data with 0's and 1's at the end, then your BegGap value is endnum+1 and EndGap is LEAD((begnum-1),1) with a WHERE clause on bitGAP = 1.  Will need to do some CASTING or CONVERTING to get those endnum and begnum into INTs

    The above is completely untested though as I have no data to go off of.  Will need to do some grouping based on the padding value (likely?) to ensure you don't have problems with gaps popping up with different padding values and you will need to order your data by begnum (likely?).

    If you could post some consumable code for getting the sample data, people on the forum are going to be more likely to help provide scripts to get your answer.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Ahh. Sorry about that. Here ya go:

    CREATE TABLE temp_docs (
    begval VARCHAR(100)
    ,endval VARCHAR(100)
    ,volume VARCHAR(20)
    ,prefix varchar (50)
    ,padding INT
    ,begnum VARCHAR(50)
    ,endnum VARCHAR(50)
    )
    GO


    insert into temp_docs (BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum)
    values( 'ABC000063738','ABC000063738', 'VOL001', 'ABC', 9,'000063738','000063738')
    GO

    insert into temp_docs (BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum)
    values( 'ABC000063739','ABC000063740', 'VOL001', 'ABC', 9,'000063739','000063740')
    GO



    insert into temp_docs (BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum)
    values( 'ABC000063744','ABC000063746', 'VOL001', 'ABC', 9,'000063744','000063746')
    GO


    insert into temp_docs (BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum)
    values( 'DEF00000009','DEF00000011', 'VOL002', 'DEF', 8,'00000009','00000011')
    GO


    insert into temp_docs (BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum)
    values( 'DEF00000012','DEF00000014', 'VOL002', 'DEF', 8,'00000012','00000014')
    GO

    insert into temp_docs (BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum)
    values( 'DEF00000018','DEF00000026', 'VOL002', 'DEF', 8,'00000018','00000026')
    GO
  • As pointed out - for these types of questions it is preferable that you post sample data.  I have put something together - assuming you really do have a BegNum and EndNum columns:

    Declare @testData Table (BegVal varchar(15), EndVal varchar(15), Volume char(6), Prefix char(3), Padding int, BegNum varchar(10), EndNum varchar(10));
    Insert Into @testData (BegVal, EndVal, Volume, Prefix, Padding, BegNum, EndNum)
    Values ('ABC000063738', 'ABC000063738', 'VOL001', 'ABC', '9', '000063738', '000063738')
    , ('ABC000063739', 'ABC000063740', 'VOL001', 'ABC', '9', '000063739', '000063740')
    , ('ABC000063744', 'ABC000063746', 'VOL001', 'ABC', '9', '000063744', '000063746')
    , ('DEF00000009' , 'DEF00000011' , 'VOL002', 'DEF', '8', '00000009' , '00000011')
    , ('DEF00000012' , 'DEF00000014' , 'VOL002', 'DEF', '8', '00000012' , '00000014')
    , ('DEF00000018' , 'DEF00000026' , 'VOL002', 'DEF', '8', '00000018' , '00000026');

    With gapData
    As (
    Select *
    , Gap = iif(lead(BegNum - 1, 1, EndNum) over(Partition By Volume Order By EndNum) <> EndNum, 1, 0)
    , NextBegNum = EndNum + 1
    , NextEndNum = lead(BegNum - 1, 1) over(Partition By Volume Order By EndNum)
    From @testData
    )
    Select BegGap = concat(Prefix, right(concat(replicate('0', Padding), NextBegNum), Padding))
    , EndGap = concat(Prefix, right(concat(replicate('0', Padding), NextEndNum), Padding))
    From gapData
    Where Gap = 1;

    The CTE calculates the rows where a gap exists - and generates the next beginning number and next ending number for every row.  To build your gap values - we can use the Prefix, Padding and NextBegNum and NextEndNum.

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Similar to Jeffrey's answer

    with lead_cte as (
    select *, lead(begnum) over (partition by Prefix order by begnum) lead_bn
    from #temp_docs)
    select concat(lc.Prefix, right(concat(replicate('0', lc.Padding), v.int_en+1), lc.Padding)) calcBegGap,
    concat(lc.Prefix, right(concat(replicate('0', lc.Padding), v.int_bn-1), lc.Padding)) calcEndGap
    from lead_cte lc
    cross apply (values (cast(lc.endnum as int),
    cast(lc.lead_bn as int))) v(int_en, int_bn)
    where (v.int_bn-v.int_en)>1;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • This can be simplified to:

       With gapData
    As (
    Select *
    , NextBegNum = EndNum + 1
    , NextEndNum = lead(BegNum - 1, 1) over(Partition By Volume Order By EndNum)
    From @testData
    )
    Select BegGap = concat(Prefix, right(concat(replicate('0', Padding), NextBegNum), Padding))
    , EndGap = concat(Prefix, right(concat(replicate('0', Padding), NextEndNum), Padding))
    From gapData
    Where EndNum <> NextEndNum;

    This removes the extra LEAD to determine the gaps.  There is no need to convert to integer - SQL will implicitly convert EndNum to an integer for the comparison to NextEndNum which is implicitly converted by adding 1.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • All,

    Thanks a bunch for the help. Looks like everyone's replies got me what I was looking for. I did have one follow up, and i'll edit the original post if necessary. But in addition to the gaps, would there be a way to combine and provide an output of the consecutive ranges, with row count. For example, in the sample data, for the first consecutive range, i'd like to output

    BegRange   EndRange  Row Count

    ABC000063738  ABC000063740  2

    DEF00000009  DEF00000014   2

    etc. Note, any singular ranges without a consecutive one following it, would just be output as itself. So the third row would be output as

    ABC000063744 ABC000063746   1

    if that makes sense

     

  • It doesn't really make sense - there is no third row from your sample data and there could never be a gap on the first or last row in a sequence.  This code is basically creating the missing rows and a row number would naturally increase for each row in the set.

    What is the final result you really need?

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Maybe something like this.  I added some additional example rows.  It's similar to this other post imo.

    insert #temp_docs(BegVal, EndVal, Volume, Prefix, Padding, begnum, endnum) values
    ('ABC000063738', 'ABC000063738', 'VOL001', 'ABC', 9, '000063738', '000063738'),
    ('ABC000063739', 'ABC000063740', 'VOL001', 'ABC', 9, '000063739', '000063740'),
    ('ABC000063741', 'ABC000063742', 'VOL001', 'ABC', 9, '000063741', '000063742'),
    ('ABC000063744', 'ABC000063746', 'VOL001', 'ABC', 9, '000063744', '000063746'),
    ('ABC000063754', 'ABC000063766', 'VOL001', 'ABC', 9, '000063754', '000063766'),
    ('ABC000063770', 'ABC000063771', 'VOL001', 'ABC', 9, '000063770', '000063771'),
    ('ABC000063772', 'ABC000063776', 'VOL001', 'ABC', 9, '000063772', '000063776'),
    ('DEF00000009', 'DEF00000011', 'VOL002', 'DEF', 8, '00000009', '00000011'),
    ('DEF00000012', 'DEF00000014', 'VOL002', 'DEF', 8, '00000012', '00000014'),
    ('DEF00000018', 'DEF00000026', 'VOL002', 'DEF', 8, '00000018', '00000026');

    with
    grp_cte as (
    select *,
    case when (row_number() over (partition by Prefix order by begnum))=1 then 1
    when cast((lag(endnum) over (partition by Prefix order by begnum)) as int)=cast(begnum as int)-1 then 0
    else 1 end grp
    from #temp_docs),
    ranges_cte as (
    select *, sum(grp) over (partition by Prefix order by begnum) sum_grp
    from grp_cte)
    select Prefix, min(BegVal) BegVal, max(EndVal) EndVal, count(*) grp_count
    from ranges_cte
    group by Prefix, sum_grp
    order by Prefix, BegVal;
    PrefixBegValEndValgrp_count
    ABCABC000063738ABC0000637423
    ABCABC000063744ABC0000637461
    ABCABC000063754ABC0000637661
    ABCABC000063770ABC0000637762
    DEFDEF00000009DEF000000142
    DEFDEF00000018DEF000000261

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Thanks Steve and everyone for the replies. These queries were extremely helpful!

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply