Finding non-contiguous data

  • I have been given a data dump from a 3rd party system we use that allows employees to report work they've done. The data is denormalized and contains a flag that indicates whether or not the employee reported hours for that month. What I am trying to determine is who had a gap of at least one month between reporting. More specifically, I need to find employees who didn't report hours for one or more months, but that period of not reported data is "bookended" with months were data WAS reported. The whole goal of this has to do with managing the licenses for this system (which is used by several thousand people). My data dump contains approximately 28,000 rows, one row per employee with columns for months reported from October of 2009 through September of 2010. And every employee has a value of 0 or 1 for every month, there are no nulls.

    Sample Data (using a smaller range of data for example purposes):

    if OBJECT_ID('tempdb..#Employee') is not null

    drop table #Employee

    go

    create table #Employee

    (EmployeeID int,

    Work0110 bit,

    Work0210 bit,

    Work0310 bit,

    Work0410 bit,

    Work0510 bit,

    Work0610 bit)

    insert into #Employee values (1000, 1, 1, 1, 1, 1, 1)

    insert into #Employee values (1001, 0, 0, 0, 0, 0, 0)

    insert into #Employee values (1002, 0, 0, 1, 1, 1, 1)

    insert into #Employee values (1003, 0, 0, 1, 1, 0, 0)

    insert into #Employee values (1004, 1, 1, 1, 0, 1, 0)

    insert into #Employee values (1005, 1, 0, 0, 0, 1, 1)

    insert into #Employee values (1006, 0, 0, 1, 0, 1, 0)

    insert into #Employee values (1007, 1, 0, 1, 0, 1, 0)

    insert into #Employee values (1008, 1, 0, 0, 0, 0, 0)

    insert into #Employee values (1009, 0, 0, 0, 0, 0, 1)

    Desired Result:

    EmployeeID Work0110 Work0210 Work0310 Work0410 Work0510 Work0610 FLAG

    ----------- -------- -------- -------- -------- -------- -------- ----

    1000 1 1 1 1 1 1 N

    1001 0 0 0 0 0 0 N

    1002 0 0 1 1 1 1 N

    1003 0 0 1 1 0 0 N

    1004 1 1 1 0 1 0 Y

    1005 1 0 0 0 1 1 Y

    1006 0 0 1 0 1 0 Y

    1007 1 0 1 0 1 0 Y

    1008 1 0 0 0 0 0 N

    1009 0 0 0 0 0 1 N

    Strategy #1 - Unpivot the data and use ranking function to determine if a 0 is bookended by a 1. Then I realized that wouldn't work because my initial thought was to check if row 1 and row 6 were a 1 and if there was a zero anywhere in the other rows, then flag. But then I would miss employees such as 1004, 1006 and 1007.

    Strategy #2 - Using the original data dump as-is (denormalized), use multiple case statements and....and....this is where I just fell on my face. Lunacy. This will never work.

    Strategy #3 - Somehow utilize the famous Tally table. But here, I'm simply scratching my head.

    I think somewhere in here, I have my answer, but I can't seem to put it all together.

    Thank you -

    Lisa

  • Probably not the most elegant answer you will receive....

    select *,

    CASE WHEN

    CONVERT(char(1),Work0110)

    +CONVERT(char(1),Work0210)

    +CONVERT(char(1),Work0310)

    +CONVERT(char(1),Work0410)

    +CONVERT(char(1),Work0510)

    +CONVERT(char(1),Work0610) LIKE '%1%0%1%' THEN 'Y'

    ELSE 'N'

    END As [FLAG]

    from #Employee

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Solution, Round 1:

    if OBJECT_ID('tempdb..#Employee') is not null

    drop table #Employee

    go

    create table #Employee

    (EmployeeID int,

    Work0110 bit,

    Work0210 bit,

    Work0310 bit,

    Work0410 bit,

    Work0510 bit,

    Work0610 bit)

    insert into #Employee values (1000, 1, 1, 1, 1, 1, 1)

    insert into #Employee values (1001, 0, 0, 0, 0, 0, 0)

    insert into #Employee values (1002, 0, 0, 1, 1, 1, 1)

    insert into #Employee values (1003, 0, 0, 1, 1, 0, 0)

    insert into #Employee values (1004, 1, 1, 1, 0, 1, 0)

    insert into #Employee values (1005, 1, 0, 0, 0, 1, 1)

    insert into #Employee values (1006, 0, 0, 1, 0, 1, 0)

    insert into #Employee values (1007, 1, 0, 1, 0, 1, 0)

    insert into #Employee values (1008, 1, 0, 0, 0, 0, 0)

    insert into #Employee values (1009, 0, 0, 0, 0, 0, 1)

    ;WITH cte AS

    (SELECT

    *

    from

    #Employee

    UNPIVOT ( Worked For MonthCol IN ( Work0110, Work0210, Work0310, Work0410, Work0510, Work0610)

    ) AS drv

    )

    ,cte2 AS

    (

    SELECT

    EmployeeID,

    Worked,

    CASE MonthCol

    WHEN 'Work0110' THEN 1

    WHEN 'Work0210' THEN 2

    WHEN 'Work0310' THEN 3

    WHEN 'Work0410' THEN 4

    WHEN 'Work0510' THEN 5

    WHEN 'Work0610' THEN 6

    END AS OrderingColumn

    FROM

    cte

    )

    , cte3 AS

    (

    SELECT

    c2.EmployeeID,

    MAX( CASE WHEN drvBefore.EmployeeID IS NOT NULL AND drvAfter.EmployeeID IS NOT NULL

    THEN 1

    ELSE 0

    END) AS Flagged

    FROM

    cte2 AS c2

    LEFT JOIN

    (SELECT EmployeeID, OrderingColumn

    FROMcte2

    WHEREWorked = 1

    ) AS drvBefore

    ONc2.EmployeeID = drvBefore.EmployeeID

    AND c2.OrderingColumn > drvBefore.OrderingColumn

    LEFT JOIN

    (SELECT EmployeeID, OrderingColumn

    FROMcte2

    WHEREWorked = 1

    ) AS drvAfter

    ONc2.EmployeeID = drvAfter.EmployeeID

    AND c2.OrderingColumn < drvAfter.OrderingColumn

    WHERE

    Worked = 0

    GROUP BY

    c2.EmployeeID

    )

    SELECT e.*, c3.Flagged

    FROM

    #Employee AS e

    JOIN

    cte3 AS c3

    ONe.EmployeeID = c3.EmployeeID

    You were on the right track with the unpivot. Now, hopefully someone can come by and clean this up, the triangle joins are terrible.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Oh man! That's MUCH cleaner than where I was going! I was capturing the min and max rows (using row_number()) where hours were entered. Then capturing rows (using row_number()) where hours were not entered. Then I was going to see which employees had a not entered month that fell between the min and max of reported months. Etc. Very messy! Thank you so much for this simple solution!!

    Lisa

  • mister.magoo (12/6/2010)


    Probably not the most elegant answer you will receive....

    Just a head's up, that code will fail for EmployeeId 1005.

    I need more coffee, or glasses.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • HI Craig, can you explain why? It produces the correct answer for me...

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (12/6/2010)


    HI Craig, can you explain why? It produces the correct answer for me...

    It works just fine. Apologies, very nice solution. I'll be --------------> way for a bit.

    I misread the wildcards. Self.smack.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • No problem. It is ugly code, so by rights should have something wrong!

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Another "ugly" method...just for the sake of it...

    select *,

    CASE WHEN 32*Work0110+16*Work0210+8*Work0310+4*Work0410+2*Work0510+Work0610 NOT IN (0,1,2,3,4,6,7,8,12,14,15,24,28,30,31,32,48,56,60,62,63) THEN 'Y' ELSE 'N' END

    from #Employee

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (12/6/2010)


    Another "ugly" method...just for the sake of it...

    select *,

    CASE WHEN 32*Work0110+16*Work0210+8*Work0310+4*Work0410+2*Work0510+Work0610 NOT IN (0,1,2,3,4,6,7,8,12,14,15,24,28,30,31,32,48,56,60,62,63) THEN 'Y' ELSE 'N' END

    from #Employee

    Ewww.... yeah, I'll agree with you on this one. Definitely "ugly". The first one? Most definitely elegant!

  • mister.magoo (12/6/2010)


    Another "ugly" method...just for the sake of it...

    select *,

    CASE WHEN 32*Work0110+16*Work0210+8*Work0310+4*Work0410+2*Work0510+Work0610 NOT IN (0,1,2,3,4,6,7,8,12,14,15,24,28,30,31,32,48,56,60,62,63) THEN 'Y' ELSE 'N' END

    from #Employee

    I believe this method is for when your juniors get involved in the process you want a way to say "You do not touch this without talking to me..."

    I love it though. Beautifully random use of a bitmask. 😀


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • CELKO (12/6/2010)


    A little change in the DDL makes life more relational. Never sue BIT flags in SQL; that is assembley language.m Alo, MySQL has a nice conventon for numeric momth names based on ISO-8601:

    So, to understand this, instead of one byte per row physical storage (6 bits, 1 byte can hold them), you're recommending INT, which is 4 bytes per flag, for 24 bytes per row of storage. Besides the fact that it's "archaic", what does this data expansion bring to the table? Does it run faster? Does it optimize better? I can't see over a million rows that more physical pages being necessary will be helpful.

    Even going to tinyint at a byte a piece is 6 bytes to the 1 it takes up.

    As per the MySQL, it's a shame that nifty function isn't around for us, but this is the SQL Server 2k8 board. Though I hope you leveraging your knowledge of it will produce a locally usable workaround. Will be interesting to see.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (12/6/2010)


    mister.magoo (12/6/2010)


    HI Craig, can you explain why? It produces the correct answer for me...

    It works just fine. Apologies, very nice solution. I'll be --------------> way for a bit.

    I misread the wildcards. Self.smack.

    No problem, I can beat that. Me.smack.Craig!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (12/6/2010)


    No problem, I can beat that. Me.smack.Craig!

    Me.Duck (WayneS.smack.Craig, const_ImitateNeo). 😛 :w00t:

    What can I tell ya? Looked too good to be that easy and my eyes played tricks on me.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • mister.magoo (12/6/2010)


    Probably not the most elegant answer you will receive....

    select *,

    CASE WHEN

    CONVERT(char(1),Work0110)

    +CONVERT(char(1),Work0210)

    +CONVERT(char(1),Work0310)

    +CONVERT(char(1),Work0410)

    +CONVERT(char(1),Work0510)

    +CONVERT(char(1),Work0610) LIKE '%1%0%1%' THEN 'Y'

    ELSE 'N'

    END As [FLAG]

    from #Employee

    I guess were both not elegant then because that's similar to the way I was going to do it. The only thing I may have done differently is the column names to make it a bit more generic. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Viewing 15 posts - 1 through 15 (of 16 total)

    You must be logged in to reply to this topic. Login to reply