Number of employees active per date

  • Hi,
    I have a very simple table 

    create table #employee
    (
    EmpNumber int,
    HireDate datetime,
    TerminationDate datetime
    )

    insert into #employee (EmpNumber, HireDate, TerminationDate) values (12, '09/02/1986', '12/31/2022' );
    insert into #employee (EmpNumber, HireDate, TerminationDate) values (13, '09/01/1986', '12/31/2012' );
    insert into #employee (EmpNumber, HireDate, TerminationDate) values (14, '09/02/2001', '12/31/2002' );
    insert into #employee (EmpNumber, HireDate, TerminationDate) values (15, '09/02/2011', '12/31/2022' );

    And on the end result of the query I would like to get the employees active per day.

    Date                                    ActiveEmployees
    01/01/1986                          10
    02/01/1986                          11

    etc etc.
    I am not sure how to do it, because I would also like to have every single date, even though if I didn't have any employee active.
    Thanks
    Astrid  

  • Do you have an calendar table?  Otherwise, you'd have to 'generate' one.  I used Jeff Moden's tally table (attached) to generate the dates for 2019 and modified your data to show people hired and terminated at different times

    create table #employee

    (
    EmpNumber int,
    HireDate datetime,
    TerminationDate datetime
    )

    insert into #employee (EmpNumber, HireDate, TerminationDate) values (12, '09/02/1986', '12/31/2022' );
    insert into #employee (EmpNumber, HireDate, TerminationDate) values (13, '09/01/1986', '12/31/2012' );
    insert into #employee (EmpNumber, HireDate, TerminationDate) values (14, '09/02/2001', '01/15/2019' );
    insert into #employee (EmpNumber, HireDate, TerminationDate) values (15, '09/02/2011', '01/20/2019' );

    ;with cte as (
    select DateAdd(day, N, '1/1/2019') CalDate
    from dbo.Tally(0,365))

    select c.Caldate, count(e.EmpNumber) EmpCount
    from cte c
        left join #employee e
            on e.HireDate <= c.CalDate
            and e.TerminationDate >= c.CalDate
    group by C.CalDate

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • I am confused, I have a date table, but I am not sure how to join it. that is my problem, now i see that cte, it is not my day.... I am checking

  • Mike01 - Thursday, January 17, 2019 2:11 PM

    Do you have an calendar table?  Otherwise, you'd have to 'generate' one.  I used Jeff Moden's tally table (attached) to generate the dates for 2019 and modified your data to show people hired and terminated at different times

    create table #employee

    (
    EmpNumber int,
    HireDate datetime,
    TerminationDate datetime
    )

    insert into #employee (EmpNumber, HireDate, TerminationDate) values (12, '09/02/1986', '12/31/2022' );
    insert into #employee (EmpNumber, HireDate, TerminationDate) values (13, '09/01/1986', '12/31/2012' );
    insert into #employee (EmpNumber, HireDate, TerminationDate) values (14, '09/02/2001', '01/15/2019' );
    insert into #employee (EmpNumber, HireDate, TerminationDate) values (15, '09/02/2011', '01/20/2019' );

    ;with cte as (
    select DateAdd(day, N, '1/1/2019') CalDate
    from dbo.Tally(0,365))

    select c.Caldate, count(e.EmpNumber) EmpCount
    from cte c
        left join #employee e
            on e.HireDate <= c.CalDate
            and e.TerminationDate >= c.CalDate
    group by C.CalDate

    I am not sure what you mean by 

    ;with cte as (
    select DateAdd(day, N, '1/1/2019') CalDate
    from dbo.Tally(0,365))

    I dont know what table tally is.

  • I dont know what table tally is.

    You will after reading this: The "Numbers" or "Tally" Table: What it is and how it replaces a loop

    You can create one like so (the code here will generate a tally table with 100K numbers:

    CREATE TABLE dbo.tally(N bigint NOT NULL);
     ALTER TABLE dbo.tally ADD CONSTRAINT pk_cl__dbo_tally__N PRIMARY KEY CLUSTERED(N)
      WITH (FILLFACTOR=100);
     ALTER TABLE dbo.tally ADD CONSTRAINT uq_nc__dbo_tally__N UNIQUE NONCLUSTERED(N);

    INSERT dbo.tally (N)
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns a, sys.all_columns b;

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • astrid 69000 - Thursday, January 17, 2019 2:20 PM

    I am confused, I have a date table, but I am not sure how to join it. that is my problem, now i see that cte, it is not my day.... I am checking

    The CTE is generating a dynamic table of dates... Just think of it as a black box that returns a table of dates, if that helps. (Yeah, I know you're not supposed to use code you don't understand, but sometimes you have to worry about one piece at a time.

  • Duplicated post deleted.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • astrid 69000 - Thursday, January 17, 2019 1:20 PM

    Hi,
    I have a very simple table 

    create table #employee
    (
    EmpNumber int,
    HireDate datetime,
    TerminationDate datetime
    )

    insert into #employee (EmpNumber, HireDate, TerminationDate) values (12, '09/02/1986', '12/31/2022' );
    insert into #employee (EmpNumber, HireDate, TerminationDate) values (13, '09/01/1986', '12/31/2012' );
    insert into #employee (EmpNumber, HireDate, TerminationDate) values (14, '09/02/2001', '12/31/2002' );
    insert into #employee (EmpNumber, HireDate, TerminationDate) values (15, '09/02/2011', '12/31/2022' );

    And on the end result of the query I would like to get the employees active per day.

    Date                                    ActiveEmployees
    01/01/1986                          10
    02/01/1986                          11

    etc etc.
    I am not sure how to do it, because I would also like to have every single date, even though if I didn't have any employee active.
    Thanks
    Astrid  

    I don't know how may employees are involved but I do know that the methods demonstrated so far will produce a Cartesian Product of the # of employees times the number of days in the range, which is 13,514 days to cover the complete whole year date range prescribed by the 4 employee's data that you provided. That means that just 4 employees you have will produce an internal CROSS JOIN of 54,056 rows to produce the desired result using the current methods. That means that for every 1,000 employee rows, the internal CROSS JOIN will have 13,514,000 internal rows generated. This can get expensive quickly.

    With that extra bit of knowledge, I have xx questions for you...
    1. How many total employee rows are we talking about?
    2. What is the full range of dates you're expecting?
    3. How often to you have to calculate this?                       

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Alan.B - Thursday, January 17, 2019 3:10 PM

    I dont know what table tally is.

    You will after reading this: The "Numbers" or "Tally" Table: What it is and how it replaces a loop

    You can create one like so (the code here will generate a tally table with 100K numbers:

    CREATE TABLE dbo.tally(N bigint NOT NULL);
     ALTER TABLE dbo.tally ADD CONSTRAINT pk_cl__dbo_tally__N PRIMARY KEY CLUSTERED(N)
      WITH (FILLFACTOR=100);
     ALTER TABLE dbo.tally ADD CONSTRAINT uq_nc__dbo_tally__N UNIQUE NONCLUSTERED(N);

    INSERT dbo.tally (N)
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns a, sys.all_columns b;

    First, you should use a table value constructor to generate the appropriate number of rows, because reading from disk is relatively expensive.

    Second, you should use an inline table-valued function instead of an instantiated table for your tally table, because, again, reading from disk is relatively expensive.  Itzik Ben Gan created a blazing fast function that you should look up.  If you can't create user-defined functions, you can still use the vast majority of the code as a CTE.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • astrid 69000 - Thursday, January 17, 2019 1:20 PM

    Your simple table is very badly done. You don't know about current data types. You don't know that a table has to have a primary key. Since a table models a set of things, its name has to be a plural or collective – unless you actually do only have one employee.

    You don't know that an integer or any other numeric can never be an identifier by definition. In short, you are not writing SQL yet, but only mimic punch cards.

    CREATE TABLE Personnel
    (emp_nbr CHAR (3) NOT NULL PRIMARY KEY,
    hire_date DATE NOT NULL,
    termination_date DATE, --- null is still employes
    CHECK(hire_date <= termination_date));

    Please look at the collective name, the character string identifier, the ISO 11179 format on the names and constraint among the dates. Get a copy of the temporal SQL book from the University of Arizona by Rick Snodgrass; is a free PDF download and should be part of every working SQL programmers library. Among other things, you will learn that the only legal format in ANSI-ISO standard SQL is"yyyy-mm-dd" and not the ambiguous local dialect you used. I had to guess at exactly what those dates were to translate them into ANSI-ISO standards.

    You also need to learn for the insertion statement that Microsoft is had for quite a few years now. You are still using the "one punch card at a time" style that Sybase originally had decades ago.

    INSERT INTO Personnel
    VALUES
    ('012', '1986-09-02', '2022-02-31' ),
    ('013', '1986-09-01', '2012-12-31' ),
    ('014', '2001-09-02', '2002-12-31' ),
    ('015', '2011-09-02', '2022-12-31' );

    Build a calendar table with one column for the calendar data and other columns to show whatever your business needs in the way of temporal information. Do not try to calculate holidays in SQL -- Easter alone requires too much math and there are two of them (Orthodox and Catholic).

    The Ordinal business day is a good trick. Number the days from whenever your calendar starts and repeat a number for a weekend or company holiday.

    CREATE TABLE Calendar
    (cal_date DATE NOT NULL PRIMARY KEY,
    ordinal_business_nbr INTEGER NOT NULL,
    ...);

    INSERT INTO Calendar
    VALUES ('2007-04-05', 42);
     ('2007-04-06', 43); -- Good Friday
     ('2007-04-07', 43);
     ('2007-04-08', 43); -- Easter Sunday
     ('2007-04-09', 44);
     ('2007-04-10', 45); -- Tuesday, back to work

    To compute the business days from '2007-04-05', Thursday before Easter in 2007 to '2007-04-10', the next Tuesday:

    SELECT (C2.ordinal_business_nbr - C1.ordinal_business_nbr -1) AS business_day_cnt
    FROM Calendar AS C1, Calendar AS C2
    WHERE C1.cal_date = '2007-04-05'
     AND C2.cal_date = '2007-04-10';

    This saves having to scan all of the rows within the range to get a sum. The -1 is to avoid counting the current day as one of the business days, but you might have a different rule in your company.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • drew.allen - Friday, January 18, 2019 10:12 AM

    Alan.B - Thursday, January 17, 2019 3:10 PM

    I dont know what table tally is.

    You will after reading this: The "Numbers" or "Tally" Table: What it is and how it replaces a loop

    You can create one like so (the code here will generate a tally table with 100K numbers:

    CREATE TABLE dbo.tally(N bigint NOT NULL);
     ALTER TABLE dbo.tally ADD CONSTRAINT pk_cl__dbo_tally__N PRIMARY KEY CLUSTERED(N)
      WITH (FILLFACTOR=100);
     ALTER TABLE dbo.tally ADD CONSTRAINT uq_nc__dbo_tally__N UNIQUE NONCLUSTERED(N);

    INSERT dbo.tally (N)
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns a, sys.all_columns b;

    First, you should use a table value constructor to generate the appropriate number of rows, because reading from disk is relatively expensive.

    Second, you should use an inline table-valued function instead of an instantiated table for your tally table, because, again, reading from disk is relatively expensive.  Itzik Ben Gan created a blazing fast function that you should look up.  If you can't create user-defined functions, you can still use the vast majority of the code as a CTE.

    Drew

    If you read from an instantiated table, especially if you just built it, you won't be reading it from memory.  A physical Tally table will usually beat Itzik's wonderful cascading CTE method for sheer performance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jcelko212 32090 - Friday, January 18, 2019 2:48 PM

    astrid 69000 - Thursday, January 17, 2019 1:20 PM

    Your simple table is very badly done. You don't know about current data types. You don't know that a table has to have a primary key. Since a table models a set of things, its name has to be a plural or collective – unless you actually do only have one employee.

    You don't know that an integer or any other numeric can never be an identifier by definition. In short, you are not writing SQL yet, but only mimic punch cards.

    CREATE TABLE Personnel
    (emp_nbr CHAR (3) NOT NULL PRIMARY KEY,
    hire_date DATE NOT NULL,
    termination_date DATE, --- null is still employes
    CHECK(hire_date <= termination_date));

    Please look at the collective name, the character string identifier, the ISO 11179 format on the names and constraint among the dates. Get a copy of the temporal SQL book from the University of Arizona by Rick Snodgrass; is a free PDF download and should be part of every working SQL programmers library. Among other things, you will learn that the only legal format in ANSI-ISO standard SQL is"yyyy-mm-dd" and not the ambiguous local dialect you used. I had to guess at exactly what those dates were to translate them into ANSI-ISO standards.

    You also need to learn for the insertion statement that Microsoft is had for quite a few years now. You are still using the "one punch card at a time" style that Sybase originally had decades ago.

    INSERT INTO Personnel
    VALUES
    ('012', '1986-09-02', '2022-02-31' ),
    ('013', '1986-09-01', '2012-12-31' ),
    ('014', '2001-09-02', '2002-12-31' ),
    ('015', '2011-09-02', '2022-12-31' );

    Build a calendar table with one column for the calendar data and other columns to show whatever your business needs in the way of temporal information. Do not try to calculate holidays in SQL -- Easter alone requires too much math and there are two of them (Orthodox and Catholic).

    The Ordinal business day is a good trick. Number the days from whenever your calendar starts and repeat a number for a weekend or company holiday.

    CREATE TABLE Calendar
    (cal_date DATE NOT NULL PRIMARY KEY,
    ordinal_business_nbr INTEGER NOT NULL,
    ...);

    INSERT INTO Calendar
    VALUES ('2007-04-05', 42);
     ('2007-04-06', 43); -- Good Friday
     ('2007-04-07', 43);
     ('2007-04-08', 43); -- Easter Sunday
     ('2007-04-09', 44);
     ('2007-04-10', 45); -- Tuesday, back to work

    To compute the business days from '2007-04-05', Thursday before Easter in 2007 to '2007-04-10', the next Tuesday:

    SELECT (C2.ordinal_business_nbr - C1.ordinal_business_nbr -1) AS business_day_cnt
    FROM Calendar AS C1, Calendar AS C2
    WHERE C1.cal_date = '2007-04-05'
     AND C2.cal_date = '2007-04-10';

    This saves having to scan all of the rows within the range to get a sum. The -1 is to avoid counting the current day as one of the business days, but you might have a different rule in your company.

    If, by current data types, you mean not using DATETIME, then you'll be "pleased" to know that the newer datatypes are actually NOT ISO compliant when it comes to date math whereas the DATETIME and, SMALLDATETIME datatypes actually are.

    You continued observation about not using an Integer or similar datatype as a row identifier also continues to demonstrate that you know nothing of the advantages of doing so nor are you aware of the math that goes on behind the scenes for such a column.

    Last but not least, the OP isn't trying to calculate business days.  He's trying to calculate the number of employees for any given date for all dates in the possible temporal range.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @astrid 69000 ,

    Do you have answers to my questions? 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply