How to return a range value based on the first day of year and the value of row

Question

Post reply

How to return a range value based on the first day of year and the value of row

IdealPR

Old Hand

Points: 335

September 28, 2021 at 6:16 pm

#3934247

Hello,

i have a table called Events, the value of that table like as bellow :

create table Events (
    id int,
    code_events varchar(10),
    Events varchar(10),
    Events_start datetime,
    Events_end datetime
 )
    
 insert into Events values(1,'AC_83','Event 1','2020-07-15','2020-07-30')
 insert into Events values(2,'AC_84','Event 2','2019-06-01','2030-07-30')
 insert into Events values(3,'AC_86','Event 3','2020-07-15','2022-12-15')
 insert into Events values(4,'AC_83','Event 1','2020-09-15','2025-04-30') 
 insert into Events values(5,'AC_87','Event 4','2021-07-15','2022-12-15')

I want to return a range of missing dates from the first date of 2020 and only if Events_start>=2020

As bellow what i want to get :

code_events    Events    Events_start    Events_end                       IS_Messing
 AC_83    Event 1    2020-01-01 00:00:00.000    2020-07-15 00:00:00.000   1
 AC_83    Event 1    2020-07-15 00:00:00.000    2020-07-30 00:00:00.000   0
 AC_83    Event 1    2020-07-30 00:00:00.000    2020-09-15 00:00:00.000   1  
 AC_83    Event 1    2020-09-15 00:00:00.000    2025-04-30 00:00:00.000   0
 AC_84    Event 2    2019-06-01 00:00:00.000    2030-07-30 00:00:00.000   0
 AC_86    Event 3    2020-01-01 00:00:00.000    2020-07-15 00:00:00.000   1
 AC_86    Event 3    2020-07-15 00:00:00.000    2022-12-15 00:00:00.000   0
 AC_87    Event 4    2020-01-01 00:00:00.000    2021-07-15 00:00:00.000   1
 AC_87    Event 4    2021-07-15 00:00:00.000    2022-12-15 00:00:00.000   0

Any idea how can i do that ?

Thanks for help !

This topic was modified 3 years, 9 months ago by IdealPR.

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Steve Collins SSCrazy Points: 2227 More actions · Answer 1

It's a one time only type query or it's intended to be run frequently? The id value in the last row of the sample data should be 5, no?

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Ken McKelvey SSCoach Points: 18927 More actions · Answer 2

WITH PrevEnds
AS
(
    SELECT E.code_events, E.[Events], E.Events_start, E.Events_end
        ,LAG(E.Events_end) OVER (PARTITION BY E.code_events ORDER BY E.Events_start) AS Prev_Events_end
    FROM Events E
)
,Results
AS
(
    SELECT P.code_events, P.[Events]
        ,CASE
            WHEN N.N = 1 AND X.Keep1 = 1
            THEN COALESCE(P.Prev_Events_end, '20200101')
            ELSE P.Events_start
        END AS Events_start
        ,CASE
            WHEN N.N = 1 AND X.Keep1 = 1
            THEN P.Events_start
            ELSE P.Events_end
        END AS Events_end
        ,CASE
            WHEN N.N = 1 AND X.Keep1 = 1
            THEN 1
            ELSE 0
        END AS IsMissing
        ,CASE
            WHEN N.N = 2 OR X.Keep1 = 1
            THEN 1
            ELSE 0
        END AS KeepRow
    FROM PrevEnds P
            CROSS APPLY (VALUES (1),(2)) N(N)
            CROSS APPLY
            (
                VALUES
                (
                    CASE
                        WHEN P.Events_start <> COALESCE(P.Prev_Events_end, '19000101')
                            AND P.Events_start > '20200101'
                        THEN 1
                        ELSE 0
                    END
                )
            ) X (Keep1)
)
SELECT code_events, [Events], Events_start, Events_end, IsMissing
FROM Results
WHERE KeepRow = 1
ORDER BY Code_events, Events_start;

This reply was modified 3 years, 9 months ago by Ken McKelvey.

IdealPR Old Hand Points: 335 More actions · Answer 3

the query will be run frequently

yes the id should be 5 it is a mistake

This reply was modified 3 years, 9 months ago by IdealPR.

Steve Collins SSCrazy Points: 2227 More actions · Answer 4

Ken McKelvey this looks familiar

https://www.sqlservercentral.com/forums/topic/need-to-create-gaps-in-a-table-of-date-ranges

Again one "brute force" way to do this would be to expand all the date ranges and use the set operator EXCEPT to remove the (also expanded) gaps. Then the results could be grouped by splitting the calculated dates into contiguous ranges. Reliable and simple (sort of) but maybe not the fastest way

with
 unq_events_cte as (
    select code_events, min(Events_end) min_end
    from #Events
    group by code_events
    having year(min(Events_start))>=2020),
exp_cte as (
    select code_events, v.dt
    from unq_events_cte e
         cross apply dbo.fnTally(0, datediff(day, '20200101', iif(e.min_end<'20201231', '20201231', e.min_end))) fn
         cross apply (values (dateadd(day, fn.n, '20200101'))) v(dt)
    except
    select code_events, v.dt
    from #Events e
         cross apply dbo.fnTally(0, datediff(day, e.Events_start, e.Events_end)) fn
         cross apply (values (dateadd(day, fn.n, e.Events_start))) v(dt)),
 gap_cte as (
    select *, case when datediff(day, dt, lag(dt, 1, 1) 
                                              over (partition by code_events
                                              order by dt))<>-1 
                   then 1 else 0 end gap
    from exp_cte),
grp_cte as (
    select *, sum(gap) over (partition by code_events order by dt) grp
    from gap_cte)
select code_events, 
       iif(min(dt)='20200101', '20200101', dateadd(day, 1, min(dt))) Events_start,
       iif(max(dt)='20201231', '20201231', dateadd(day, 1, max(dt))) Events_end,
       1 Is_missing
from grp_cte
group by code_events, grp
union all
select code_events, Events_start, Events_end, 0
from #Events
order by code_events, Events_start;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

jcelko212 32090 SSCrazy Eights Points: 9304 More actions · Answer 5

>> I have a table called Events, the value of that table like as bellow :<<

Actually you don't have a table in your posting. By definition, a table must have a key and that means it's columns cannot all be NULL. You also don't know there's no such thing as a generic "_id" in RDBMS. It must be the identifier of something in particular, it can't be a numeric because it's always measured on a nominal scale.

You might also want to read the ISO 11179 standards and learn how to properly name columns. For example, since a column is made of rows, and each row is made of columns, you can't have plural column names. They are by their very nature, in the definition of RDBMS, scalar values.

CREAT TABLE Events

(event_code VARCHAR(10) NOT NULL

CHECK (event_code IN (..)),

event_name VARCHAR(10) NOT NULL,

event_start_date DATE NOT NULL,

event_end_date DATE,

CHECK (event_start_date <= event_end_date),

PRIMARY KEY (event_code, event_name, event_start_date)

);

Did you know we've had a DATE data type for several years now? There is no need to carry the time and in fact simply waste storage. You're still writing that old Sybase code!

Please note the use of a check clause to guarantee that your codes are drawn from a list. But it looks like what you're calling a code is not what we call a code. It looks more like the name of an event with the year attached to it. Think of an area code and phone numbe, as an example of what a code is. You also want to use a check constraint to be sure the start and end dates are in order. Notice the use of a null for the end date of an event; this means event is still ongoing and has not yet completed. You want to download a copy of temporal queries and SQL by Rick Snodgrass. It's a free PDF and the University of Arizona on their website.

Unfortunate this is still pretty bad design, as far as I can tell. This is why you wind up with a three column primary key when a unique event name would have allowed a single column. You will eventually discover that most of the work and SQL is done in the DDL, and not the DML you have to do elaborate things in the DML, it's usually because you screwed up the basic design and have to include everything.

We've had no need for the old-fashioned Sybase insertion one at a time. If you put in a table constructor, this will let the optimizer stand a chance of picking the best way to do this. If you do it as if you were still using punch cards, the SQL engine is obligated to treat them as punchcards one at a time. The semi-colon was also left out of Sybase in the early days, but now many of the SQL Server statements require it. Since it is ANSI and ISO standard. You might want to start actually writing real SQL and not dialect.

INSERT INTO Events

VALUES

('AC_83', 'Event 1', '2020-07-15', '2020-07-30'),

('AC_83', 'Event 1', '2020-09-15', '2025-04-30'),

('AC_84', 'Event 2', '2019-06-01', '2030-07-30'),

('AC_86', 'Event 3', '2020-07-15', '2022-12-15'),

('AC_87', 'Event 4', '2021-07-15', '2022-12-15');

>> I want to return a range of missing dates from the first date of 2020 and only if event_start>=2020 <<

Why are you writing SQL as if it was assembly language? We don't use bit flags in high-level languages, and in particular since oriented high-level languages. Again, this is designed failure will lead to elaborate queries and poor performance.

As below what I want to get :

code_events Events event_start event_end is_missing_flg

AC_83 Event 1 2020-07-15 2020-07-30 0

AC_83 Event 1 2020-07-30 2020-09-15 1

AC_83 Event 1 2020-09-15 2025-04-30 0

AC_84 Event 2 2019-06-01 2030-07-30 0

AC_86 Event 3 2020-01-01 2020-07-15 1

AC_86 Event 3 2020-07-15 2022-12-15 0

AC_87 Event 4 2020-01-01 2021-07-15 1

AC_87 Event 4 2021-07-15 2022-12-15 0

did you notice that the end of one of your intervals is the start of the following interval? So the event status at the intersection of these two intervals is ambiguous. Is it missing or not missing? Or is it a quantum thin so it is both?

Do you have a calendar table? Instead of actually materializing the missing data flags, you could compute them with a simple lookup. But right now your specs are too vague to make any judgments. Why don't you post specs that tell us what you're actually trying to do?.

Please post DDL and follow ANSI/ISO standards when asking for help.

Ken McKelvey SSCoach Points: 18927 More actions · Answer 6

Steve Collins wrote:

this looks familiar

I think both windowed and brute force approaches have their place. I have just tidied up my code to try and make it more understandable:

WITH PrevEnds
AS
(
    -- Get previous Events_end as will be needed to determine if there are missing rows
    -- and also for the Events_start value of missing rows
    SELECT E.code_events, E.[Events], E.Events_start, E.Events_end
        ,LAG(E.Events_end)
            OVER (PARTITION BY E.code_events ORDER BY E.Events_start) AS Prev_Events_end
    FROM Events E
)
SELECT P.code_events, P.[Events]
    ,CASE
        WHEN X.IsMissing = 1
        THEN COALESCE(P.Prev_Events_end, '2020')
        ELSE P.Events_start
    END AS Events_start
    ,CASE
        WHEN X.IsMissing = 1
        THEN P.Events_start
        ELSE P.Events_end
    END AS Events_end
    ,X.IsMissing
FROM PrevEnds P
        -- The maximum number of missing rows = the number of original rows
        CROSS APPLY (VALUES ('PossibleMissing'),('Original')) R(RowType)
        -- Determine if the row is missing
        CROSS APPLY
        (
            VALUES
            (
                CASE
                    WHEN R.RowType = 'PossibleMissing'
                        AND P.Events_start <> COALESCE(P.Prev_Events_end, '1900')
                        AND P.Events_start > '2020'
                    THEN 1
                    ELSE 0
                END
            )
        ) X (IsMissing)
WHERE R.RowType = 'Original' OR X.IsMissing = 1
ORDER BY Code_events, Events_start;

This reply was modified 3 years, 9 months ago by Ken McKelvey.
This reply was modified 3 years, 9 months ago by Ken McKelvey.
This reply was modified 3 years, 9 months ago by Ken McKelvey.
This reply was modified 3 years, 9 months ago by Ken McKelvey.
This reply was modified 3 years, 9 months ago by Ken McKelvey.
This reply was modified 3 years, 9 months ago by Ken McKelvey.

Steve Collins SSCrazy Points: 2227 More actions · Answer 7

Ken McKelvey wrote:

Steve Collins wrote:
this looks familiar
I think both windowed and brute force approaches have their place. I have just tidied up my code to try and make it more understandable:

It looks good. Next time I'm going to start with your code

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

How to return a range value based on the first day of year and the value of row

Cookies on SQLServerCentral