April 12, 2016 at 3:35 am
I am trying to create a dataset for a range chart in SSRS, the purpose of this chart is to act as a promotional calendar for our sales team to be able to see where they have promotions and at times multiple concurrent promotions running at retail. I have got the bar chart working well but I am struggling with overlapping dates.
I have a test dataset that highlights my problem
CREATE TABLE #promotiontest
(
Brand VARCHAR(20),
Product VARCHAR(20),
Deals VARCHAR(20),
StartDate DATE,
EndDate DATE,
Duration INT,
LineID INT
)
insert into #promotiontest values('Brand A','Product A','Deal 1','2016-06-01','2016-06-30','29',1);
insert into #promotiontest values('Brand A','Product A','Deal 2','2016-06-10','2016-09-30','112',2);
insert into #promotiontest values('Brand A','Product A','Deal 3','2016-06-25','2016-07-31','36',3);
insert into #promotiontest values('Brand A','Product A','Deal 4','2016-06-25','2016-07-31','36',4);
insert into #promotiontest values('Brand A','Product A','Deal 5','2016-06-30','2016-07-14','14',1);
insert into #promotiontest values('Brand A','Product A','Deal 6','2016-08-31','2016-09-30','30',1);
insert into #promotiontest values('Brand A','Product B','Deal 7','2016-06-01','2016-06-30','29',1);
insert into #promotiontest values('Brand A','Product B','Deal 8','2016-07-01','2016-07-30','29',1);
insert into #promotiontest values('Brand A','Product B','Deal 9','2016-07-31','2016-08-31','31',1);
insert into #promotiontest values('Brand A','Product B','Deal 10','2016-09-01','2016-09-30','29',1);
SELECT *
FROM #promotiontest
My challenge, is I need to create a dynamic column that works just like LineID (which I have hardcoded to show how it should look). LineID is used to signify if there are overlapping promotions, 1 means it is the first one, 2 means its the second etc etc. The challenge is getting Deal 5 to have a lineID of 1, as the promotion that is on Line 1 would have finished. All of product B should be on one line as there are no overlaps. This needs to be dynamic because if deal 2 was filtered via a parameter by the user I need that line to disappear in the report and the column to recalculate.
I use the LineID in the range chart to work as a grouping, which I have uploaded.
I have got close with the following code with the help of @KhTan link here
; with data (Brand, Product, Deal, StartDate, EndDate, Duration, LineID) as
(
SELECT 'Brand A','Product A','Deal 1','2016-06-01','2016-06-30','29',1 UNION all
SELECT 'Brand A','Product A','Deal 2','2016-06-10','2016-09-30','112',2 UNION all
SELECT 'Brand A','Product A','Deal 3','2016-06-25','2016-07-31','36',3 UNION all
SELECT 'Brand A','Product A','Deal 4','2016-06-25','2016-07-31','36',4 UNION all
SELECT 'Brand A','Product A','Deal 5','2016-06-30','2016-07-14','14',1 UNION all
SELECT 'Brand A','Product A','Deal 6','2016-08-31','2016-09-30','30',1 UNION all
SELECT 'Brand A','Product B','Deal 7','2016-06-01','2016-06-30','29',1 UNION all
SELECT 'Brand A','Product B','Deal 8','2016-07-01','2016-07-30','29',1 UNION all
SELECT 'Brand A','Product B','Deal 9','2016-07-31','2016-08-31','31',1 UNION ALL
SELECT 'Brand A','Product B','Deal 10','2016-09-01','2016-09-30','29',1
),
cte as
(
select rn = row_number() over (partition by Brand,Product order by StartDate), *
from data
),
rcte AS
(
select rn, Brand, Product,Deal , StartDate, EndDate,LineID,
grp = 1, grpStart = StartDate, grpEnd = EndDate
from cte
where rn = 1
union ALL
select c.rn, c.Brand, c.Product, c.Deal, c.StartDate, c.EndDate,c.LineID,
grp = case when c.StartDate BETWEEN r.grpStart and r.grpEnd
then r.grp +1
else r.grp
end,
grpStart = case when c.StartDate BETWEEN r.grpStart and r.grpEnd
then r.grpStart
else c.StartDate
end,
grpEnd = case when c.EndDate >= r.grpEnd then c.EndDate else r.grpEnd END
from rcte r
inner join cte c
ON r.Brand = c.Brand
AND r.Product = c.Product
AND r.rn = c.rn -1
)
SELECT * FROM rcte
ORDER BY Brand, Product, deal
but the condition of looking back to the previous rn doesn't work as I need to do a full table scan rather than looking at a row.
Apologies if this doesn't make sense and I am rambling.
Regards
Rich
April 12, 2016 at 2:56 pm
You are not rambling at all and your problem is in fact intrigueing. Unfortunately I had a long day and I cannot get my brains to clear out the fog, so I cannot give you more than a few thoughts.
Once I understood the issue (and for those still struggling, I suggest looking at the attached picture - it helps), I got the feeling that this problem might be a good candidate for the "set-based iteration" approach - that is the name I invented for the basic algorithm I came up with over a decade ago when helping someone with a bin packing problem. (I wrote about it extensively on my blog, back then - you should still be able to find it on SqlBlog.com)
The idea is that instead of either trying to come up with a single set-based approach (which I expect to require so many self-joins that it'll be both dog-slow and unmaintanable) or using a one-row-at-a-time iteration (which in this case would probably be a relatively simple approach and fast enough when the amount of data is limited - but where is the fun in that?), you use an algorith that does just a few iterations and each iteration uses a set-based query that does as much work as it can - but not all.
In this specific case, I would use one iteration per line. So four iterations total for your sample data. (Probably the fifth iteration would be where the execution actually ends because no rows are affected anymore).
In the first iteration, you would assign line number 1 to the first row for each product and for each next row of the same product that does not overlap with the previous rows. So for your test data, it would assign line number 1 to deals 1, 5, 6, 7, 8, 9, and 10. I have a feeling that there must be a set-based way to do this, using smart windowing function techniques, but this is where the foggy brain kicks in.
In the next iteration, you do the same, but now assigning the next line number (2), and only looking at rows that do not yet have a line number. For the test data, only deal 2 applies, since deals 3 and 4 both overlap with it and all other deals already have line 1.
Repeat this for lines 3 and 4, and when you run the query one last time for line 5, you'll get @@ROWCOUNT = 0 so you are ready.
For the sample data, this is a lot of work for just 10 rows. But if your actual data is tens of thousands of rows and ends up using just a dozen or so lines, the entire process will probably run faster than a cursor that processes one row at a time. If, that is, you can get that set-based query done (and fast enough).
I hope someone else will jump in and give you some pointers for the missing part of this solution (and in fact the only hard part :w00t:).
April 12, 2016 at 3:51 pm
richjlamb (4/12/2016)
I am trying to create a dataset for a range chart in SSRS, the purpose of this chart is to act as a promotional calendar for our sales team to be able to see where they have promotions and at times multiple concurrent promotions running at retail. I have got the bar chart working well but I am struggling with overlapping dates.I have a test dataset that highlights my problem
CREATE TABLE #promotiontest
(
Brand VARCHAR(20),
Product VARCHAR(20),
Deals VARCHAR(20),
StartDate DATE,
EndDate DATE,
Duration INT,
LineID INT
)
insert into #promotiontest values('Brand A','Product A','Deal 1','2016-06-01','2016-06-30','29',1);
insert into #promotiontest values('Brand A','Product A','Deal 2','2016-06-10','2016-09-30','112',2);
insert into #promotiontest values('Brand A','Product A','Deal 3','2016-06-25','2016-07-31','36',3);
insert into #promotiontest values('Brand A','Product A','Deal 4','2016-06-25','2016-07-31','36',4);
insert into #promotiontest values('Brand A','Product A','Deal 5','2016-06-30','2016-07-14','14',1);
insert into #promotiontest values('Brand A','Product A','Deal 6','2016-08-31','2016-09-30','30',1);
insert into #promotiontest values('Brand A','Product B','Deal 7','2016-06-01','2016-06-30','29',1);
insert into #promotiontest values('Brand A','Product B','Deal 8','2016-07-01','2016-07-30','29',1);
insert into #promotiontest values('Brand A','Product B','Deal 9','2016-07-31','2016-08-31','31',1);
insert into #promotiontest values('Brand A','Product B','Deal 10','2016-09-01','2016-09-30','29',1);
SELECT *
FROM #promotiontest
My challenge, is I need to create a dynamic column that works just like LineID (which I have hardcoded to show how it should look). LineID is used to signify if there are overlapping promotions, 1 means it is the first one, 2 means its the second etc etc. The challenge is getting Deal 5 to have a lineID of 1, as the promotion that is on Line 1 would have finished. All of product B should be on one line as there are no overlaps. This needs to be dynamic because if deal 2 was filtered via a parameter by the user I need that line to disappear in the report and the column to recalculate.
I use the LineID in the range chart to work as a grouping, which I have uploaded.
I have got close with the following code with the help of @KhTan link here
; with data (Brand, Product, Deal, StartDate, EndDate, Duration, LineID) as
(
SELECT 'Brand A','Product A','Deal 1','2016-06-01','2016-06-30','29',1 UNION all
SELECT 'Brand A','Product A','Deal 2','2016-06-10','2016-09-30','112',2 UNION all
SELECT 'Brand A','Product A','Deal 3','2016-06-25','2016-07-31','36',3 UNION all
SELECT 'Brand A','Product A','Deal 4','2016-06-25','2016-07-31','36',4 UNION all
SELECT 'Brand A','Product A','Deal 5','2016-06-30','2016-07-14','14',1 UNION all
SELECT 'Brand A','Product A','Deal 6','2016-08-31','2016-09-30','30',1 UNION all
SELECT 'Brand A','Product B','Deal 7','2016-06-01','2016-06-30','29',1 UNION all
SELECT 'Brand A','Product B','Deal 8','2016-07-01','2016-07-30','29',1 UNION all
SELECT 'Brand A','Product B','Deal 9','2016-07-31','2016-08-31','31',1 UNION ALL
SELECT 'Brand A','Product B','Deal 10','2016-09-01','2016-09-30','29',1
),
cte as
(
select rn = row_number() over (partition by Brand,Product order by StartDate), *
from data
),
rcte AS
(
select rn, Brand, Product,Deal , StartDate, EndDate,LineID,
grp = 1, grpStart = StartDate, grpEnd = EndDate
from cte
where rn = 1
union ALL
select c.rn, c.Brand, c.Product, c.Deal, c.StartDate, c.EndDate,c.LineID,
grp = case when c.StartDate BETWEEN r.grpStart and r.grpEnd
then r.grp +1
else r.grp
end,
grpStart = case when c.StartDate BETWEEN r.grpStart and r.grpEnd
then r.grpStart
else c.StartDate
end,
grpEnd = case when c.EndDate >= r.grpEnd then c.EndDate else r.grpEnd END
from rcte r
inner join cte c
ON r.Brand = c.Brand
AND r.Product = c.Product
AND r.rn = c.rn -1
)
SELECT * FROM rcte
ORDER BY Brand, Product, deal
but the condition of looking back to the previous rn doesn't work as I need to do a full table scan rather than looking at a row.
Apologies if this doesn't make sense and I am rambling.
Regards
Rich
Looking at the data you posted, some will say the Deal 5 overlaps Deal 1 as it starts on the same day that Deal 1 ends. I also base this on your decription of why Deal 7 through Deal 10 do not overlap.
I am also curious why you consider that Deal 5 and Deal aren't part of the overlap when Deal 2 is part of the overlap since its EndDate is past the EdnDate of both Deal 5 and Deal 6. I could potential see it if Deal 2 is filtered out and Deal 5 had a StartDate 2017-07-01 instead of 2016-06-30.
Edit: Except that even with Deal 2 filtered out, Deal 5 still overlaps with Deal 3 and Deal 4.
Unless you have a different definition of overlapping.
April 13, 2016 at 1:58 am
Lynn Pettis (4/12/2016)
Looking at the data you posted, some will say the Deal 5 overlaps Deal 1 as it starts on the same day that Deal 1 ends.
I originally had the same thought. but looking at the picture attached and looking at the data some more, I realised that the data represents half-open intervals. Those are very often used when working with date + time, but most people tend to mentally switch to clsoed intervals when using date only.
Illustration of the difference:
* Half-open: "We have an appointment tomorrow from 3 o'clock to 4 o'clock" - people perceive this as a one-hour appointment. It does not overlap an appointment that starts at 4 o'clock, since the previous appointment ends "the microsecond before 4 o'clock".
* Closed: "We have a short break from June 3 to June 4" - people perceive this as a 2-day break. It will overlap any appointment on June 4.
In the case of the data posted in this thread, it uses days only but still uses half-open intervals. So the "short break from June 3 to June 4" actually ends "the microsecond before June 4 starts", and does not overlap with an interval that starts on June 4.
(In SQL Server, when working with intervals, half-open actually is often easier to program against, even when only dates are involved - as long as you manage to avoid beinig sucked into the mental expectation of closed intervals for dates)
April 13, 2016 at 2:53 am
Thank you Lynn and Hugo,
The rules in the business can get a little confusing and fuzzy, you are right from a logic point of view that most the deals overlap as they start and finish on the same day but this is just how the data is captured and out in the real world aren't overlaps as one promotion in-store would be removed the next would start. This is a debate that has been had and I lost the argument around poor data capture :w00t:
The dataset isn't huge and I am looking at about 5k-10k rows depending on the selection so working through the iterations may work, I have tried this several times but my logic has fallen over but will give your way a try. My issue is that Deal 5 is overlapping with multiple deals (2,3,4) so getting it to have a row number of 1 is a challenge I am struggling to overcome. Thinking about it, maybe I need to find a row number 1 for a customer & product and place them in a table, then doing a second row number count against the remaining records looking for anything that doesn't overlap (my version of an overlap) with the table I have just created. Then run through that same process for row number 2 then so on.
Thank you both for your responses much appreciated, I will let you know how I get on.
Viewing 5 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply