June 14, 2010 at 9:34 am
I have a table in which certain IDs contain an "Event" while other IDs do not contain any "Event".
What I require is that an "Event" be added to the rows where the Event is NULL based on the percentage of the events that are already actually present for a given day.
For example, if on a certain day there 20 rows. 10 of those rows already have an Event while the other 10 do not. Of the rows that already have an event, 2 rows have the Event = 1. Therefore for 20% of the rows that already have an Event, the event = 1. I then want to backfill the table (or create a new table) so that of the rows that do not have an Event, 20% will have the Event = 1.
Based on the query I will then create a procedure that does this. I will then create a job that runs this procedure on a daily basis.
To help you visualize what I need, I have created 2 tables. The table #Test contains the data that I currently have while the table #Test1 contains the results that I need.
The table that I actually have contains over 100,000 rows on a daily basis
Any help is appreciated.
-- This table contains the data I have
if OBJECT_ID('tempdb..#Test') IS NOT NULL DROP TABLE #Test
CREATE TABLE #Test (
ID int,
Datecreated datetime,
[Event] int
)
INSERT INTO #Test (ID, Datecreated, [Event])
SELECT 1, '2010-04-01 06:16:16.623', 0 UNION ALL
SELECT 2, '2010-04-01 06:25:18.940', 2 UNION ALL
SELECT 3, '2010-04-01 07:03:07.110', 1 UNION ALL
SELECT 4, '2010-04-01 07:40:27.217', 2 UNION ALL
SELECT 5, '2010-04-01 07:48:42.450', 4 UNION ALL
SELECT 6, '2010-04-01 07:51:24.890', 3 UNION ALL
SELECT 7, '2010-04-01 08:37:29.953', 1 UNION ALL
SELECT 8, '2010-04-01 08:48:40.890', 4 UNION ALL
SELECT 9, '2010-04-01 09:13:56.633', 4 UNION ALL
SELECT 10, '2010-04-01 09:13:57.634', 3 UNION ALL
SELECT 11, '2010-04-01 08:28:26.523', NULL UNION ALL
SELECT 12, '2010-04-01 07:59:56.269', NULL UNION ALL
SELECT 13, '2010-04-01 07:01:46.733', NULL UNION ALL
SELECT 14, '2010-04-01 05:48:42.450', NULL UNION ALL
SELECT 15, '2010-04-01 09:56:32.620', NULL UNION ALL
SELECT 16, '2010-04-01 08:01:56.455', NULL UNION ALL
SELECT 17, '2010-04-01 06:42:42.480', NULL UNION ALL
SELECT 18, '2010-04-01 10:26:21.235', NULL UNION ALL
SELECT 19, '2010-04-01 08:48:42.450', NULL UNION ALL
SELECT 20,'2010-04-01 09:52:41.653', NULL
--This table contains the result
if OBJECT_ID('tempdb..#Test1') IS NOT NULL DROP TABLE #Test
CREATE TABLE #Test1 (
ID int,
Datecreated datetime,
[Event] int
)
INSERT INTO #Test1 (ID, Datecreated, [Event])
SELECT 1, '2010-04-01 06:16:16.623', 0 UNION ALL
SELECT 2, '2010-04-01 06:25:18.940', 2 UNION ALL
SELECT 3, '2010-04-01 07:03:07.110', 1 UNION ALL
SELECT 4, '2010-04-01 07:40:27.217', 2 UNION ALL
SELECT 5, '2010-04-01 07:48:42.450', 4 UNION ALL
SELECT 6, '2010-04-01 07:51:24.890', 3 UNION ALL
SELECT 7, '2010-04-01 08:37:29.953', 1 UNION ALL
SELECT 8, '2010-04-01 08:48:40.890', 4 UNION ALL
SELECT 9, '2010-04-01 09:13:56.633', 4 UNION ALL
SELECT 10, '2010-04-01 09:13:57.634', 3 UNION ALL
SELECT 11, '2010-04-01 08:28:26.523', 0 UNION ALL
SELECT 12, '2010-04-01 07:59:56.269', 1 UNION ALL
SELECT 13, '2010-04-01 07:01:46.733', 1 UNION ALL
SELECT 14, '2010-04-01 05:48:42.450', 2 UNION ALL
SELECT 15, '2010-04-01 09:56:32.620', 2 UNION ALL
SELECT 16, '2010-04-01 08:01:56.455', 3 UNION ALL
SELECT 17, '2010-04-01 06:42:42.480', 3 UNION ALL
SELECT 18, '2010-04-01 10:26:21.235', 4 UNION ALL
SELECT 19, '2010-04-01 08:48:42.450', 4 UNION ALL
SELECT 20,'2010-04-01 09:52:41.653', 4
June 14, 2010 at 1:56 pm
How are you planning to deal with tie values?
Eg. 20% = 1 and 20% =2?
June 14, 2010 at 2:04 pm
This is what I require:
if 10% = 0, 20% = 1, 20% = 2, 20% = 3 and 30% = 4 then for the IDs that do not have an Event, I want to insert the Event = 0 for the top 10%, Event = 1 for the next 20%, Event = 2 for the next 20%, Event = 3 for the next 20%, and Event = 4 for the rest of them (which should be the last 30%).
June 14, 2010 at 2:27 pm
What would be the sorting order to get the "top x" to apply the new events?
What is the business rule in case of non-integer results? (e.g. 10% of 5 rows with NULL values?)
June 14, 2010 at 2:39 pm
What would be the sorting order to get the "top x" to apply the new events?
We can use the "datecreated" for the sorting order
What is the business rule in case of non-integer results? (e.g. 10% of 5 rows with NULL values?)
In case of a non-integer then take the closest integer that occurs before the non integer
e.g the answer for 10% of 5 rows will be 0,
10% of 72 rows will be 7
10% of 108 rows will be 10
Thank you.
June 14, 2010 at 2:54 pm
FYI - I have been using the following query to find the percentage of each Event but after this I cannot find a way to use these Event percentages to apply Events to the IDs that currently do not have any Events.
; with cte1 as
(
select
EVENT,
COUNT(ID)/(select cast(count(event) as float) from #Test) as Percentage
from
#Test
where
Event is not null
group by
Event
)
select * from cte1
June 14, 2010 at 5:25 pm
It's getting late over here in Europe...
So, instead of a coded version here's a brief description of how I'd do it:
Store the results of the query you just posted in a temp table and build the running total (getting two columns: range_from and range_to) according to the sequence you need.
Based on the data in your source table with event IS NULL build another temp table holding those data and the running total percentage per unassigned row.
Update the final table and set the event to the event value of the first temp table where the unassigned_running_total is between range_from and range_to.
... or, at least, something along those lines... 😉
Maybe one of the U.S. folks will come across this thread and post a coded version. If not, I'll do it in about 15 to 20hrs...
Viewing 7 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply