Backfilling a table based on percentage obtained from query

Question

Post reply

Backfilling a table based on percentage obtained from query

a1011

SSC Enthusiast

Points: 180
More actions
June 14, 2010 at 9:34 am

#220051

I have a table in which certain IDs contain an "Event" while other IDs do not contain any "Event".
What I require is that an "Event" be added to the rows where the Event is NULL based on the percentage of the events that are already actually present for a given day.
For example, if on a certain day there 20 rows. 10 of those rows already have an Event while the other 10 do not. Of the rows that already have an event, 2 rows have the Event = 1. Therefore for 20% of the rows that already have an Event, the event = 1. I then want to backfill the table (or create a new table) so that of the rows that do not have an Event, 20% will have the Event = 1.
Based on the query I will then create a procedure that does this. I will then create a job that runs this procedure on a daily basis.
To help you visualize what I need, I have created 2 tables. The table #Test contains the data that I currently have while the table #Test1 contains the results that I need.
The table that I actually have contains over 100,000 rows on a daily basis
Any help is appreciated.
-- This table contains the data I have
if OBJECT_ID('tempdb..#Test') IS NOT NULL DROP TABLE #Test
CREATE TABLE #Test (
ID int,
Datecreated datetime,
[Event] int
)
INSERT INTO #Test (ID, Datecreated, [Event])
SELECT 1, '2010-04-01 06:16:16.623', 0 UNION ALL
SELECT 2, '2010-04-01 06:25:18.940', 2 UNION ALL
SELECT 3, '2010-04-01 07:03:07.110', 1 UNION ALL
SELECT 4, '2010-04-01 07:40:27.217', 2 UNION ALL
SELECT 5, '2010-04-01 07:48:42.450', 4 UNION ALL
SELECT 6, '2010-04-01 07:51:24.890', 3 UNION ALL
SELECT 7, '2010-04-01 08:37:29.953', 1 UNION ALL
SELECT 8, '2010-04-01 08:48:40.890', 4 UNION ALL
SELECT 9, '2010-04-01 09:13:56.633', 4 UNION ALL
SELECT 10, '2010-04-01 09:13:57.634', 3 UNION ALL
SELECT 11, '2010-04-01 08:28:26.523', NULL UNION ALL
SELECT 12, '2010-04-01 07:59:56.269', NULL UNION ALL
SELECT 13, '2010-04-01 07:01:46.733', NULL UNION ALL
SELECT 14, '2010-04-01 05:48:42.450', NULL UNION ALL
SELECT 15, '2010-04-01 09:56:32.620', NULL UNION ALL
SELECT 16, '2010-04-01 08:01:56.455', NULL UNION ALL
SELECT 17, '2010-04-01 06:42:42.480', NULL UNION ALL
SELECT 18, '2010-04-01 10:26:21.235', NULL UNION ALL
SELECT 19, '2010-04-01 08:48:42.450', NULL UNION ALL
SELECT 20,'2010-04-01 09:52:41.653', NULL
--This table contains the result
if OBJECT_ID('tempdb..#Test1') IS NOT NULL DROP TABLE #Test
CREATE TABLE #Test1 (
ID int,
Datecreated datetime,
[Event] int
)
INSERT INTO #Test1 (ID, Datecreated, [Event])
SELECT 1, '2010-04-01 06:16:16.623', 0 UNION ALL
SELECT 2, '2010-04-01 06:25:18.940', 2 UNION ALL
SELECT 3, '2010-04-01 07:03:07.110', 1 UNION ALL
SELECT 4, '2010-04-01 07:40:27.217', 2 UNION ALL
SELECT 5, '2010-04-01 07:48:42.450', 4 UNION ALL
SELECT 6, '2010-04-01 07:51:24.890', 3 UNION ALL
SELECT 7, '2010-04-01 08:37:29.953', 1 UNION ALL
SELECT 8, '2010-04-01 08:48:40.890', 4 UNION ALL
SELECT 9, '2010-04-01 09:13:56.633', 4 UNION ALL
SELECT 10, '2010-04-01 09:13:57.634', 3 UNION ALL
SELECT 11, '2010-04-01 08:28:26.523', 0 UNION ALL
SELECT 12, '2010-04-01 07:59:56.269', 1 UNION ALL
SELECT 13, '2010-04-01 07:01:46.733', 1 UNION ALL
SELECT 14, '2010-04-01 05:48:42.450', 2 UNION ALL
SELECT 15, '2010-04-01 09:56:32.620', 2 UNION ALL
SELECT 16, '2010-04-01 08:01:56.455', 3 UNION ALL
SELECT 17, '2010-04-01 06:42:42.480', 3 UNION ALL
SELECT 18, '2010-04-01 10:26:21.235', 4 UNION ALL
SELECT 19, '2010-04-01 08:48:42.450', 4 UNION ALL
SELECT 20,'2010-04-01 09:52:41.653', 4

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107047 More actions · Answer 1

How are you planning to deal with tie values?

Eg. 20% = 1 and 20% =2?

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

a1011 SSC Enthusiast Points: 180 More actions · Answer 2

This is what I require:

if 10% = 0, 20% = 1, 20% = 2, 20% = 3 and 30% = 4 then for the IDs that do not have an Event, I want to insert the Event = 0 for the top 10%, Event = 1 for the next 20%, Event = 2 for the next 20%, Event = 3 for the next 20%, and Event = 4 for the rest of them (which should be the last 30%).

LutzM SSC Guru Points: 107047 More actions · Answer 3

What would be the sorting order to get the "top x" to apply the new events?

What is the business rule in case of non-integer results? (e.g. 10% of 5 rows with NULL values?)

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

a1011 SSC Enthusiast Points: 180 More actions · Answer 4

What would be the sorting order to get the "top x" to apply the new events?

We can use the "datecreated" for the sorting order

What is the business rule in case of non-integer results? (e.g. 10% of 5 rows with NULL values?)

In case of a non-integer then take the closest integer that occurs before the non integer

e.g the answer for 10% of 5 rows will be 0,

10% of 72 rows will be 7

10% of 108 rows will be 10

Thank you.

a1011 SSC Enthusiast Points: 180 More actions · Answer 5

FYI - I have been using the following query to find the percentage of each Event but after this I cannot find a way to use these Event percentages to apply Events to the IDs that currently do not have any Events.

; with cte1 as

(

select

EVENT,

COUNT(ID)/(select cast(count(event) as float) from #Test) as Percentage

from

#Test

where

Event is not null

group by

Event

)

select * from cte1

LutzM SSC Guru Points: 107047 More actions · Answer 6

It's getting late over here in Europe...

So, instead of a coded version here's a brief description of how I'd do it:

Store the results of the query you just posted in a temp table and build the running total (getting two columns: range_from and range_to) according to the sequence you need.

Based on the data in your source table with event IS NULL build another temp table holding those data and the running total percentage per unassigned row.

Update the final table and set the event to the event value of the first temp table where the unassigned_running_total is between range_from and range_to.

... or, at least, something along those lines... 😉

Maybe one of the U.S. folks will come across this thread and post a coded version. If not, I'll do it in about 15 to 20hrs...

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]