Backfilling a table based on percentage obtained from query

  • I have a table in which certain IDs contain an "Event" while other IDs do not contain any "Event".

    What I require is that an "Event" be added to the rows where the Event is NULL based on the percentage of the events that are already actually present for a given day.

    For example, if on a certain day there 20 rows. 10 of those rows already have an Event while the other 10 do not. Of the rows that already have an event, 2 rows have the Event = 1. Therefore for 20% of the rows that already have an Event, the event = 1. I then want to backfill the table (or create a new table) so that of the rows that do not have an Event, 20% will have the Event = 1.

    Based on the query I will then create a procedure that does this. I will then create a job that runs this procedure on a daily basis.

    To help you visualize what I need, I have created 2 tables. The table #Test contains the data that I currently have while the table #Test1 contains the results that I need.

    The table that I actually have contains over 100,000 rows on a daily basis

    Any help is appreciated.

    -- This table contains the data I have

    if OBJECT_ID('tempdb..#Test') IS NOT NULL DROP TABLE #Test

    CREATE TABLE #Test (

    ID int,

    Datecreated datetime,

    [Event] int

    )

    INSERT INTO #Test (ID, Datecreated, [Event])

    SELECT 1, '2010-04-01 06:16:16.623', 0 UNION ALL

    SELECT 2, '2010-04-01 06:25:18.940', 2 UNION ALL

    SELECT 3, '2010-04-01 07:03:07.110', 1 UNION ALL

    SELECT 4, '2010-04-01 07:40:27.217', 2 UNION ALL

    SELECT 5, '2010-04-01 07:48:42.450', 4 UNION ALL

    SELECT 6, '2010-04-01 07:51:24.890', 3 UNION ALL

    SELECT 7, '2010-04-01 08:37:29.953', 1 UNION ALL

    SELECT 8, '2010-04-01 08:48:40.890', 4 UNION ALL

    SELECT 9, '2010-04-01 09:13:56.633', 4 UNION ALL

    SELECT 10, '2010-04-01 09:13:57.634', 3 UNION ALL

    SELECT 11, '2010-04-01 08:28:26.523', NULL UNION ALL

    SELECT 12, '2010-04-01 07:59:56.269', NULL UNION ALL

    SELECT 13, '2010-04-01 07:01:46.733', NULL UNION ALL

    SELECT 14, '2010-04-01 05:48:42.450', NULL UNION ALL

    SELECT 15, '2010-04-01 09:56:32.620', NULL UNION ALL

    SELECT 16, '2010-04-01 08:01:56.455', NULL UNION ALL

    SELECT 17, '2010-04-01 06:42:42.480', NULL UNION ALL

    SELECT 18, '2010-04-01 10:26:21.235', NULL UNION ALL

    SELECT 19, '2010-04-01 08:48:42.450', NULL UNION ALL

    SELECT 20,'2010-04-01 09:52:41.653', NULL

    --This table contains the result

    if OBJECT_ID('tempdb..#Test1') IS NOT NULL DROP TABLE #Test

    CREATE TABLE #Test1 (

    ID int,

    Datecreated datetime,

    [Event] int

    )

    INSERT INTO #Test1 (ID, Datecreated, [Event])

    SELECT 1, '2010-04-01 06:16:16.623', 0 UNION ALL

    SELECT 2, '2010-04-01 06:25:18.940', 2 UNION ALL

    SELECT 3, '2010-04-01 07:03:07.110', 1 UNION ALL

    SELECT 4, '2010-04-01 07:40:27.217', 2 UNION ALL

    SELECT 5, '2010-04-01 07:48:42.450', 4 UNION ALL

    SELECT 6, '2010-04-01 07:51:24.890', 3 UNION ALL

    SELECT 7, '2010-04-01 08:37:29.953', 1 UNION ALL

    SELECT 8, '2010-04-01 08:48:40.890', 4 UNION ALL

    SELECT 9, '2010-04-01 09:13:56.633', 4 UNION ALL

    SELECT 10, '2010-04-01 09:13:57.634', 3 UNION ALL

    SELECT 11, '2010-04-01 08:28:26.523', 0 UNION ALL

    SELECT 12, '2010-04-01 07:59:56.269', 1 UNION ALL

    SELECT 13, '2010-04-01 07:01:46.733', 1 UNION ALL

    SELECT 14, '2010-04-01 05:48:42.450', 2 UNION ALL

    SELECT 15, '2010-04-01 09:56:32.620', 2 UNION ALL

    SELECT 16, '2010-04-01 08:01:56.455', 3 UNION ALL

    SELECT 17, '2010-04-01 06:42:42.480', 3 UNION ALL

    SELECT 18, '2010-04-01 10:26:21.235', 4 UNION ALL

    SELECT 19, '2010-04-01 08:48:42.450', 4 UNION ALL

    SELECT 20,'2010-04-01 09:52:41.653', 4

  • How are you planning to deal with tie values?

    Eg. 20% = 1 and 20% =2?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • This is what I require:

    if 10% = 0, 20% = 1, 20% = 2, 20% = 3 and 30% = 4 then for the IDs that do not have an Event, I want to insert the Event = 0 for the top 10%, Event = 1 for the next 20%, Event = 2 for the next 20%, Event = 3 for the next 20%, and Event = 4 for the rest of them (which should be the last 30%).

  • What would be the sorting order to get the "top x" to apply the new events?

    What is the business rule in case of non-integer results? (e.g. 10% of 5 rows with NULL values?)



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • What would be the sorting order to get the "top x" to apply the new events?

    We can use the "datecreated" for the sorting order

    What is the business rule in case of non-integer results? (e.g. 10% of 5 rows with NULL values?)

    In case of a non-integer then take the closest integer that occurs before the non integer

    e.g the answer for 10% of 5 rows will be 0,

    10% of 72 rows will be 7

    10% of 108 rows will be 10

    Thank you.

  • FYI - I have been using the following query to find the percentage of each Event but after this I cannot find a way to use these Event percentages to apply Events to the IDs that currently do not have any Events.

    ; with cte1 as

    (

    select

    EVENT,

    COUNT(ID)/(select cast(count(event) as float) from #Test) as Percentage

    from

    #Test

    where

    Event is not null

    group by

    Event

    )

    select * from cte1

  • It's getting late over here in Europe...

    So, instead of a coded version here's a brief description of how I'd do it:

    Store the results of the query you just posted in a temp table and build the running total (getting two columns: range_from and range_to) according to the sequence you need.

    Based on the data in your source table with event IS NULL build another temp table holding those data and the running total percentage per unassigned row.

    Update the final table and set the event to the event value of the first temp table where the unassigned_running_total is between range_from and range_to.

    ... or, at least, something along those lines... 😉

    Maybe one of the U.S. folks will come across this thread and post a coded version. If not, I'll do it in about 15 to 20hrs...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply