Fill missing rows in dataset

  • I have data in table like this.

    im

    I want to fill/insert the rows with missing quarters from 2019-Q1 to 2022-Q4 with Sales as 0 for missing quarters and rest of the dimensions as it is.

    For example: West has only two rows of quarter 2019-Q1 AND 2020-Q3. I also need other 14 rows of remaining quarters between 2019-Q1 AND 2022-Q4.

    Sample fiddle data set: https://www.db-fiddle.com/f/a819dNsKzdk8kXiYjkRNzz/2

    Can some please help with sql code to achieve the same ?

  • ;WITH cte_distinct_customer_regions AS (
    SELECT DISTINCT Customer_Key, Customer_Name, Component, Region
    FROM Fill_Gaps
    )

    SELECT
    cdc.Customer_Key, cdc.Customer_Name, cdc.Component, cdc.Region,
    q.Quarter, ISNULL(fg.Sales, 0) AS Sales
    FROM cte_distinct_customer_regions cdc
    CROSS JOIN Quarters q
    LEFT OUTER JOIN Fill_Gaps fg ON fg.Customer_Key = cdc.Customer_key AND fg.Region = cdc.Region AND fg.Quarter = q.Quarter
    ORDER BY cdc.Customer_Name, q.Quarter

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • @ombir ,

    I know that the Fill_Gaps table is just a test table here.  What value is returned when you run the following code against the real table?

       WITH cteUnique AS
    (
    SELECT DISTINCT Customer_Key,Customer_Name,Component,Region
    FROM PutRealTableNameHre --<-----<<<<< LOOK!!! CHANGE THIS BEFORE YOU RUN IT!!!!
    )
    SELECT COUNT(*)
    FROM cteUnique
    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Scott. I will check this out.

  • Jeff Moden wrote:

    @Ombir ,

    I know that the Fill_Gaps table is just a test table here.  What value is returned when you run the following code against the real table?

       WITH cteUnique AS
    (
    SELECT DISTINCT Customer_Key,Customer_Name,Component,Region
    FROM PutRealTableNameHre --<-----<<<<< LOOK!!! CHANGE THIS BEFORE YOU RUN IT!!!!
    )
    SELECT COUNT(*)
    FROM cteUnique
    ;

    Thanks Jeff for having a look. The count is 8,518,436.

     

  • Let me give more context on this. I am trying to identify if multiple components have been purchased by customers (Sales>0) in Last 4 quarters and then on basis of components purchased club the customer into 3 groups.

    Group 1: Purchased Component 1 and Component 2 (Rolling 4 quarters)

    Group 2: Purchased Component 1 and Component 3 (Rolling 4 quarters)

    Group 3: Purchased All three components (Rolling 4 quarters)

    Basically a customer can fall into Group 1 ,say in 2020-Q3 and then into Group 3 in 2021-Q1 and then into Group 2 in 2022-Q3. The group is decided on rolling sales in last 4 quarters at any point of time. I have generated another data set https://www.db-fiddle.com/f/neBPKhKbTu7cYgUwLf2naa/1 which is sufficient to generate all 3 groups. In my data if there is no sales in a quarter for component then that row isn't available.

    Thats why I am trying to fill the missing quarters. May be there is direct solution which don't need to fill the gaps but I am not sure how to do it.

  • Ombir wrote:

    Let me give more context on this. I am trying to identify if multiple components have been purchased by customers (Sales>0) in Last 4 quarters and then on basis of components purchased club the customer into 3 groups.

    Group 1: Purchased Component 1 and Component 2 (Rolling 4 quarters) Group 2: Purchased Component 1 and Component 3 (Rolling 4 quarters) Group 3: Purchased All three components (Rolling 4 quarters)

    Basically a customer can fall into Group 1 ,say in 2020-Q3 and then into Group 3 in 2021-Q1 and then into Group 2 in 2022-Q3. The group is decided on rolling sales in last 4 quarters at any point of time. I have generated another data set https://www.db-fiddle.com/f/neBPKhKbTu7cYgUwLf2naa/1 which is sufficient to generate all 3 groups. In my data if there is no sales in a quarter for component then that row isn't available. Thats why I am trying to fill the missing quarters. May be there is direct solution which don't need to fill the gaps but I am not sure how to do it.

    "Rolling 4 quarters"... from what date?

    For example, always the previous 4 quarters not including the current quarter?

    ... or ...

    From a given completed quarter?

    ... or ???

    And what should the output look like?

    AND, are the 3 rules for components 1, 2, and 3 fixed or variable and are there more than 3 components?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Ombir wrote:

    Let me give more context on this. I am trying to identify if multiple components have been purchased by customers (Sales>0) in Last 4 quarters and then on basis of components purchased club the customer into 3 groups.

    Group 1: Purchased Component 1 and Component 2 (Rolling 4 quarters) Group 2: Purchased Component 1 and Component 3 (Rolling 4 quarters) Group 3: Purchased All three components (Rolling 4 quarters)

    Basically a customer can fall into Group 1 ,say in 2020-Q3 and then into Group 3 in 2021-Q1 and then into Group 2 in 2022-Q3. The group is decided on rolling sales in last 4 quarters at any point of time. I have generated another data set https://www.db-fiddle.com/f/neBPKhKbTu7cYgUwLf2naa/1 which is sufficient to generate all 3 groups. In my data if there is no sales in a quarter for component then that row isn't available. Thats why I am trying to fill the missing quarters. May be there is direct solution which don't need to fill the gaps but I am not sure how to do it.

    "Rolling 4 quarters"... from what date?

    For example, always the previous 4 quarters not including the current quarter?

    ... or ...

    From a given completed quarter?

    ... or ???

    And what should the output look like?

    AND, are the 3 rules for components 1, 2, and 3 fixed or variable and are there more than 3 components?

    1.  Rolling 4 quarters include current quarter/row. For example, if we have to report the Sales for Group 1 for 2021-Q3, then we have to check if that customer has purchased both component 1 and 2 in any of last 4 quarters(2021-Q3,2021-Q2,2021-Q1,2020-Q4). If yes then in 2021-Q3, we report that customer under Group1 and whatever is sales in 2021-Q3 will add with other customers of similar Group 1. This grouping is mutually exclusive. Priority of Group 3 will be highest. If in any quarter if a customer falls into say both Group 1 and Group 3 then it will assigned to Group 3. Group 2 and 1 falls under same priority. In any quarter customer cannot falls into more than 1 Group.
    2. Updated the fiddle with required Grouping column here https://www.db-fiddle.com/f/neBPKhKbTu7cYgUwLf2naa/3
    3. ex
    4. Component is fixed at 3 types only. There can be one more category i.e Group 4. If a customer has purchased only Single component in any of the last 4 quarters.

     

     

    • This reply was modified 1 year, 9 months ago by  Ombir.
  • So, just to be sure, "Component N" and "Region" can actually be anything, correct?  And the goal is to assign the "N" of "Group N"?  Do the requirements of "Group N" change because "Component "N" changes?

    For example, for one company, Components 1, 2, and 3 might be Pistons, Sawdust Pumps, and Cow Catchers while Components 1, 2, and 3 might be Seat Covers, Bum  Warmers, and Back Scratchers for a different company?

    And can the definitions of "Group" change?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    So, just to be sure, "Component N" and "Region" can actually be anything, correct?  And the goal is to assign the "N" of "Group N"?  Do the requirements of "Group N" change because "Component "N" changes?

    For example, for one company, Components 1, 2, and 3 might be Pistons, Sawdust Pumps, and Cow Catchers while Components 1, 2, and 3 might be Seat Covers, Bum  Warmers, and Back Scratchers for a different company?

    And can the definitions of "Group" change?

    No. Components will be same (Fixed 3) for every company and definition of group will be also same.

  • @jeff Any update on above ?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply