Comparing and splitting data

  • I apologise upfront if I don't explain this very well. I'm trying to create some records for a data table (Table1) based on values in another table (Table2).

    Table1 has the simple structure of

    ID (INT), PolicyID (INT) (this is a foreign key linked to a Policies table), Premium (FLOAT), StartDate (DATETIME), EndDate (DATETIME)

    Here is the structure and some example data from Table2

    [font="Courier New"]- PremID - PolicyID - Year - Prem01 - Prem02 - Prem03 - Prem04 - Prem05 - Prem06 - Prem07 - Prem08 - Prem09 - Prem10 - Prem11 - Prem12 -

    - 000001 - 00000001 - 2013 - 100.00 - 100.00 - 100.00 - 100.00 - 100.00 - 100.00 - 100.00 - 100.00 - 100.00 - 130.00 - 130.00 - 130.00 -

    - 000002 - 00000001 - 2014 - 130.00 - 130.00 - 130.00 - 140.00 - 140.00 - 140.00 - 140.00 - 140.00 - 140.00 - 140.00 - 140.00 - 140.00 -[/font]

    PremID is just the PrimaryKey. Then there's the year, and then a load of float fields, each representing a premium paid for each month of the year (01=Jan, 02=Feb etc).

    Now what I am looking to achieve is to create separate rows for Table1 from this data.

    Starting in January (Prem01), I wish to record a separate line each time the premium amount changes. The best way of exlaining this is to give an example based on the above data.

    For the first row (2013), the premium at the start of the year (Prem01) is 100.00. This premium remains until it changes in Prem10 to 130.00. It then remains 130.00 for the rest of the year (Prem10 to Prem12). For this I would want to create 2 rows of data for Table1. The result should look like below.

    [font="Courier New"]

    - ID - PolicyID - Premium - StartDate - EndDate -

    - 000001 - 00000001 - 100.00 - 01-Jan-2013 - 30-Sep-2013 -

    - 000002 - 00000001 - 130.00 - 01-Oct-2013 - 31-Dec-2013 -

    [/font]

    I'd then repeat this for each record in Table2, so based on the 2 lines I have shown above, the end result would be:

    [font="Courier New"]

    - ID - PolicyID - Premium - StartDate - EndDate -

    - 000001 - 00000001 - 100.00 - 01-Jan-2013 - 30-Sep-2013 -

    - 000002 - 00000001 - 130.00 - 01-Oct-2013 - 31-Dec-2013 -

    - 000003 - 00000001 - 130.00 - 01-Jan-2013 - 31-Mar-2014 -

    - 000004 - 00000001 - 140.00 - 01-Apr-2013 - 31-Dec-2014 -

    [/font]

    The source data is inherited from another system which I'm trying to change the layout of how the data is recorded to fit in with the target database.

    Would it be easier to create a view of some sort from Table2, then use some form of WHILE LOOP to do this? I'm just after some ideas on how best to handle this. My T-SQL knowledge is limited and I'm thinking a function of some sort may be required.

    Many thanks in advance. Please ask for more info if needed as I've tried to explain the best I can

    Regards

    Steve

  • Check the following, try to understand it, follow the links and read the articles. Post any questions that you might have.

    CREATE TABLE #Table1 (

    ID INT IDENTITY(1,1), --Try to use something more significant (SomethingID)

    PolicyID INT,

    Premium FLOAT, --Don't use float unless you absolutely need it because it's an approximate numeric. Try numeric/decimal.

    StartDate DATETIME, --If you only care about the date part, use Date

    EndDate DATETIME --Same as above

    )

    --This is how you should post sample data so we don't waste time on creating this.

    CREATE TABLE #Table2(

    PremID int,

    PolicyID int,

    Year int,

    Prem01 decimal(10,2),

    Prem02 decimal(10,2),

    Prem03 decimal(10,2),

    Prem04 decimal(10,2),

    Prem05 decimal(10,2),

    Prem06 decimal(10,2),

    Prem07 decimal(10,2),

    Prem08 decimal(10,2),

    Prem09 decimal(10,2),

    Prem10 decimal(10,2),

    Prem11 decimal(10,2),

    Prem12 decimal(10,2),

    );

    INSERT INTO #Table2

    VALUES

    ( 000001, 00000001, 2013, 100.00, 100.00, 100.00, 100.00, 100.00, 100.00, 100.00, 100.00, 100.00, 130.00, 130.00, 130.00 ),

    ( 000002, 00000001, 2014, 130.00, 130.00, 130.00, 140.00, 140.00, 140.00, 140.00, 140.00, 140.00, 140.00, 140.00, 140.00 );

    --First unpivot the columns. Reference in here: http://www.sqlservercentral.com/articles/CROSS+APPLY+VALUES+UNPIVOT/91234/

    WITH UnpivotedValues AS(

    SELECT PremID,

    PolicyID,

    CAST( CAST( Year AS char(4)) + Prem + '01' AS date) PremMonth,

    PremValue

    FROM #Table2

    CROSS APPLY (VALUES ('01', Prem01),

    ('02', Prem02),

    ('03', Prem03),

    ('04', Prem04),

    ('05', Prem05),

    ('06', Prem06),

    ('07', Prem07),

    ('08', Prem08),

    ('09', Prem09),

    ('10', Prem10),

    ('11', Prem11),

    ('12', Prem12)) u(Prem, PremValue)

    ),

    --Create the groups for the values as needed, more information in here: https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/

    GroupedValues AS(

    SELECT *,

    ROW_NUMBER() OVER( PARTITION BY PremID, PolicyID ORDER BY PremMonth) -

    ROW_NUMBER() OVER( PARTITION BY PremID, PolicyID, PremValue ORDER BY PremMonth) Grouper

    FROM UnpivotedValues

    )

    --Finally, generate the expected results by using the new groups.

    INSERT INTO #Table1(

    PolicyID,

    Premium,

    StartDate,

    EndDate)

    SELECT PolicyID,

    PremValue,

    MIN(PremMonth) AS StartDate, --Needed for the ORDER BY

    DATEADD( dd, -1, DATEADD( MM, 1, MAX(PremMonth)))

    FROM GroupedValues

    GROUP BY PolicyID,

    PremValue,

    Grouper

    ORDER BY PolicyID,

    StartDate;

    SELECT * FROM #Table1;

    GO

    DROP TABLE #Table2;

    DROP TABLE #Table1;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Hi Lweis,

    Thats great, many thanks. I'll start trying to digest that. One complication I have realised though is it seems the designer of the source database didn't make the Prem01 column January, Prem02 February etc. Instead, they used the tax year of April to March. So:

    Prem01 = April

    Prem02 = May

    Prem03 = June

    Prem04 = July

    Prem05 = August

    Prem06 = September

    Prem07 = October

    Prem08 = November

    Prem09 = December

    Prem10 = January

    Prem11 = February

    Prem12 = March

    So I've changed the UnpivotedValues CTE to allow for this.

    SELECT PremID,

    PolicyID,

    CASE

    WHEN Col IN ('Prem10', 'Prem11', 'Prem12')

    THEN DATEADD(yyyy, 1, (CAST(CAST(TaxYear AS char(4)) + Prem + '01' AS date)))

    ELSE CAST( CAST( TaxYear AS char(4)) + Prem + '01' AS date)

    END AS PremMonth,

    PremValue

    FROM #Table2

    CROSS APPLY (VALUES ('04', 'Prem01', Prem01),

    ('05', 'Prem02', Prem02),

    ('06', 'Prem03', Prem03),

    ('07', 'Prem04', Prem04),

    ('08', 'Prem05', Prem05),

    ('09', 'Prem06', Prem06),

    ('10', 'Prem07', Prem07),

    ('11', 'Prem08', Prem08),

    ('12', 'Prem09', Prem09),

    ('01', 'Prem10', Prem10),

    ('02', 'Prem11', Prem11),

    ('03', 'Prem12', Prem12)) u(Prem, Col, PremValue)

    ),

    It appears to work 🙂 Many thanks for your help

    Regards

    Steve

  • It might be easier to just add 3 months. 🙂

    SELECT PremID,

    PolicyID,

    DATEADD( MM, 3, CAST( CAST( TaxYear AS char(4)) + Prem + '01' AS date)) PremMonth,

    PremValue

    FROM #Table2

    CROSS APPLY (VALUES ('01', Prem01),

    ('02', Prem02),

    ('03', Prem03),

    ('04', Prem04),

    ('05', Prem05),

    ('06', Prem06),

    ('07', Prem07),

    ('08', Prem08),

    ('09', Prem09),

    ('10', Prem10),

    ('11', Prem11),

    ('12', Prem12)) u(Prem, PremValue)

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Ah yeah, and let SQL handle the year change. Nice one 🙂

    I've come across another stumbling block. I think when the end of each tax year is reached, a new line needs to be created, even if the premium hasn;t changed.

    e.g. Say you had this data

    | Premium | StartDate | EndDate |

    | 100.00 | 01/04/2014 | 31/06/2015 |

    | 150.00 | 01/06/2015 | 31/08/2015 |

    This would need to be split into separate lines like this

    | Premium | StartDate | EndDate |

    | 100.00 | 01/04/2014 | 31/03/2015 |

    | 100.00 | 01/04/2015 | 31/05/2015 |

    | 150.00 | 01/06/2015 | 31/08/2015 |

    I'm not quite sure how that would be handled. Essentially forcing a new line after 31st March. Or in the instance of the actaul data, March is actually the last month in each record, so when it finishes splitting 1 existing record in the source data, it runs the procedure against the next record, splits where it needs to (where the premium changes), then onto the next record etc. Does that make sense :ermm:

    I'm going to recommend we don't do this though because it seems daft to me, making more work where it's not necessary

    Regards

    Steve

  • I assumed that PremID was a row identifier, which would handle that when we include it in the partition of the row_number() functions. If it's not, add the TaxYear column to the PARTITION BY and GROUP BY and it will be done.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Excellent. Thanks for all your help 🙂

    Regards

    Steve

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply