RBAR to Set Based Solution; for the honour and the glory.

  • Hello,

    I have a table in a 3rd party db that is -imho - very badly designed. There is nothing i can do about this and i need the data to build some reports. Currently i import the data using DTS and through a series of rather longwinded procedures including an RBAR activex script sort this mess out. This data is a subset of a larger problem but if i can find a set based solution to it will help me answer the larger question more efficiently.

    This data is a list of bonuses for accounts that buy particular products. i.e. if an account buys 23 of A at £10 per bonus then they get £230 as long as the transaction occurs in the bonus period.

    Once i've done some preliminary chopping up i get to this stage- just a small sample.

    Account--Product--------Start Date--------End Date-----------Bonus

    20943--------A--------26/01/2005-------------------------------10

    20943--------A--------01/02/2005-------------------------------10

    20943--------A--------01/05/2007--------------------------------0

    20943--------B--------01/04/2003-------------------------------30

    20943--------C--------01/04/2003-------------------------------20

    20943--------C--------01/08/2004-------------------------------25

    20943--------C--------16/02/2006-------------------------------35

    20943--------C--------01/03/2006-------------------------------35

    20943--------D--------01/04/2003-------------------------------15

    20943--------D--------01/08/2004-------------------------------30

    21000--------A--------28/01/2005-------------------------------10

    21000--------A--------01/04/2005-------------------------------10

    21000--------B--------01/04/2006-------------------------------30

    21000--------B--------01/07/2006-------------------------------25

    21000--------C--------15/04/2003-------------------------------20

    21000--------C--------01/08/2004-------------------------------25

    21000--------C--------16/02/2006-------------------------------35

    21000--------C--------01/03/2006-------------------------------35

    The data should end up looking like this.

    Account--Product------Start Date-----------End Date------Bonus

    20943--------A--------26/01/2005--------31/01/2005--------10

    20943--------A--------01/02/2005--------30/04/2007--------10

    20943--------A--------01/05/2007--------01/05/2007--------0

    20943--------B--------01/04/2003--------01/10/2007--------30

    20943--------C--------01/04/2003--------31/07/2004--------20

    20943--------C--------01/08/2004--------15/02/2006--------25

    20943--------C--------16/02/2006--------28/02/2006--------35

    20943--------C--------01/03/2006--------01/10/2007--------35

    20943--------D--------01/04/2003--------31/07/2004--------15

    20943--------D--------01/08/2004--------01/10/2007--------30

    21000--------A--------28/01/2005--------31/03/2005--------10

    21000--------A--------01/04/2005--------01/10/2007--------10

    21000--------B--------01/04/2006--------30/06/2006--------30

    21000--------B--------01/07/2006--------01/10/2007--------25

    21000--------C--------15/04/2003--------31/07/2004--------20

    21000--------C--------01/08/2004--------15/02/2006--------25

    21000--------C--------16/02/2006--------28/02/2006--------35

    21000--------C--------01/03/2006--------01/10/2007--------35

    21000--------D--------01/04/2003--------31/07/2004--------15

    21000--------D--------01/08/2004--------01/10/2007--------30

    The logic should be.

    If a product appears in an account only once then the end date = CurrentDate()

    Account--------Product-------Start Date------End Date--------Bonus

    20943-------------B-----------01/04/2003-----------------------30

    Becomes;

    20943-------------B-----------01/04/2003---------01/10/2007---30

    If a product appears in an account more than once then the newest record by date has an end date = currentdate unless that record has a bonus of 0 in which case this is a terminating record. You could use the currentdate as the end date in which case the report query will simply calculate 0 for all products bought between the start date and current date - i think that's inefficient - or you could use the start date of that record.

    Then in reverse chronological order each record takes it's end date from the start date -1 day of the next newest record.

    Account---Product--------Start Date--------End Date-------Bonus

    20943-------A-----------26/01/2005-------------------------10

    20943-------A-----------01/02/2005-------------------------10

    20943-------A-----------01/05/2007-------------------------0

    Becomes;

    Account---Product--------Start Date--------End Date-------Bonus

    20943-------A-----------26/01/2005-------------------------10

    20943-------A-----------01/02/2005-------------------------10

    20943-------A-----------01/05/2007---------01/05/2007------0

    Becomes;

    Account---Product--------Start Date--------End Date-------Bonus

    20943-------A-----------26/01/2005-------------------------10

    20943-------A-----------01/02/2005---------30/04/2007-----10

    20943-------A-----------01/05/2007---------01/05/2007------0

    Becomes;

    Account---Product--------Start Date--------End Date-------Bonus

    20943-------A-----------26/01/2005--------31/01/2005------10

    20943-------A-----------01/02/2005---------30/04/2007-----10

    20943-------A-----------01/05/2007---------01/05/2007------0

    Of course you could start from the oldest record first it makes no real difference as far as i can see 🙂

    Now i enjoy a challenge as much as the next person but a year ago i spent far to long trying to work this out in a set based way and gave up in disgust angry: So for the honour and the glory has anyone got a way of doing this?

    Thanks

    K.

  • probably won't help much, but can you post a bit of the schema.


    Everything you can imagine is real.

  • Let's see if I understood it correctly. You have a series of rows, where each row has a certain StartDate. Inside each series, you want to generate EndDate for each row, so that it is always one day before the next member's StartDate.

    Is that all what you are trying to achieve? Or is there more to it?

  • still looking at the problem, but here's the DDL I created for testing:

    SET DATEFORMAT dmy

    DECLARE @Sample Table(Account varchar(30), Product varchar(30), StartDate datetime,EndDate datetime,Bonus int)

    INSERT INTO @Sample

    SELECT '20943','A','26/01/2005',NULL,10 UNION

    SELECT '20943','A','01/02/2005',NULL,10 UNION

    SELECT '20943','A','01/05/2007',NULL, 0 UNION

    SELECT '20943','B','01/04/2003',NULL,30 UNION

    SELECT '20943','C','01/04/2003',NULL,20 UNION

    SELECT '20943','C','01/08/2004',NULL,25 UNION

    SELECT '20943','C','16/02/2006',NULL,35 UNION

    SELECT '20943','C','01/03/2006',NULL,35 UNION

    SELECT '20943','D','01/04/2003',NULL,15 UNION

    SELECT '20943','D','01/08/2004',NULL,30 UNION

    SELECT '21000','A','28/01/2005',NULL,10 UNION

    SELECT '21000','A','01/04/2005',NULL,10 UNION

    SELECT '21000','B','01/04/2006',NULL,30 UNION

    SELECT '21000','B','01/07/2006',NULL,25 UNION

    SELECT '21000','C','15/04/2003',NULL,20 UNION

    SELECT '21000','C','01/08/2004',NULL,25 UNION

    SELECT '21000','C','16/02/2006',NULL,35 UNION

    SELECT '21000','C','01/03/2006',NULL,35 UNION

    SELECT '21000','D','01/04/2003',NULL,15 UNION

    SELECT '21000','D','01/08/2004',NULL,30

    SET DATEFORMAT mdy

    SELECT Account,Product,MAX(StartDate) from @Sample group by Account,Product

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell,

    I'm afraid you probably got it wrong. It is "Account", "Product", "StartDate", "EndDate" and "Bonus"... at least I think so. Unfortunately the values are not delimited in any way and there are some NULLs that are not explicitly mentioned, so it is pretty hard to get it right.

    Karma, could you please post a CREATE TABLE and INSERT INTO statement for the supplied example, so that we don't have to guess what column ends where?

    OK, I see Karma has already fixed it at least by delimiting the data. Good, although DDL would be even better 😉

  • Now, if I get it right, since the bonus does not change, the second row is not necessary (nothing would change if you delete it). I suppose the system has some rules and generates a row here and then (like on 1st day of each month), but practically if the bonus does not change, the row is unnecessary. Am I right? I'm just trying to understand the data at the moment...

    20943--------A--------26/01/2005-------------------------------10

    20943--------A--------01/02/2005-------------------------------10

  • This should be correct definition of test data:

    SET DATEFORMAT dmy

    CREATE TABLE #Sample(Account VARCHAR(10), Product VARCHAR(10), StartDate DATETIME, EndDate DATETIME, Bonus INT)

    INSERT INTO #Sample

    SELECT '20943','A','26/01/2005', NULL, 10 UNION

    SELECT '20943','A','01/02/2005', NULL, 10 UNION

    SELECT '20943','A','01/05/2007', NULL, 0 UNION

    SELECT '20943','B','01/04/2003', NULL, 30 UNION

    SELECT '20943','C','01/04/2003', NULL, 20 UNION

    SELECT '20943','C','01/08/2004', NULL, 25 UNION

    SELECT '20943','C','16/02/2006', NULL, 35 UNION

    SELECT '20943','C','01/03/2006', NULL, 35 UNION

    SELECT '20943','D','01/04/2003', NULL, 15 UNION

    SELECT '20943','D','01/08/2004', NULL, 30 UNION

    SELECT '21000','A','28/01/2005', NULL, 10 UNION

    SELECT '21000','A','01/04/2005', NULL, 10 UNION

    SELECT '21000','B','01/04/2006', NULL, 30 UNION

    SELECT '21000','B','01/07/2006', NULL, 25 UNION

    SELECT '21000','C','15/04/2003', NULL, 20 UNION

    SELECT '21000','C','01/08/2004', NULL, 25 UNION

    SELECT '21000','C','16/02/2006', NULL, 35 UNION

    SELECT '21000','C','01/03/2006', NULL, 35 UNION

    SELECT '21000','D','01/04/2003', NULL, 15 UNION

    SELECT '21000','D','01/08/2004', NULL, 30

    -- SET DATEFORMAT mdy /*depends on user whether they want to set this dateformat or not, so commented out*/

  • Hi,

    :blush:

    So many replies so quick.

    I've modified Lowell's statement to generate the data in a table

    thanks for the pointer i'll remeber to do that next time.

    Create Table BONUS_TEMP (Account varchar(30), Product varchar(30), StartDate datetime,EndDate datetime null,Bonus int)

    SET DATEFORMAT dmy

    DECLARE @Sample Table(Account varchar(30), Product varchar(30), StartDate datetime,EndDate datetime,Bonus int)

    INSERT INTO @Sample

    SELECT '20943','A','26/01/2005',NULL,10 UNION

    SELECT '20943','A','01/02/2005',NULL,10 UNION

    SELECT '20943','A','01/05/2007',NULL, 0 UNION

    SELECT '20943','B','01/04/2003',NULL,30 UNION

    SELECT '20943','C','01/04/2003',NULL,20 UNION

    SELECT '20943','C','01/08/2004',NULL,25 UNION

    SELECT '20943','C','16/02/2006',NULL,35 UNION

    SELECT '20943','C','01/03/2006',NULL,35 UNION

    SELECT '20943','D','01/04/2003',NULL,15 UNION

    SELECT '20943','D','01/08/2004',NULL,30 UNION

    SELECT '21000','A','28/01/2005',NULL,10 UNION

    SELECT '21000','A','01/04/2005',NULL,10 UNION

    SELECT '21000','B','01/04/2006',NULL,30 UNION

    SELECT '21000','B','01/07/2006',NULL,25 UNION

    SELECT '21000','C','15/04/2003',NULL,20 UNION

    SELECT '21000','C','01/08/2004',NULL,25 UNION

    SELECT '21000','C','16/02/2006',NULL,35 UNION

    SELECT '21000','C','01/03/2006',NULL,35 UNION

    SELECT '21000','D','01/04/2003',NULL,15 UNION

    SELECT '21000','D','01/08/2004',NULL,30

    SET DATEFORMAT mdy

    Insert into BONUS_Temp

    SELECT Account,Product,StartDate, EndDate, Bonus from @Sample

    and one with the "results" in

    Create Table BONUS_TEMP_results (Account varchar(30), Product varchar(30), StartDate datetime,EndDate datetime null,Bonus int)

    SET DATEFORMAT dmy

    DECLARE @sample1-2 Table(Account varchar(30), Product varchar(30), StartDate datetime,EndDate datetime,Bonus int)

    INSERT INTO @sample1-2

    SELECT '20943','A','26/01/2005','31/01/2005',10 UNION

    SELECT '20943','A','01/02/2005','30/04/2007',10 UNION

    SELECT '20943','A','01/05/2007','01/05/2007', 0 UNION

    SELECT '20943','B','01/04/2003','01/10/2007',30 UNION

    SELECT '20943','C','01/04/2003','31/07/2004',20 UNION

    SELECT '20943','C','01/08/2004','15/02/2006',25 UNION

    SELECT '20943','C','16/02/2006','28/02/2006',35 UNION

    SELECT '20943','C','01/03/2006','01/10/2007',35 UNION

    SELECT '20943','D','01/04/2003','31/07/2004',15 UNION

    SELECT '20943','D','01/08/2004','01/10/2007',30 UNION

    SELECT '21000','A','28/01/2005','31/03/2005',10 UNION

    SELECT '21000','A','01/04/2005','01/10/2007',10 UNION

    SELECT '21000','B','01/04/2006','30/06/2006',30 UNION

    SELECT '21000','B','01/07/2006','01/10/2007',25 UNION

    SELECT '21000','C','15/04/2003','31/07/2004',20 UNION

    SELECT '21000','C','01/08/2004','15/02/2006',25 UNION

    SELECT '21000','C','16/02/2006','28/02/2006',35 UNION

    SELECT '21000','C','01/03/2006','01/10/2007',35 UNION

    SELECT '21000','D','01/04/2003','31/07/2004',15 UNION

    SELECT '21000','D','01/08/2004','01/10/2007',30

    SET DATEFORMAT mdy

    Insert into BONUS_Temp_results

    SELECT Account,Product,StartDate, EndDate, Bonus from @sample1-2

    I hope that helps.

    Vladan:

    Sounds as if you can do this 🙂 I must admit when i first tried i thought it would be a cinch, guess i'm not as up to date on my TSQL as i thought :crying: You are right in that effectively -in the example you gave - it could be considered that the bonus runs from the 26/01/2005 to 01/10/2007. I have found such examples in the data, i have no idea why they choose to do this!

    Thanks

    K

  • Yeah, well I had to do some similar things when we were migrating from old system to SQL Server... try this. It creates the test table, inserts data and then performs update on it.

    SET DATEFORMAT dmy

    CREATE TABLE #Sample(Account VARCHAR(10), Product VARCHAR(10), StartDate DATETIME, EndDate DATETIME, Bonus INT)

    /*test data*/

    INSERT INTO #Sample

    SELECT '20943','A','26/01/2005', NULL, 10 UNION

    SELECT '20943','A','01/02/2005', NULL, 10 UNION

    SELECT '20943','A','01/05/2007', NULL, 0 UNION

    SELECT '20943','B','01/04/2003', NULL, 30 UNION

    SELECT '20943','C','01/04/2003', NULL, 20 UNION

    SELECT '20943','C','01/08/2004', NULL, 25 UNION

    SELECT '20943','C','16/02/2006', NULL, 35 UNION

    SELECT '20943','C','01/03/2006', NULL, 35 UNION

    SELECT '20943','D','01/04/2003', NULL, 15 UNION

    SELECT '20943','D','01/08/2004', NULL, 30 UNION

    SELECT '21000','A','28/01/2005', NULL, 10 UNION

    SELECT '21000','A','01/04/2005', NULL, 10 UNION

    SELECT '21000','B','01/04/2006', NULL, 30 UNION

    SELECT '21000','B','01/07/2006', NULL, 25 UNION

    SELECT '21000','C','15/04/2003', NULL, 20 UNION

    SELECT '21000','C','01/08/2004', NULL, 25 UNION

    SELECT '21000','C','16/02/2006', NULL, 35 UNION

    SELECT '21000','C','01/03/2006', NULL, 35 UNION

    SELECT '21000','D','01/04/2003', NULL, 15 UNION

    SELECT '21000','D','01/08/2004', NULL, 30

    -- SET DATEFORMAT mdy /*depends on what is your default...*/

    /*now calculate the EndDate*/

    UPDATE supd

    SET EndDate = Q.newdate

    FROM #sample supd

    JOIN

    (SELECT s.account, s.product, s.startdate, ISNULL(DATEADD(d,-1,MAX(s1.StartDate)),DATEADD(d, DATEDIFF(d, 0, GETDATE()), 0)) as newdate

    FROM #sample s

    LEFT JOIN #sample s1 ON s1.account = s.account AND s1.product=s.product AND s1.StartDate > s.StartDate

    GROUP BY s.account, s.product, s.startdate, s1.startdate) as Q

    ON Q.account = supd.account AND Q.product=supd.product AND Q.StartDate = supd.StartDate

    select * from #sample

    Is this what you needed? If not, please post what needs to be corrected.

  • Hi,

    Vladan, that works perfectly. Thanks very much for this. I wish you could see the mess this table is in. It not only holds this type of bonus but incremental volume related bonuses which pay out weekly and volume bonuses which pay out monthly and all of these come in two different varieties!

    With this as a starting point i should be able to make the procedure for generating some decent tables a lot quicker and a damn site more obvious for the unlucky person who follows me.

    Thanks v.much

    K.

  • I'm glad it worked - but I guess a little warning is in place. This method uses triangular join on StartDate column, so under certain circumstances it could perform badly. I hope that in this particular case the other columns used to join will limit the number of rows enough to make it work well (or at least not too slow) - the warning applies more in case you would decide to use a similar method on some other table under different circumstances.

    There are other ways to achieve the same (e.g. using correlated subquery or by adding IDENTITY column to the table - or inserting rows into temporary table with identity column), but my guess is that posted solution could work best. I don't guarantee that it really IS the best solution, it's up to you to find out whether it's fine or not 😎

    You may need to add some index(es) to get better performance.

  • Hi,

    Yes i did spot that however this is for loading data into a reporting system and is done after close of business so ultimate speed is not essential. More importantly in this case i needed clarity of process and as i managed to understand what you had done i think that has been achieved 🙂 As it stands it takes a small fraction of the time that the ActiveX script took to do the same job so no worries on that score.

    Thanks

    K.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply