Update query with historic accumulative SUM in fields by date range

  • MatiOli

    SSC Veteran

    Points: 278

    Hello there,

    I am trying to complete a table I have with an accumulated value but not sure how to make it work. I have tried a nested query with no luck.

    An example table would be:

    CREATE TABLE [dbo].[test_table](
    [reg_date] [date] NULL,
    [store] [int] NULL,
    [sales_amt] [int] NULL,
    [sales_items] [int] NULL,
    [v_L_4_d] [INT] NULL,
    [v_i_L_4_d] [INT] NULL
    ) ON [PRIMARY]

    GO
    INSERT INTO [dbo].[test_table]
    (reg_date,store,sales_amt,sales_items)
    VALUES
    ('2019-04-30',1,1000,5),
    ('2019-04-29',1,1000,5),
    ('2019-04-28',1,1000,5),
    ('2019-04-27',1,1000,5),
    ('2019-04-26',1,1000,5),
    ('2019-04-25',1,1000,5),
    ('2019-04-24',1,1000,5),
    ('2019-04-23',1,1000,5),
    ('2019-04-22',1,1000,5),
    ('2019-04-21',1,1000,5),
    ('2019-04-20',1,1000,5),
    ('2019-04-19',1,1000,5),
    ('2019-04-18',1,1000,5),
    ('2019-04-17',1,1000,5),
    ('2019-04-16',1,1000,5),
    ('2019-04-15',1,1000,5),
    ('2019-04-14',1,1000,5),
    ('2019-04-13',1,1000,5),
    ('2019-04-12',1,1000,5),

    ('2019-04-30',2,1000,5),
    ('2019-04-29',2,1000,5),
    ('2019-04-28',2,1000,5),
    ('2019-04-27',2,1000,5),
    ('2019-04-26',2,1000,5),
    ('2019-04-25',2,1000,5),
    ('2019-04-24',2,1000,5),
    ('2019-04-23',2,1000,5),
    ('2019-04-22',2,1000,5),
    ('2019-04-21',2,1000,5),
    ('2019-04-20',2,1000,5),
    ('2019-04-19',2,1000,5),
    ('2019-04-18',2,1000,5),
    ('2019-04-17',2,1000,5),
    ('2019-04-16',2,1000,5),
    ('2019-04-15',2,1000,5),
    ('2019-04-14',2,1000,5),
    ('2019-04-13',2,1000,5),
    ('2019-04-12',1,1000,5)


    What I need to do is, running an update query, to complete the last two columns in the table.

    For instance for first record with date  2019-04-30 , store 1 , sales_amt 1000 and sales_item 5, after running the process I should have in the last two column for this record:

    v_L_4_d = 4000 (which is the addition of sales_amt value in records 2019-04-30, 2019-04-29, 2019-04-28, 2019-04-27 for store 1)

    v_i_L_4_d= 20 (which is the addition of sales_items value in records 2019-04-30, 2019-04-29, 2019-04-28, 2019-04-27 for store 1)

    Values are grouped by store number. No data is mixed between stores

    Same idea for all the records.

    Last three records wont have enough historic records for sum 4 values. Never mind, I will complete these with some kind of average by hand.

    Thank you in advance!

     

     

     

    • This topic was modified 1 week ago by  MatiOli.
  • Jonathan AC Roberts

    SSCoach

    Points: 16844

    UPDATE x
    SET x.v_i_L_4_d = z.sum_sales_amt,
    x.v_L_4_d = z.sum_sales_items
    FROM [dbo].[test_table] x
    CROSS APPLY(SELECT SUM(w.sales_amt) sum_sales_amt,
    SUM(w.sales_items) sum_sales_items
    FROM (SELECT TOP(4)
    z.sales_amt,
    z.sales_items
    FROM dbo.test_table z
    WHERE z.store = x.store
    AND z.reg_date <= x.reg_date
    ORDER BY z.reg_date DESC) AS w) AS z

    Where there are less than 4 previous rows you can use the results from the rows found to estimate the number that there would have been with this query:

    UPDATE x
    SET x.v_i_L_4_d = z.sum_sales_amt,
    x.v_L_4_d = z.sum_sales_items
    FROM [dbo].[test_table] x
    CROSS APPLY(SELECT SUM(w.sales_amt) * 4.0/COUNT(*) sum_sales_amt ,
    SUM(w.sales_items) * 4.0/COUNT(*) sum_sales_items
    FROM (SELECT TOP(4)
    z.sales_amt,
    z.sales_items
    FROM dbo.test_table z
    WHERE z.store = x.store
    AND z.reg_date <= x.reg_date
    ORDER BY z.reg_date DESC) AS w) AS z
  • jcelko212 32090

    SSCrazy Eights

    Points: 8821

    You don't seem to understand that a table must have a key by definition. But what you posted because of the columns are knowable can never have a key of any kind! Your sales amount should be a decimal because there is no currency on earth that is not decimal anymore. The data element name "store" is incomplete; it needs what ISO calls an attribute property. I'm going to guess that you want tell us that it is a store number of some kind which would make it a nominal scale; nominal scales are never numeric. Likewise items count is wrong.

    Basically, we just posted a deck of punch cards written in SQL!

    CREATE TABLE Sales

    (reg_date DATE NOT NULL,

    store_nbr CHAR(2) NOT NULL,

    PRIMARY KEY (reg_date, store_nbr),

    sales_amt DECIMAL(8, 2) NOT NULL

    CHECK (sales_amt >= 0.00),

    _items_cnt INTEGER NOT NULL

    CHECK (sales_item_cnt >= 0));

    You had a duplicate row.

    INSERT INTO Sales

    (reg_date, store_nbr, sales_amt, sales_items_cnt)

    VALUES

    ('2019-04-12', '01', 1000.00, 5),

    ('2019-04-13', '02', 1000.00, 5),

    ('2019-04-13', '01', 1000.00, 5),

    ('2019-04-14', '02', 1000.00, 5),

    ('2019-04-14', '01', 1000.00, 5),

    ('2019-04-15', '02', 1000.00, 5),

    ('2019-04-15', '01', 1000.00, 5),

    ('2019-04-16', '02', 1000.00, 5),

    ('2019-04-16', '01', 1000.00, 5),

    ('2019-04-17', '02', 1000.00, 5),

    ('2019-04-17', '01', 1000.00, 5),

    ('2019-04-18', '02', 1000.00, 5),

    ('2019-04-18', '01', 1000.00, 5),

    ('2019-04-19', '02', 1000.00, 5),

    ('2019-04-19', '01', 1000.00, 5),

    ('2019-04-20', '02', 1000.00, 5),

    ('2019-04-20', '01', 1000.00, 5),

    ('2019-04-21', '02', 1000.00, 5),

    ('2019-04-21', '01', 1000.00, 5),

    ('2019-04-22', '02', 1000.00, 5),

    ('2019-04-22', '01', 1000.00, 5),

    ('2019-04-23', '02', 1000.00, 5),

    ('2019-04-23', '01', 1000.00, 5),

    ('2019-04-24', '02', 1000.00, 5),

    ('2019-04-24', '01', 1000.00, 5),

    ('2019-04-25', '02', 1000.00, 5),

    ('2019-04-25', '01', 1000.00, 5),

    ('2019-04-26', '02', 1000.00, 5),

    ('2019-04-26', '01', 1000.00, 5),

    ('2019-04-27', '02', 1000.00, 5),

    ('2019-04-27', '01', 1000.00, 5),

    ('2019-04-28', '02', 1000.00, 5),

    ('2019-04-28', '01', 1000.00, 5),

    ('2019-04-29', '02', 1000.00, 5),

    ('2019-04-29', '01', 1000.00, 5),

    ('2019-04-30', '02', 1000.00, 5),

    ('2019-04-30', '01', 1000.00, 5);

    >> What I need to do is, running an update query, to complete the last two columns in the table. <<

    The next question is why are you materializing a computation inside a table? Unlike your punchcards, a table can have virtual columns and rows which are just as real as if they were materialized.

    >> For instance for first record [sic] with date 2019-04-30, store_nbr 1, sales_amt 1000 and sales_item 5, after running the process I should have in the last two column for this record [sic] <<

    Rows are completely different from records. As you know that a punch card is called a unit record? Again you're doing punchcards not RDBMS! You have to punch out results in a punch card like you're trying to do but not the tableValues are grouped by store_nbr number.

    SELECT reg_date, store_nbr,

    SUM(sales_amt)

    OVER (PARTITION BY store_nbr

    ORDER BY reg_date DESC

    ROWS BETWEEN 3 PRECEDING AND CURRENT ROW),

    SUM(sales_items_cnt)

    OVER (PARTITION BY store_nbr

    ORDER BY reg_date DESC

    ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)

    FROM Sales; please don't use the proprietary Sybase/Microsoft TOP- ORDER BY syntax. This is so highly proprietary it will never port, has no flexibility, etc,

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • drew.allen

    SSC Guru

    Points: 76525

    When you provide sample data, you should also provide expected results.  This helps us compare our tests against what is expected to see how closely we match the expectations.  You haven't provided expected results.

    When you provide sample data, it should be representative of your actual data.  Your data has EXACTLY THE SAME VALUES for every single sales_amt and sales_items.  I seriously doubt that your sales are so consistent.  This makes it much harder to determine if a query is correct or just happens to get the right answer.  For instance, the following query gives the exact same results, but is obviously wrong.  If you had more realistic data, the fact that it's wrong would show up in the testing.

    SELECT *
    , tt.sales_amt * 4
    , tt.sales_items * 4
    FROM #test_table AS tt

    The following query gives the same results as Jonathan's, but is much faster, because it only requires one table scan.

    SELECT *
    , SUM(tt.sales_amt) OVER(PARTITION BY tt.store ORDER BY tt.reg_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) * 4.0/COUNT(*) OVER(PARTITION BY tt.store ORDER BY tt.reg_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
    , SUM(tt.sales_items) OVER(PARTITION BY tt.store ORDER BY tt.reg_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) * 4.0/COUNT(*) OVER(PARTITION BY tt.store ORDER BY tt.reg_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
    FROM #test_table AS tt

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Jonathan AC Roberts

    SSCoach

    Points: 16844

    Drew's method is a better more efficient way of getting the data.

    If you want to turn it into an UPDATE statement you will need to put the SELECT inside a common table expression.

    ;WITH cte AS
    (
    SELECT *,
    SUM(tt.sales_amt) OVER(PARTITION BY tt.store ORDER BY tt.reg_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) * 4.0/COUNT(*) OVER(PARTITION BY tt.store ORDER BY tt.reg_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) sum_sales_amt,
    SUM(tt.sales_items) OVER(PARTITION BY tt.store ORDER BY tt.reg_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) * 4.0/COUNT(*) OVER(PARTITION BY tt.store ORDER BY tt.reg_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) sum_sales_items
    FROM [dbo].[test_table] AS tt
    )
    UPDATE cte
    SET cte.v_i_L_4_d=x.sum_sales_amt,
    cte.v_L_4_d=x.sum_sales_items

     

  • MatiOli

    SSC Veteran

    Points: 278

    Hello jcelko212 32090,

    Thank you for your comments.

    What I posted is a dummy example to get the solution. Table, data, field names an types are quite more complex in real life. But I preffer to keep it simple.

    I do have an index and in fact there is a third level as department. Fields date, store and department are the index as you well suggested.

    Thanks for sharing your solution

     

  • MatiOli

    SSC Veteran

    Points: 278

    Hello drew.allen

    I will keep your comments in mind for next time.Thanks

    Thank you Drew and Jonathan AC Roberts for your solutions!

     

  • Jeff Moden

    SSC Guru

    Points: 994523

    jcelko212 32090 wrote:

    ...nominal scales are never numeric

    Please cite the standard and paragraph or other valid authority where it stipulates that they must not be.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Lynn Pettis

    SSC Guru

    Points: 442138

    Jeff Moden wrote:

    jcelko212 32090 wrote:

    ...nominal scales are never numeric

    Please cite the standard and paragraph or other valid authority where it stipulates that they must not be.

    Jeff Moden wrote:

    jcelko212 32090 wrote:

    ...nominal scales are never numeric

    Please cite the standard and paragraph or other valid authority where it stipulates that they must not be.

    Jeff, you forget, he IS the valid authority.  Just ask him, he will tell you so.

     

  • jcelko212 32090

    SSCrazy Eights

    Points: 8821

    This is in the realm of mathematics and not standards. There is no ANSI/ISO standard for structured programming, RDBMS theory, or anything like that. However, our modern view of scales and measurements really begins with this paper:

    Stevens, S. S. (1946) On the theory of scales of measurement. Science, 103, 677-680.

    Stevens defined the now-famous nominal, ordinal, interval and ratio scales, characterized by the type of transformations which mapped from one legitimate representation to another. He classified the types of scales and made the rule that scales can be converted one to the other only if they are of the same type. Please notice that this is being done in 1946 by a psychologist looking at statistics. I would've thought statisticians would have come up with something like this many decades before.

    I get credit for, when I'm cited, is adding the idea of designing encodings for use in databases. In that sense, I'm a primary source. But the property we're talking about goes back to Stevens in his original work. You might also want to look at "Statistics and the Theory of Measurement" by D. J. Hand. The statistical techniques used to analyze a classification are totally different from those used to analyze a numeric variable such as length. Somewhere I have a wonderful T-shirt with the slogan "on a scale from 1 to 10, what color is your favorite letter of the alphabet?" It demonstrates so nicely the conceptual problems using the wrong type of scale, and why it leads to an incredibly bad data model.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • ScottPletcher

    SSC Guru

    Points: 98079

    You seem fixated on tying the physical format of data to its use/display.  A value can be stored physically as an int without the value ever being directly used in math, only because it's far more practical to do so.  A value such as customer_id is stored as an int, even though customer_ids will never be added, square-root'ed, etc..  Storing it as char/varchar is just terribly wasteful and impractical.

    For example, take your earlier DDL:

    (reg_date DATE NOT NULL, store_nbr CHAR(2) NOT NULL,

    The minute we add a hundredth store, massive changes are required, in all code.  Not to mention all the CHECK conditions you need to add for validation.  Instead, defined as tinyint, you'd have until 255 stores before you had any issue at all.  And even a change to smalllint will be much easier than a char/varchar change.  And SQL automatically verifies that the value is numeric, at least (you may need value checks to insure > 0).

    The store_id value can still be displayed with leading zeros to indicate that it's a nominal value, but it does not have to be stored that way, again just out of being practical in reality, not to match some theoretical dictate.

    It's very similar with dates: the values are displayed so that they "look like dates" to humans, but for practical reasons, they are stored as integers.

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them.

  • Jeff Moden

    SSC Guru

    Points: 994523

    jcelko212 32090 wrote:

    This is in the realm of mathematics and not standards. There is no ANSI/ISO standard for structured programming, RDBMS theory, or anything like that. However, our modern view of scales and measurements really begins with this paper:

    Stevens, S. S. (1946) On the theory of scales of measurement. Science, 103, 677-680.

    Stevens defined the now-famous nominal, ordinal, interval and ratio scales, characterized by the type of transformations which mapped from one legitimate representation to another. He classified the types of scales and made the rule that scales can be converted one to the other only if they are of the same type. Please notice that this is being done in 1946 by a psychologist looking at statistics. I would've thought statisticians would have come up with something like this many decades before.

    I get credit for, when I'm cited, is adding the idea of designing encodings for use in databases. In that sense, I'm a primary source. But the property we're talking about goes back to Stevens in his original work. You might also want to look at "Statistics and the Theory of Measurement" by D. J. Hand. The statistical techniques used to analyze a classification are totally different from those used to analyze a numeric variable such as length. Somewhere I have a wonderful T-shirt with the slogan "on a scale from 1 to 10, what color is your favorite letter of the alphabet?" It demonstrates so nicely the conceptual problems using the wrong type of scale, and why it leads to an incredibly bad data model.

    Perfect... unless I pay $15 for the whole PDF or decide to pay $50 for a membership, I can only read the first page of the document (and I did read that page).  I was able to find a copy of the second reference (link below) for "Statistics and the Theory of Measurement" by D. J. Hand but it would be nice if you would always provide a link to a reference when one is available.  It would also be nice if you cited free references and ones that require only scant info for membership if a membership is required to read an article.

    http://www.lps.uci.edu/~johnsonk/CLASSES/MeasurementTheory/Hand1996.StatisticsAndTheTheoryOfMeasurement.pdf

    I believe that you've made the same mistake that you admonish others for in the presence of a numeric "ID" column.  You've just cited two documents having to do with statistics and measurements and you speak of scales and measurements but that's not what an ID column is typically used for, even numeric ones (you remind us of that regularly).

    You are absolutely correct in saying that they're (normally) "nominal" in that the only thing they can be used for is to provide something that we can count to determine quantity and to uniquely identify a row that contains something that, hopefully, has unique attributes.

    Obviously, to be unique in a system (part of the problem they cited in the first PDF), there must be a "rule" table and we normally refer to that as the Primary Key Table, which contains a unique list of "Nominal" values and the attribute(s) they represent.  And, please, don't for a minute think that for most tables that I'm suggesting that you normally can rely on the "ID" value of 1 (or any number) or a gapless sequence for most tables but do understand that there are some table designs that could cause the "ID" column to be Ordinal, Interval, and even Ratio in nature by design, all 3 of which can also satisfy the definition of "Nominal" values whether such values are numeric or not.

    The real problem with your admonishments is that you apparently don't agree that "labels" can be numeric in nature (Heh... tell that to rat #13 in a medical experiment of person #74 standing in line to renew their license at the DMV).  There is nothing in the single page PDF that you provided that states names or labels (or whatever) cannot be numeric in nature.  Although I'm not going to take the time to do a full read of a 48 page PDF for the second PDF, I did so a search for the words "Name" and "Label" and "Ident" and there's nothing that showed up that said such "Nominal" identifiers couldn't be numeric.

    In fact and in more than one place you can search for, most of the sites I visited to check for anomalies in the definitions of "Nominal", "Ordinal", "Interval", and "Ratio" all had examples where the "Nominal" values were, in fact, numbers.  Here's just one of many.

    http://athena.ecs.csus.edu/~buckley/CSc238/Types%20of%20Data.pdf

    Neither are we (normally) attempting to use them as anything other than a "Nominal Value" even though an IDENTITY column automatically assigns them and they just happen to be in an ascending order which normally can't be counted on except that (under normal conditions) the next number will always be some value (there could be a gap generated by the system or a rollback) greater than the largest current value (which we sometimes depend on to prevent page splits in indexes).  Shoot... they can even be a GUID which is nothing in SQL Server anymore than a representation of a random, 16 byte, binary NUMBER.

    Again, there's nothing in the documentation that you cited or that I've been able to find in less than a half an hour that even comes close to stipulating that "Nominal" identifiers couldn't be numeric in nature. In fact, the second PDF by Hand actually states the following...

    In a lot of cases, you are indeed an expert, Joe, but you're wrong in this case.  There is nothing to support your claim that "nominal scales are never numeric".  Even you can't support the claim with a good and demonstrable reason.

    And, to be clear, we're NOT talking about whether or not having "Nominal" keys is a good idea or not.  We are simply discussing your claim that "nominal scales are never numeric".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply