sql scheduled jobs vs calculated columns

  • I'm wondering if anyone has any thoughts and experience using calculated columns to determine the current status of a row.

    Case in point: we have a table A with PublishesOn and UnpublishesOn columns, another table B holding the items and the status (Published or Unpublished), and a job that runs every minute and sets the status of the rows in the table A (ie, Published or Unpublished) when PublishesOn and UnpublishesOn are in the past.

    I'm wondering if a calculated status column in table A might be a simpler and/or more robust way to go?








  • Depending on the table structure and the business requirements, I would go for almost anything but a scheduled job, can you please share more details?



  • Coming from you, that's a... shall we say, less than ringing endorsement of the current implementation.

    Here's the bones of the table, minus the meat columns - and believe it or not, there are 2 additional date columns.

    I was not consulted on the design!! All dates are set in the app. I've put my foot down, and we're implementing some very modest validation, but I am not pleased with this scheduling business.

    CREATE TABLE [dbo].[TDA3_Revision](
    [nRevision_Id] [int] IDENTITY(1,1) NOT NULL,
    [nArticle_Id] [int] NOT NULL,
    [nStatus_Id] [int] NOT NULL,
    [dCreated] [datetime] NOT NULL,
    [dPublished] [datetime] NULL,
    [dUnpublished] [datetime] NULL,
    [dArchives] [datetime] NULL,
    [dUnpublishes] [datetime] NULL,
    [dPublishes] [datetime] NULL,
    [dLastModified] [datetime] NULL,


    Basically, we have 5 statuses

    1 Draft; 2 Published; 3 Deleted; 4 Archived; 5 Publish pending. 3 and 4 do not quite mean what they say.

    Users can override a scheduled publish / unpublish by changing the value for the status column (not always reflected in the Published / Unpublished columns!), or not use the scheduling feature at all (in which case the Published and Unpublished columns are generally accurate).

    From the design, you can see that there can be n revisions of an article. The dates between revisions don't need to be contiguous. They should *definitely* never overlap.

    The consuming apps retrieve the list of currently published articles (nstatus_id = 2) every 5 minutes, so I'm sorta OK with a 10, even 15 minute post-unpublish stat. A stat 5 days later is not fine.

    My problem is, we have stats for revisions outside the Min published and Max unpublished dates for some articles (all revisions). And we have stats that fall between published and unpublished of more that one revision of an article.

    And that makes me twitch uncontrollably, because I write the reports.

    On the retrieval side:

    There are ~200-250 current articles.  2 Varchar(MAX) columns and 4 Varchar(255).

    3 apps use this table. A typical refresh in one app retrieves 250 rows, ~200 times / minute. A second app is 30 rows, also ~200 times / minute. The other is negligible.


  • So I got a prototype going as a computed column calling a scalar function, and the perf hit appears negligible.

    Further investigation underway.



  • A computed column with  a scalar UDF that performs data access? That's an absolute no-no in my book. Scalar UDFs are not inlined(*), and the the optimizer does not know what's inside. You get an overhead that is costly, and if you are returning many rows from that table, you will have to pay dearly.

    I did not exactly follow the logic, but then again I only read it very casually. But I would keep the columns updated through a trigger. Or just make a join over the other table and put in a view.

    (*) You may have heard that this changes in SQL 2019, where scalar UDFs are inlined under some conditions. However, inlining never happens with computed columns, so what I say above applies to SQL 2019 as well.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • I'm with Erland on the use of scalar functions.

    Based on your description, it sounds like a perfect use case for an "AFTER TRIGGER".

  • So after the above comments, it's now a giant CASE statement.


  • schleep wrote:

    So after the above comments, it's now a giant CASE statement.  

    This to me, just like Jason said earlier, looks like a perfect case for a set based trigger


    Further on Erland's point, how would you implement the logic using a Scalar UDF?

  • I don't see how a trigger helps in this scenario...? If there's no table activity, no trigger is fired, nothing changes...? Or am I standing to close to the tree to see the forest?

    There was no data access within the function, it was an encapsulation of the giant CASE statement logic.

    ALTER TABLE <mytable>

    ADD ComputedStatus AS dbo.myUDF(colA, ColB, ColC, ColD)

    where colA, ColB, ColC, ColD are in <mytable>


  • I think the reason I and others assumed that you UDF made data access and a trigger therefore would be better is that your initial post talked about two tables. Apparently, there was a misunderstanding.

    Yes, that's correct. If you want to add a column which is dependent on other values on the same row, there is no reason to use a trigger - a computed column is a better solution.

    Whether you should put the CASE expression directly in the table or put it in a UDF is a matter of taste. The UDF comes with some overhead, but it may not be dramatic. Personally, I prefer to put the expression in the table definition, unless I need to use the expression in many places. That makes the table definition more self-contained. Then again, you could argue that using a UDF gives you more flexibility, since you can change the logic without altering the table. (And you cannot use ALTER TABLE ALTER COLUMN to change a computed column as I recall.)

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply