Table Function returns varying number of records for an update statement

  • It doesn't like text files either. Lets try with a zip.

    Attachments:
    You must be logged in to view attached files.
  • Is it me or do I still not see it?

    Edit: Nevermind

     

    • This reply was modified 4 years, 3 months ago by  Lynn Pettis.
  • This is the index I would create:

    CREATE NONCLUSTERED INDEX [idx_moss_plotsampleid] ON [dbo].[moss]
    (
    [plot_sample_Id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);

  • Tim S wrote:

    I figured it was something besides compatibility level, but I just could not see it.

    Lynn's solution is able to meet my needs with pulling the view out of the Order by clause and making it a column and then sorting by that column.

    Sergiy, unfortunately I left out a user requirement that the 10% was to be for each sample and not just year. I would have preferred by year and not sample. And yes, this is not ever going to be fast. Unfortunately, that will need to be a fix for another time.

    Yes, Tim, I missed that requirement, which is pretty easy to fix.

    Without the need of a function still.

    Look, there are too many issues in your piece, but you should be able to get it right, if you think logically.

    First, the table dbo.sample is referenced twice - in the query and inside the function. But you only need it to filter the plot_sample_id's for a particular year. So, your query must begin with

    select * from dbo.[sample]
    WHERE sample_date >= DATEADD(YY, @sampleyear - 1900, 0)
    and sample_date < DATEADD(YY, @sampleyear - 1900 + 1, 0)

    See how I apply the filter - no function applied to the value in the column, but the parameter is used to define the range of dates.

    Now, since you're selecting by the range of dates, you better have the clustered index with the date column in the 1st position.

    So, your DDL statement must look like this:

    CREATE TABLE [dbo].[sample](
    [Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED ,
    [sample_date] [date] NOT NULL,
    [sample_year] AS (datepart(year,[sample_date]))
    );
    CREATE CLUSTERED INDEX [sample_date] ON [dbo].[sample] ([sample_date])

    After you've got the set of appropriate plot_sample_id's you are looking for 10 percent of records in dbo.moss relevant to each of those id's:

    SELECT * 
    FROM dbo.[sample] s
    CROSS APPLY (select TOP 10 PERCENT M.Id from dbo.moss M
    where M.plot_sample_Id = s.Id
    order by NEWID ()
    ) TenPCT
    WHERE sample_date >= DATEADD(YY, @sampleyear - 1900, 0)
    and sample_date < DATEADD(YY, @sampleyear - 1900 + 1, 0)

    And, finally, you join all of this to the instance of the table dbo.moss, which you're about to update.

    The final statement will look like this:

    update T 
    set TenPercentQAFlag = CURRENT_TIMESTAMP
    --SELECT *
    FROM dbo.[sample] s
    CROSS APPLY (select TOP 10 PERCENT M.Id from dbo.moss M
    where M.plot_sample_Id = s.Id
    order by NEWID ()
    ) TenPCT
    inner join dbo.moss T on TenPCT.Id = T.Id
    WHERE sample_date >= DATEADD(YY, @sampleyear - 1900, 0)
    and sample_date < DATEADD(YY, @sampleyear - 1900 + 1, 0)

    Clean and simple.

    If you add an index to dbo.moss:

    CREATE INDEX [plot_sample_Id] ON [dbo].[moss] ([plot_sample_Id]);

    then it will be also fast, as the execution plan does not show scans anymore, only index seeks.

     

    P.S. You better drop that computed column in dbo.sample, so nobody can have an idea of using it ever again.

    _____________
    Code for TallyGenerator

  • Thanks Sergiy,

    That is a cleaner solution.

    Because I am stubborn, I probably keep calculated field in the criteria. In my working database, it is persisted with an index, but the index was designed for looking at a parent table to sample (a table of locations that are visited about every five years). Since this process doesn't care about where the data was collected, it is not of use in this case. I'll need to look at how a new index impacts things so that this procedure.

    Again thanks!

  • I can respect somebody being stubborn, probably because I'm a bit stubborn myself.

    🙂

    But having columns like that is a very bad idea, in general, and this particular case is not an exclusion.

    Even though it's indexed, the selectivity of that index is very low, so there is no reason for the engine to use this index when choosing an execution plan. If you look at execution plans of the queries using this table in your working database, you'll most certainly find "Clustered Index Scan" on almost every occasion.

    Which means this index is just a burden the system has to maintain, without getting any benefit from it.

    I know, the computed column solution looks very appealing, as it makes things easy and clear for developers, but when you deal with masses of data you need to consider the overheads imposed on the system too.

    _____________
    Code for TallyGenerator

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply