Performance on 10 million row table

  • Hello all,

    I have a table, which is destined to rapidly grow. To give you an idea, the table is :

    CREATE TABLE [dbo].[Production]

    (

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [OlympusID] [int] NOT NULL,

    [DateTimeStamp] [datetime] NOT NULL,

    [PackCount] [int] NOT NULL,

    [CaseCount] [int] NOT NULL,

    [WorkOrderID] [int] NOT NULL,

    [State] [bit] NOT NULL,

    [RowID] [INT] INT

    )

    There will be a query that will read counts occuring within a shift (ie 06:00 - 14:00), HOWEVER , since the counts occur every 5 minutes, I will need to include the last row before 06:00 AND the 1st row following 14:00.

    Example:

    Counts Time

    10 05:54

    309 06:02

    ...

    ..

    25 13:58

    91 14:02

    So , the query will also need to include the above highlighted rows as well.

    A solution that I have considered is a self join using rownumber().

    I would join them as

    select * from (SELECT * FROM ProductionTest where datetime='Some day') A INNER JOIN ProductionTest B ON A.RowID=B.RowID-1

    Bear in mind that the table will REALLY grow. Undoubtedly, the table with include indexes(probably on date and RowID).

    However, do you see any other better way ?

    As always, I will really appreciate your thoughts.

    Thanks,

    V

  • Why so complex?

    Write the query in the simplest possible way. If that query form proves, in testing, to have performance problems, consider alternates.

    SELECT <column names> FROM ProductionTest WHERE DateTimeStamp >= '<date> 05:55' and DateTimeStamp <= '<date> 14:05'

    If the counts are logged every 5 minutes, then that will catch the one before 6AM and the one after 2PM and all in between. If the counts are logged at different intervals, then you'd need to adjust the times

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the reply,

    So according to my example :

    Counts Time

    10 05:54

    309 06:02

    ...

    ..

    25 13:58

    91 14:02

    The point is that you don't want just records before 06:02 am, you want to include it in your query because you want to find the exact number of counts

    YOU WOULD EXPECT to have at 06:00! So my point is not just to include 05:54 and its counts but to use it for calculation in order to estimate the number of counts at 06:00 precisely.

    Does it make sense?

    Apologies for not explaining

  • No, makes less sense.

    The query i posted doesn't return just rows from before 6:02. It would return all rows between 5:55 and 14:05 inclusive.

    If the intervals are every 5 minutes, why is there a count at 5:54 and another at 6:02? (8 minutes apart)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Either you are missing the point or I fail to explain

    I will shortly post the script to avoid confusions.

  • I would do it with a UNION ALL query. One for the interval, and two SELECT TOP(1) queries to get the values just outside the interval. An index on DateTimeStamp is essential. And either clustered, or including Count.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Hello Erlang,

    Many thanks for your reply. Couple of days ago, I found a solution, check the following please :

    with cte_test (DatetimeStamp,Packcount,WorkOrderID,RowID)

    AS

    (

    SELECT *,ROW_NUMBER() OVER (Order by X.Datetimestamp) RowID FROM

    (

    SELECT * FROM

    (

    select top(1) DateTimeSTamp,PackCount,WorkOrderID from Production

    where DateTimeStamp<@StartTime and WorkOrderID=@WorkOrderID

    order by DateTimeStamp DESC

    ) minusrow

    UNION

    select DateTimeSTamp,PackCount,WorkOrderID from Production

    where DateTimeStamp>@StartTime and WorkOrderID=@WorkOrderID AND DateTimeStamp<=@EndTime

    UNION

    select * from

    (

    select top(1) DateTimeSTamp,PackCount,WorkOrderID from Production

    where DateTimeStamp>@@EndTime and WorkOrderID=@WorkOrderID

    order by DateTimeStamp ASC

    ) plusrow

    ) X

    )

    INSERT INTO @teml

    select * from cte_test

    SELECT * FROM @temp A INNER JOIN @temp B ON a.RowID=b.RowID-1)

    Today, I visited the forum to check if there are any messages and I saw your solution and it seems we both thought exactly the same 🙂

  • UNION ALL, not UNION. Otherwise you're paying for an unnecessary distinct sort.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Gail,

    That's a good spot. Will correct accordingly.

    V

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply