Read data based on date-range in huge table is slow

  • I've got a table here with more than 60 million rows and I want to extract data based on date (WHERE Time>='2023-01-01' AND Time<='2023-01-10' for example). Problem is, it's terribly slow, it takes several minutes.

    The clustered index (which I haven't created myself) consists of several columns (bigint, int, varchar(255), datetime, in that order). According to some tests, performance would be better rearranging the index (so that Time comes first), but not sure if I mess up any other queries by doing that.

    I've googled alot but haven't came up with any neat tricks yet. Each row doesn't has any useful id's (UNIQUEIDENTIFIER only) so I can't use that either as far as I know.

    Any bright ideas?

  • Are you able to share the execution plan (Paste the Plan)? This'll help us understand what the RDBMS is up to, and see if there are any helpful indexes. The Clustered index should ideally be always increasing, so if it's on your always increasing ID, it may be on the right choice of column, but that doesn't stop you created other indexes on the table, I assume you have some other non-clustered indexes on it as well?

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Yeah, there's an index on the id of the table (UNIQUEIDENTIFIER) but I don't think I have any use of that for this query.

    I don't think I'm able to share the plan (too much personal stuff..) There's an "Clustered Index Scan" that has 98% cost and takes about 13 minutes to complete.

    Not sure if this attached screenshot helps?

     

    Attachments:
    You must be logged in to view attached files.
  • So do you have index on Time ? Does it contain any other columns and INCLUDE the rest of the columns in the SELECT? Without the query, DDL of the table, indexes, and execution plan, there's not a lot more we can offer here.If the clustered index isn't sorted by time, it's not going to help the RDBMS filter those rows quickly.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Before changing the clustered index, you should know what other queries are impacted and their importance

    Is a non-clustered index on [Time] an option?

    Is [Time] a string datatype? Since you compare it with a string-value in the example

     

  • Time is of datatype "datetime".

    "Time" is part of the clustered index, but the clustered index, as stated in the first post, consists of total of 4 columns (bigint, int, varchar(255), datetime, in that order)

    I guess a non-clustered "Index" on "Time" would help. Wonder if that would to so I don't have to use include columns aswell, I guess that might result in a massive index..

     

    • This reply was modified 1 year, 3 months ago by  oRBIT.
    • This reply was modified 1 year, 3 months ago by  oRBIT.
    Attachments:
    You must be logged in to view attached files.
  • oRBIT wrote:

    "Time" is part of the clustered index, but the clustered index, as stated in the first post, consists of total of 4 columns (bigint, int, varchar(255), datetime, in that order)

    Which isn't helpful to the data engine.

    If you were asked to get all the people whose first name was David from a phone book, the index (which lists people in alphabetical order of their last name) isn't going to help you quickly find all the Davids; you'll have to go and check the first name of every single person in the book which is not going to be quick.

    The same problem exists for the data engine; you don't have any appropriate indexes for the query being run, and so it has to check the value of every single row in the table.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • If "Time" were the first in order in the index, that would be much better?

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

  • oRBIT wrote:

    If "Time" were the first in order in the index, that would be much better?

    Like Jo said, I wouldn't just go changing your clustered index without considering the effect it'll have on  your other queries. Why are you (apparently) against creating non-clustered indexes? A single, clustered index, isn't going to be helpful to every query you ever run, and changing the clustered index to help with the query "flavour" of the day isn't a solution. If someone chose that clustered index for your environment, presumably they had good reason to do so.

    Though, I will admit, the choice of the columns for the clustered value does seem pretty "wide", it's 277 bytes in size.

    • This reply was modified 1 year, 3 months ago by  Thom A.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • For efficient filtering it should be the first column of an index. (so there are statistics about time distribution). As Thom A mentioned, the clustered index won't help much for filtering on time)

    What percentage of data will be collected in each interval? (non-clustered indexes have a traversal penalty from root to the leaf nodes of the b-tree)

    Compare [Time] with its datatype Time>= @Variable of type datetime (convert(datetime,'2023-01-01') ...)

    Test it out on a test-environment? Depending on usage, you might even compress the data

     

  • I'll be fetching like 50-100 rows only so it's tiny amounts..

    I don't have any problems with nonclustered indexes. I was just afraid I might need to include all columns I want to read, with the risk of creating a huge index (although I am no expert here so I might be mistaken)..

    I was hoping there could be some trick used to fetch the data so I didn't have to change the indexes (although I already suspected there isn't much to do without altering them..)

  • This was removed by the editor as SPAM

  • Instead of changing the clustered index, can you add another index?

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply