improve performance table with millions of rows

  • Hi

    I use sql server 2005 and I wanted to ask an opinion for the creation of a table with millions of rows...

    The table has a primary key and therefore indexes.

    improve performance by dividing the table for years? :hehe: :exclamation:

    ie

    it is better to have a table with 24 million of rows or 12 tables with 2 million of rows

    Thank you

  • That all depends on table design and access patterns. Being that you are on 2000 you could use a partitioned view to bring the multiple table scenario into one "table" and you can reap some benefits through that but again access patterns and designs are the critical component to the answer. So, if you want further input it would be good to post the table create script and some of the frequently used queries for consideration.

    Hopefully that makes sense.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • This is the table

    CREATE TABLE [dbo].[Vendite](

    [Data] [datetime] NOT NULL,

    [Mese] [int] NOT NULL,

    [Anno] [int] NOT NULL,

    [Societa] [varchar](3) NOT NULL,

    [Codice] [varchar](9) NOT NULL,

    [TipologiaCliente] [varchar](1) NOT NULL,

    [Classe] [int] NOT NULL,

    [Importo1] [decimal](19, 2) NOT NULL,

    [Importo2] [decimal](19, 2) NOT NULL,

    [Importo3] [decimal](19, 2) NULL,

    CONSTRAINT [PK_Vendite] PRIMARY KEY CLUSTERED

    (

    [Mese] ASC,

    [Anno] ASC,

    [Societa] ASC,

    [Codice] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    queries may require the join over two years

    some ideas?

    Thank you

  • Making some great assumptions as I still don't have queries but if you are going to be retrieving 2 years of data (is that what you were stating?) then you would be able to gain some benefit from breaking the tables down physically and creating a partitioned view over the top of them. You should review the information in BOL regarding restrictions associated with partitioned views and do some testing before jumping in and making the change. Part of that testing should be with running standard queries in a side-by-side scenario to see if you are going to reap any benefit.

    Again, some assumptions being made...

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Example query

    solution with single table

    SELECT DISTINCT M.Mese,M.Anno,M.Societa,M.Codice FROM

    dbo.VenditeAnno(@M,@Anno) M

    LEFT OUTER JOIN dbo.VenditeAnno(@P,@AnnoPrec) P

    ON M.Societa = P.Societa AND M.Codice = P.Codice

    WHERE M.Classe = xxx

    VenditeAnno is a function that returns the data a year

    Thank you

  • I'm a little confused by the sql statement as it doesn't appear to be referencing the table at all but I am guessing that you are within the function. What is the function actually returning? Very curious.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Move whatever the function is doing into a derived table. The millions of call to that function is what is killing the performance.. no amount of indexing an overcome that.

    We need to see the function's code to go any further.

  • FUNCTION [dbo].[VenditeAnno]

    (

    @Mese Int,

    @Anno Int

    )

    RETURNS TABLE

    AS

    RETURN

    (SELECT Vendite.Data, Vendite.Mese As Mese,Vendite.Anno as Anno,

    Vendite.Societa, Vendite.Codice, Vendite.TipologiaCliente, Vendite.Classe,

    Vendite.Importo1, Vendite.Importo2, Vendite.Importo3

    FROM Vendite

    WHERE Vendite.Mese=@Mese AND Vendite.Anno=@Anno

    )

  • What's the point of the left join in the query? I don't understand what you are trying to do there.

  • Looks to be a year over year comparison, which if this is the case the physical separation of the data into multiple tables by year and using a partitioned view should help you get some better results. You should be able to see that in the execution plan when you do some testing. The only caveat to that would be that you may not be able to get the detail of the table access due to the use of the function. Hard to say until you try it.

    I haven't used a function in the way you are using it but it seems odd to me. Is this efficient?

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • I think you might get much better performance if you make that concatenated PK a non-clustered index and change the clustered to just those 2 columns in your query's WHERE clause.

  • ...because the way the function is written, and the columns contained in the function, the current indexing is not going to be sufficient and there will be some lookups. I haven't played with the table but it sure appears that way to me.

    Just my thoughts.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • I have tried with SET STATISTICS IO

    Time is almost identical

    The difference is that the solution with a single table run 1 physics reading and then all logic

    while The solution with multiple tables running a 1 physical reading and then 1 logical reading for each year...

    the keys and the structure of the tables is always the same

    who benefits if I share the table??

    Thank you

  • Can you include the statistics IO output? Also, can you attach the two query plans?

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply