Large, high-volume database

  • Hi All,

    I have a 25GB SQL 2k8R2 database that holds a rolling 90 days of routing activity that is used by SSRS for report generation. I have not managed anything this large before and need some guidance and direction - thank you in advance.

    The daily data volume is 1.5 to 2.5 million records per day with a file size of over 20GB; the record volume is currently 170 million records.

    Most of the data is in one table; all fields except the [RecordID] field are used to select data and are indexed for performance with non-clustered indexes.

    Since it is a rolling 90-day table, the daily ETL via SSIS deletes the oldest day of data and loads the newest day.

    PAIN POINTS

    ETL and Reporting performance need to be better.

    1) Strategically, the data lives on one disk - should this change?

    2) Data loading - would it be smarter to drop all indices, load the data, then recreate the indices?

    3) The daily ETL traffic seems to fragment the database and files - what is a smarter way to manage the fragmentation?

    Table structure:

    /****** Object: Table [dbo].[Activity] Script Date: 01/04/2012 09:35:53 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[Activity](

    [recordID] [bigint] IDENTITY(1,1) NOT NULL,

    [Datestamp] [datetime] NOT NULL,

    [Activity] [varchar](512) NULL,

    [Username] [varchar](50) NOT NULL,

    [GroupName] [varchar](50) NULL,

    [CallerID] [varchar](50) NULL,

    [EventMsg] [varchar](50) NULL,

    [NASIPAddress] [varchar](50) NULL,

    [AccessDevice] [varchar](50) NULL,

    [NetworkDeviceGroup] [varchar](50) NULL,

    [ActivityID] [varchar](50) NULL,

    [LogServer] [varchar](50) NOT NULL,

    CONSTRAINT [PK_Activity] PRIMARY KEY CLUSTERED

    (

    [recordID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    Tom in Sacramento - For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Have you looked into table partitioning? Or are you like me, stuck on standard edition?

  • Thanks for your reply

    Table partitioning seems like a good idea - and I do have Datacenter Edition. Any suggestions on the best way to assess/apply partitioning?

    Tom in Sacramento - For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • I have only played around with partitioning on a local database, never in a production environment. It may be best if I just point you in the direction of some good information. 😀

    http://msdn.microsoft.com/en-us/library/ms190787.aspx

    Define what you mean by one disk - is that a single LUN presented from a SAN? If so, how is that LUN set up on the SAN?

    What you can try is disabling all non-clustered indexes, loading the data, and then rebuilding the indexes. This might perform much faster for you - but it may not, because you are only loading a day's worth of data. It is something to try, and it generally does perform better.

    This would take care of both items 2 and 3 for you.

    To disable the indexes: ALTER INDEX {index name} ON {table} DISABLE;

    To enable the indexes: ALTER INDEX ALL ON {table} REBUILD;

    You can specify the option to sort in tempdb - if that is on a separate LUN and separate disks, it will help with the rebuild performance.
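    As a concrete sketch against the Activity table posted above (the index names here are hypothetical - substitute your own):

    -- disable each non-clustered index before the load (names assumed)
    ALTER INDEX IX_Activity_Datestamp ON dbo.Activity DISABLE;
    ALTER INDEX IX_Activity_Username  ON dbo.Activity DISABLE;
    -- ...repeat for the remaining non-clustered indexes...

    -- run the daily load, then rebuild everything in one pass,
    -- sorting in tempdb to keep the sort IO off the data drive
    ALTER INDEX ALL ON dbo.Activity REBUILD WITH (SORT_IN_TEMPDB = ON);

    Note that REBUILD on ALL also rebuilds the clustered PK and re-enables the disabled non-clustered indexes in the same pass.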

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

    A 25GB database, even if a single table accounts for most of it, should not cause much trouble.

    If the idea is to expedite the ETL process, I suggest looking into the possibility of partitioning the base table by DAY - meaning your table will have 90 partitions.

    If this partitioning strategy is possible - I know nothing about your application, so I cannot say - you can:

    1) Get rid of the oldest partition by either truncating/dropping it or switching it out to an empty table.

    2) Load your new partition by switching an external table in as the newest partition. Your ETL process will load this "staging" table, which is then switched in.

    Please note that for this to work like a charm, your indexes have to be partitioned and aligned to the table's partitioning strategy. For details about aligned partitioned indexes, please check http://msdn.microsoft.com/en-us/library/ms187526.aspx
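    A minimal sketch of the daily cycle, with hypothetical names (pfActivityDay/psActivityDay, a staging table dbo.ActivityStage and an archive table dbo.ActivityOld, both structured and indexed identically to dbo.Activity) and everything on [PRIMARY]. Note the PK would need [Datestamp] added to its key to be aligned:

    -- one boundary per day; RANGE RIGHT puts each day in its own partition
    CREATE PARTITION FUNCTION pfActivityDay (datetime)
    AS RANGE RIGHT FOR VALUES ('20111007', '20111008', /* ...one per day... */ '20120104');
    GO
    CREATE PARTITION SCHEME psActivityDay AS PARTITION pfActivityDay ALL TO ([PRIMARY]);
    GO

    -- daily maintenance: switch the oldest day out, then merge its boundary away
    ALTER TABLE dbo.Activity SWITCH PARTITION $PARTITION.pfActivityDay('20111007') TO dbo.ActivityOld;
    TRUNCATE TABLE dbo.ActivityOld;
    ALTER PARTITION FUNCTION pfActivityDay() MERGE RANGE ('20111007');

    -- make room for the new day, then switch the staged load in
    -- (the staging table needs a CHECK constraint bounding [Datestamp] to that day)
    ALTER PARTITION SCHEME psActivityDay NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION pfActivityDay() SPLIT RANGE ('20120105');
    ALTER TABLE dbo.ActivityStage SWITCH TO dbo.Activity PARTITION $PARTITION.pfActivityDay('20120104');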

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • That is a single filegroup on a single HDD, i.e., no partitioning of data or indices.

    Tom in Sacramento - For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Tom.Hamilton.Sacramento (1/4/2012)


    That is a single filegroup on a single HDD, i.e., no partitioning of data or indices.

    Table/index partitioning has nothing to do with how many filegroups you have - we are suggesting moving from non-partitioned to partitioned tables/indexes.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • Partition by day seems to make sense from a logical and logistical perspective - will get busy figuring out how - thanks much

    Tom in Sacramento - For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Tom.Hamilton.Sacramento (1/4/2012)


    Partition by day seems to make sense from a logical and logistical perspective - will get busy figuring out how - thanks much

    You may want to start here: http://msdn.microsoft.com/en-us/library/dd578580(v=SQL.100).aspx

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • Gone are the days when I thought 25GB was large. I have a LOT of tables bigger than that.

    I agree with the above. I would look at partitioning to add a new partition and switch out the old one post-import. You can automate this in a stored procedure, as I already have - a skeleton is sketched below.
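    Something along these lines (a skeleton only, with hypothetical names; the actual switch/merge/split steps are as sketched in the earlier reply):

    CREATE PROCEDURE dbo.SlideActivityWindow
        @OldestDay datetime,  -- day to switch out and merge away
        @NewDay    datetime   -- day already staged, ready to switch in
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRY
            BEGIN TRANSACTION;
            -- 1) SWITCH the @OldestDay partition out to the archive table,
            --    TRUNCATE the archive, MERGE the old boundary
            -- 2) SPLIT in a boundary beyond @NewDay, SWITCH the staging table in
            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
            DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
            RAISERROR(@msg, 16, 1);  -- re-raise so the calling SSIS job step fails
        END CATCH
    END;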

    If the main problem is ETL loading speed, I would also look at a few configuration details:

    a) You say the data file is all on one disc - where are the logs? If they are on the same drive as the data, moving them to a different disc should speed the load up a fair bit.

    b) All those large varchar columns are potentially taking a lot of space; can you encode the values down to narrow values (see the sketch after this list)? Anything that makes the row narrower can potentially save you disc IO.

    c) Should any of the varchar columns be char? That saves the 2-byte varchar overhead.

    d) Can you lose the NULLable columns and store a zero-length string instead? That saves the null bitmap if you can lose them all. I have also found previously that updates from a null value to a non-null value are significantly slower than updates from zero or blank (as appropriate to the data type) to a "real" value.

    e) If the ETL is using SSIS, is it spooling buffers to disc? If so, where - is that the same disc as the database?
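    For item (b), one hypothetical example of narrowing: if [NASIPAddress] only ever holds IPv4 addresses, the dotted quad can be stored as a 4-byte number instead of varchar(50):

    -- PARSENAME splits a 4-part dotted string; part 4 is the leftmost octet
    DECLARE @ip varchar(50) = '10.1.2.3';
    SELECT CAST(PARSENAME(@ip, 4) AS bigint) * 16777216
         + CAST(PARSENAME(@ip, 3) AS bigint) * 65536
         + CAST(PARSENAME(@ip, 2) AS bigint) * 256
         + CAST(PARSENAME(@ip, 1) AS bigint) AS ip_as_number;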

    Hope some of the above helps.

    Mike John

  • Thanks John - all great input/suggestions

    Tom in Sacramento - For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/

    How many distinct values do you have for the following fields?

    [Activity] [varchar](512) NULL,

    [Username] [varchar](50) NOT NULL,

    [GroupName] [varchar](50) NULL,

    [CallerID] [varchar](50) NULL,

    [EventMsg] [varchar](50) NULL,

    [NASIPAddress] [varchar](50) NULL,

    [AccessDevice] [varchar](50) NULL,

    [NetworkDeviceGroup] [varchar](50) NULL,

    [ActivityID] [varchar](50) NULL,

    [LogServer] [varchar](50) NOT NULL,

    Have you thought about normalising these?

    If you haven't done this before: store an activity_id in your main table - this would be an int (as small as possible) - then in another table store the activity description and the related id.

    It would be more work in the ETL, but the IO would reduce massively, and the table and indexes should be much faster as you will be aggregating the ints and not the strings. A minimal sketch follows.
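    A minimal sketch, using a hypothetical dbo.ActivityType lookup table:

    -- lookup table: one row per distinct activity string
    CREATE TABLE dbo.ActivityType (
        ActivityTypeID int IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_ActivityType PRIMARY KEY,
        Activity varchar(512) NOT NULL
            CONSTRAINT UQ_ActivityType_Activity UNIQUE
    );
    -- the main table then carries the 4-byte key instead of the 512-byte string:
    -- [ActivityTypeID] int NOT NULL
    --     CONSTRAINT FK_Activity_ActivityType REFERENCES dbo.ActivityType (ActivityTypeID)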

    This should hopefully explain more

    http://en.wikipedia.org/wiki/Database_normalization

  • Thanks much - all of this is high-cardinality data with hundreds or thousands of distinct values. Lookups are handled via PK/FK.

    Tom in Sacramento - For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
