Database design

  • Hi,

    We are planning to implement a database that ETLs about 2 GB of data daily.

    We estimate the overall size on the server will be close to 1 TB per year.

    We are going to use an ETL staging database and a data warehouse, from which we will load the data into cubes. Considering the ETL and the overall size of the database, are there any specifics I need to consider when designing it? I'm sorry I can't provide much detail right now, but we are looking at alternatives in the database design so that we don't trade off too much space or performance.

    I am thinking that I don't want to normalize the database too far. Am I right?

    Thanks,

    Jay

    Jayanth Kurup

  • What about table partitioning? Will it work in your case?

  • You want to normalize the data as much as you need to. There really isn't a "too far" or a "not far enough." Meet the business requirements in the best way possible. Remember that normalization not only increases data accuracy, it also reduces the amount of data stored. For example, you can store the full address information with each of 50 million customers, repeating addresses over and over again, or you can link to an address table and radically reduce the amount of data stored. That two-table join is not going to seriously impact performance. Three-, four-, even fifteen-table joins won't seriously impact performance either if you've got good indexes, especially good clustered indexes. Flattening the structure reduces joins and simplifies queries, but it could make for poorer performance (you'll need to index more columns on the table and maintain that data on that one table, with more page splits, more index rebuilds...). If flat files were better, we'd never have gone to relational databases in the first place.
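    A minimal sketch of the address example, with hypothetical table and column names: each customer row carries a key into a shared address table instead of repeating the full address.

    -- Hypothetical normalized schema: one Address row can serve many Customer rows.
    CREATE TABLE dbo.Address
    (
        AddressID  int IDENTITY PRIMARY KEY,
        Street     nvarchar(100) NOT NULL,
        City       nvarchar(50)  NOT NULL,
        PostalCode nvarchar(10)  NOT NULL
    );

    CREATE TABLE dbo.Customer
    (
        CustomerID   int IDENTITY PRIMARY KEY,
        CustomerName nvarchar(100) NOT NULL,
        AddressID    int NOT NULL REFERENCES dbo.Address (AddressID)
    );

    -- The two-table join referred to above:
    SELECT c.CustomerName, a.Street, a.City, a.PostalCode
    FROM dbo.Customer AS c
    JOIN dbo.Address AS a ON a.AddressID = c.AddressID;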

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Yes, we will be implementing horizontal partitioning on the main tables, with a partition for each month.

    Regarding the indexes: we will be loading data once every 5-10 minutes, so I was wondering whether I should use indexes at all, since the tables will have frequent inserts happening. I also need to implement full-text indexing.

    I need the ETL to complete in 5-10 minutes, before it starts all over again.

    I think indexes could become a performance bottleneck during ETL. Am I right?

    Thanks again for your help.

    Jayanth Kurup

  • Indexes can be, but aren't always, a performance problem during ETL. The best answer to that question is to test your load both ways: running with the indexes in place, and running with a drop and recreate of the indexes. One thing you can do to speed up either load is, where possible, to ensure that the data being loaded is in the same order as your clustered index. That helps regardless of whether you recreate the indexes or not.
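    A rough sketch of the drop-and-recreate variant of that test, with hypothetical object names, assuming the data file is already sorted on the clustered key so the ORDER hint applies:

    -- Variant B: drop the nonclustered index, load, then recreate it.
    DROP INDEX IX_FactSales_CustomerID ON dbo.FactSales;   -- hypothetical index

    -- Load in clustered-index order (assumed here to be SaleDate, SaleID);
    -- the ORDER hint tells SQL Server the file is pre-sorted on those columns.
    BULK INSERT dbo.FactSales
    FROM 'D:\etl\factsales.dat'                            -- hypothetical path
    WITH (ORDER (SaleDate, SaleID), TABLOCK);

    CREATE INDEX IX_FactSales_CustomerID ON dbo.FactSales (CustomerID);

    Time both variants against each other; whichever reliably finishes inside the 5-10 minute window wins.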

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • If you have an ETL database for staging the data, you would not have indexes on its tables. The point of the staging area is to pull the information out and prep it for loading into your data warehouse, not to query against. If you use the data warehouse as the collection of everything for historic purposes, then you should think about implementing data marts, small subsets of the data built using snapshots, for your different internal business clients, and leave the data warehouse for your BI folks. By that I don't mean marketing; I mean your internal person who understands data mining algorithms and can perform predictive analytics. The reduced number of data columns, tailored to the specific needs of each internal client group, could save you from index problems. The thing to keep in mind with indexes is that if a query returns more than about 1% of your total data, the optimizer will not use the index; this tidbit comes from Kimberly Tripp at the SQLConnections conference (her site is excellent, BTW: http://www.sqlskills.com/). Your indexes on a DW will add roughly 3-5 times the storage space of the data, and keep in mind that to do online index rebuilds you need free space equal to the size of the index being rebuilt.

    A couple of things to keep in mind for planning purposes. First, the 5-10 minute ETL as a service level agreement may not be workable: 80% of the effort in building a data warehouse and data mining solution is getting the data clean and into the format you need. Only you are the expert on the quality of your data, but until you have run through this process I would not commit to 5-10 minutes. Additionally, only you can realistically assess whether a 5-10 minute refresh makes business sense. Marketing does just fine on a 12- or 24-hour-old view of the data, and inventory management does fine with 30-minute views, so long as you have a method to notify customers that you are out of stock when they have already placed an order.

    Also, keep in mind that in order to data mine the data you need to use nvarchar and ntext as data types; varchar and varchar(max) won't work. This could double your initial storage requirements, but you want to store the data in these types rather than converting on the fly (which you can do in BI Studio) because of the additional memory overhead associated with the conversion, which leaves less memory for the predictive analytics, which use a lot of memory. This alone is often reason enough to move to a 64-bit version for DW needs.

    Based on the size of your data, normalizing to 3NF should work fine. The advantage of a star schema is that it is more intuitive for the end-user clients, so if you are looking to use the data mining plug-in for Excel to let end users query the data themselves, this may work to your advantage: it shifts the report-monkey duties to them and away from you, leaving you to do the heavy lifting such as predictive analytics and multichannel analysis.
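    For reference, a minimal star-schema sketch with hypothetical names: a central fact table keyed to surrounding dimension tables, which is the shape cube and end-user tools generally expect.

    -- Hypothetical star schema: dimensions first, then the fact table.
    CREATE TABLE dbo.DimDate
    (
        DateKey  int PRIMARY KEY,     -- e.g. 20080131
        FullDate datetime NOT NULL,
        [Month]  tinyint NOT NULL,
        [Year]   smallint NOT NULL
    );

    CREATE TABLE dbo.DimCustomer
    (
        CustomerKey  int IDENTITY PRIMARY KEY,
        CustomerName nvarchar(100) NOT NULL
    );

    CREATE TABLE dbo.FactSales
    (
        DateKey     int NOT NULL REFERENCES dbo.DimDate (DateKey),
        CustomerKey int NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
        Quantity    int NOT NULL,
        Amount      money NOT NULL
    );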

    My last recommendation is to use the following tool, which I seem to be recommending a lot lately, to automatically figure out your index needs based on usage patterns. This is from an earlier post but holds true here too.

    What you want to do is pull the data from the missing-index DMVs, and the script below will do this for you by creating a database called AUTOINDEXRECS that polls the missing-index DMVs and sucks in the info, leaving you to come back at a later time and look at the table to determine which indexes you need to create, which can be dropped, and on what tables. You need sa permission to do this. The script comes from the query team at Microsoft; you should download the .zip here: http://blogs.msdn.com/queryoptteam/archive/2006/06/01/613516.aspx.

    I found it on Paul's former storage engine blog.

    When you query the recommendation table, the results will look like the following:

    CREATE INDEX _MS_Sys_1 ON [Database_name].[dbo].[tbl_name]([ResponseID]) INCLUDE ([ResponseText])

    This is without a doubt the best tuning tool for a database server. It works wonders in OLAP environments where you don't know beforehand what the reports are going to be, and I am baffled as to why it is not more widely known.

    Lastly, make sure to use upsert statements if you will be updating as well as inserting data, and wrap everything in one transaction; this will save you from the overhead of a row-by-agonizing-row commit for all inserts. And make sure your tempdb has one data file for each CPU core on the box, to prevent contention issues.
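    A minimal sketch of the upsert-in-one-transaction idea, with hypothetical staging and target tables (MERGE requires SQL Server 2008 or later; on 2005 you would pair an UPDATE...FROM with an INSERT...WHERE NOT EXISTS inside the same transaction):

    -- One transaction, one set-based upsert instead of row-by-row commits.
    BEGIN TRANSACTION;

    MERGE dbo.FactSales AS tgt                 -- hypothetical target table
    USING staging.FactSales AS src             -- hypothetical staging table
        ON tgt.SaleID = src.SaleID
    WHEN MATCHED THEN
        UPDATE SET tgt.Quantity = src.Quantity,
                   tgt.Amount   = src.Amount
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (SaleID, Quantity, Amount)
        VALUES (src.SaleID, src.Quantity, src.Amount);

    COMMIT TRANSACTION;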

    The partitioning by month is a good plan, but take a look at a better design methodology from Kim Tripp (entire post here: http://www.sqlskills.com/blogs/kimberly/2007/10/03/SQLServer2008OffersPartitionlevelLockEscalationExcellentBut.aspx):

    "As a result, I would suggest a slightly different architecture. Instead of using only a single partitioned table for both read-only and read-write data, use at least two tables. One table for read-only data and another for read-write data. If you think this might be defeating the purpose of partitioning... then look at these benefits:

    * the read-only portion of the table (which is typically the *much* larger portion) can still be managed with partitioning

    * the read-only portion - once separated from the read-write - can have additional indexes for better [range] query performance

    * the read-only portion of the table can actually be partitioned into multiple partitioned tables - to give better per-table statistics (statistics are still at the table-level only so even if your partitioning scheme is "monthly" you might want to have tables that represent a year's worth of data...especially if your trends seem to change year to year)

    * large range queries against the read-only portion of the data will only escalate to the "table" (which is now separated from the read-write data)

    * the read-write portion of the data can have fewer indexes

    * the read-write portion of the data can be placed on different disks (MORE fault tolerant disks) due to the importance/volatility of the data

    * finally, and most importantly, the read-write portion of the data can be maintained completely separately from the read-only portion with regard to index rebuilds."

    Hope this helps,

    --Dave

    /****************************************************************************
    // Copyright (c) 2005 Microsoft Corporation.
    //
    // @File: AutoIndex.sql
    //
    // @test-2:
    //
    // Purpose:
    //   Auto create or drop indexes
    //
    // Notes:
    //
    // @EndHeader@
    *****************************************************************************/

    CREATE DATABASE AUTOINDEXRECS
    go

    USE AUTOINDEXRECS
    go

    -- Table to store recommendations
    IF object_id(N'dbo.recommendations', N'U') IS NOT NULL
        DROP table [dbo].[recommendations]
    GO

    create table [dbo].[recommendations]
    (
        id int IDENTITY primary key,
        recommendation nvarchar(400),
        type char(2),
        initial_time datetime,
        latest_time datetime,
        [count] int,
        status nvarchar(20)
    )
    GO

    -- Table to store recommendation history
    IF object_id(N'dbo.recommendations_history', N'U') IS NOT NULL
        DROP table [dbo].[recommendations_history]
    GO

    create table [dbo].[recommendations_history]
    (
        id int,
        operation nvarchar(20),
        time datetime,
        db_user_name sysname,
        login_name sysname
    )
    GO

    -- Table to store index recommendation details
    IF object_id(N'dbo.recommendations_details_index', N'U') IS NOT NULL
        DROP table [dbo].[recommendations_details_index]
    GO

    create table [dbo].[recommendations_details_index]
    (
        id int,
        database_id int,
        table_id int,
        table_modify_time datetime
    )
    GO

    ------------------------- add_recommendation_history ----------------------------------------------------
    ------ SP for adding a recommendation into the recommendations_history table.
    IF OBJECT_ID (N'dbo.add_recommendation_history', N'P') IS NOT NULL
        DROP PROC [dbo].[add_recommendation_history];
    GO

    create procedure [dbo].[add_recommendation_history]
        @id int,
        @operation nvarchar(20),
        @time datetime
    AS
    BEGIN
        declare @db_user_name sysname
        select @db_user_name = CURRENT_USER
        declare @login_name sysname
        select @login_name = SUSER_SNAME()
        insert into recommendations_history values (@id, @operation, @time, @db_user_name, @login_name)
    END
    go

    ------------------------- add_recommendation ----------------------------------------------------
    ------ SP for inserting a new recommendation into the dbo.RECOMMENDATIONS table.
    ------ If the same entry already exists, it just changes latest_time to the latest time
    ------ and increases the count by one.
    IF OBJECT_ID (N'dbo.add_recommendation', N'P') IS NOT NULL
        DROP PROC [dbo].[add_recommendation];
    GO

    create procedure [dbo].[add_recommendation]
        @recommendation nvarchar(max),
        @type_desc char(2),
        @id int OUTPUT
    AS
    BEGIN
        declare @create_date datetime
        set @create_date = getdate()
        IF ( @recommendation not in
            (select recommendation from dbo.recommendations))
        BEGIN
            insert into dbo.recommendations values
                (@recommendation, @type_desc, @create_date, @create_date, 1, N'Active')
            select @id = @@identity
            -- add it into the recommendation history
            exec [dbo].[add_recommendation_history] @id, N'ADD', @create_date
            return 0
        END
        ELSE
        BEGIN
            select @id = id
            from dbo.recommendations
            where @recommendation = recommendation
            update dbo.recommendations
            set latest_time = @create_date,
                [count] = [count] + 1
            where id = @id
            -- add it into the recommendation history
            exec [dbo].[add_recommendation_history] @id, N'UPDATE', @create_date
            return 10
        END
    END
    go

    ------------------------- disable_recommendation ----------------------------------------------------
    ------ SP for disabling a recommendation in the RECOMMENDATIONS table.
    IF OBJECT_ID (N'dbo.disable_recommendation', N'P') IS NOT NULL
        DROP PROC [dbo].[disable_recommendation];
    GO

    create procedure [dbo].[disable_recommendation]
        @id int
    AS
    BEGIN
        BEGIN TRANSACTION xDisableRecommendation
        declare @create_date datetime
        set @create_date = getdate()
        update recommendations
        set status = N'Inactive'
        where id = @id
        -- add it into the recommendation history
        exec [dbo].[add_recommendation_history] @id, N'DISABLE', @create_date
        DECLARE @Error int
        SET @Error = @@ERROR
        IF @Error <> 0
        BEGIN
            ROLLBACK TRANSACTION xDisableRecommendation
            RETURN @Error
        END
        COMMIT TRANSACTION xDisableRecommendation
    END
    go

    ------------------------- enable_recommendation ----------------------------------------------------
    ------ SP for enabling a recommendation in the RECOMMENDATIONS table.
    IF OBJECT_ID (N'dbo.enable_recommendation', N'P') IS NOT NULL
        DROP PROC [dbo].[enable_recommendation];
    GO

    create procedure [dbo].[enable_recommendation]
        @id int
    AS
    BEGIN
        BEGIN TRANSACTION xEnableRecommendation
        declare @create_date datetime
        set @create_date = getdate()
        update recommendations
        set status = N'Active'
        where id = @id
        -- add it into the recommendation history
        exec [dbo].[add_recommendation_history] @id, N'ENABLE', @create_date
        DECLARE @Error int
        SET @Error = @@ERROR
        IF @Error <> 0
        BEGIN
            ROLLBACK TRANSACTION xEnableRecommendation
            RETURN @Error
        END
        COMMIT TRANSACTION xEnableRecommendation
    END
    go

    ------------------------- execute_recommendation ----------------------------------------------------
    ------ SP for executing a recommendation in the RECOMMENDATIONS table.
    IF OBJECT_ID (N'dbo.execute_recommendation', N'P') IS NOT NULL
        DROP PROC [dbo].[execute_recommendation];
    GO

    create procedure [dbo].[execute_recommendation]
        @id int
    AS
    BEGIN
        declare @recommendation nvarchar(max)
        declare @status nvarchar(20)
        -- fetch the recommendation
        select @recommendation = recommendation, @status = status
        from [recommendations]
        where id = @id
        -- check recommendation status
        if (@status = 'Inactive')
        begin
            print N'Error: Recommendation ' + cast ( @id as nvarchar(10)) + ' is Inactive. Change the status to Active before execution'
            return 1
        end
        -- check whether the schema has changed for the table
        declare @database_id int
        declare @object_id int
        declare @stored_modify_date datetime
        select @database_id = database_id, @object_id = table_id, @stored_modify_date = table_modify_time
        from [dbo].[recommendations_details_index]
        where id = @id
        declare @database_name sysname
        select @database_name = db_name(@database_id)
        -- create temporary table to store the current table schema version
        create table [#tabSchema] ( modify_date datetime)
        truncate table [#tabSchema]
        declare @exec_stmt nvarchar(4000)
        select @exec_stmt =
            'use ' + @database_name +
            '; insert [#tabSchema] select modify_date from sys.objects where object_id = ' + cast ( @object_id as nvarchar(10))
        --print @exec_stmt
        EXEC (@exec_stmt)
        declare @modify_date datetime
        select @modify_date = modify_date from #tabSchema
        if (object_id('[#tabSchema]') is not null)
        begin
            drop table [#tabSchema]
        end
        if (@modify_date > @stored_modify_date)
        begin
            print N'Error: Recommendation ' + cast ( @id as nvarchar(10)) + ' might be invalid since the schema on the table has changed since the recommendation was made'
            return 1
        end
        declare @create_date datetime
        set @create_date = getdate()
        BEGIN TRANSACTION xExecuteRecommendation
        exec (@recommendation)
        -- add it into the recommendation history
        exec [dbo].[add_recommendation_history] @id, N'EXECUTE', @create_date
        DECLARE @Error int
        SET @Error = @@ERROR
        IF @Error <> 0
        BEGIN
            ROLLBACK TRANSACTION xExecuteRecommendation
            RETURN @Error
        END
        COMMIT TRANSACTION xExecuteRecommendation
    END
    go

    ------------------------- add_recommendation_details_index ----------------------------------------------------
    ------ SP for adding index recommendation details into the recommendations_details_index table.
    IF OBJECT_ID (N'dbo.add_recommendation_details_index', N'P') IS NOT NULL
        DROP PROC [dbo].[add_recommendation_details_index];
    GO

    create procedure [dbo].[add_recommendation_details_index]
        @id int,
        @database_id int,
        @table_id int
    AS
    BEGIN
        declare @database_name sysname
        select @database_name = db_name(@database_id)
        -- create temporary table to store the current table schema version
        create table [#tabSchemaVer] ( modify_date datetime)
        truncate table [#tabSchemaVer]
        declare @exec_stmt nvarchar(4000)
        select @exec_stmt =
            'use ' + @database_name +
            '; insert [#tabSchemaVer] select modify_date from sys.objects where object_id = ' + cast ( @table_id as nvarchar(10))
        --print @exec_stmt
        EXEC (@exec_stmt)
        declare @tabVer datetime
        select @tabVer = modify_date from #tabSchemaVer
        insert into recommendations_details_index values (@id, @database_id, @table_id, @tabVer)
        if (object_id('[#tabSchemaVer]') is not null)
        begin
            drop table [#tabSchemaVer]
        end
    END
    go

    ---------------------------- auto_create_index ------------------------------
    IF OBJECT_ID (N'dbo.auto_create_index', N'P') IS NOT NULL
        DROP PROC [dbo].[auto_create_index];
    GO

    create procedure [dbo].[auto_create_index]
    as
    -- NOTE: This sp will create indexes recommended by the Missing Index DMVs.
    set nocount on
    -- required for creating index on ICC/IVs
    set ansi_warnings on
    set ansi_padding on
    set arithabort on
    set concat_null_yields_null on
    set numeric_roundabort off

    declare @exec_stmt nvarchar(4000)
    declare @table_name nvarchar(521)
    declare @column_name sysname
    declare @column_usage varchar(20)
    declare @column_id smallint
    declare @index_handle int
    declare @database_id int
    declare @object_id int

    -- find the top 5 indexes with maximum total improvement
    declare ms_cri_tnames cursor local static for
        Select Top 5 mid.database_id, mid.object_id, mid.statement as table_name, mig.index_handle as index_handle
        from
        (
            select (user_seeks + user_scans) * avg_total_user_cost * (avg_user_impact * 0.01) as index_advantage, migs.*
            from sys.dm_db_missing_index_group_stats migs
        ) as migs_adv,
        sys.dm_db_missing_index_groups mig,
        sys.dm_db_missing_index_details mid
        where migs_adv.group_handle = mig.index_group_handle
            and mig.index_handle = mid.index_handle
            and migs_adv.index_advantage > 10
        order by migs_adv.index_advantage DESC

    -- create temporary table to store the table names on which we just auto created indexes
    create table #tablenametab
    (
        table_name nvarchar(521) collate database_default
    )
    truncate table #tablenametab

    open ms_cri_tnames
    fetch next from ms_cri_tnames into @database_id, @object_id, @table_name, @index_handle
    --print @table_name
    while (@@fetch_status <> -1)
    begin
        -- don't auto create index on same table again
        -- UNDONE: we may try to filter out local temp tables in the future
        if (@table_name not in (select table_name from #tablenametab))
        begin
            -- these are all columns on which we are going to auto create indexes
            declare ms_cri_cnames cursor local for
                select column_id, quotename(column_name, '['), column_usage
                from sys.dm_db_missing_index_columns(@index_handle)
            -- now go over all columns for the index to-be-created and
            -- construct the create index statement
            open ms_cri_cnames
            fetch next from ms_cri_cnames into @column_id, @column_name, @column_usage
            declare @index_name sysname
            declare @include_column_list nvarchar(517)
            declare @key_list nvarchar(517)
            select @index_name = '_MS_Sys'
            select @key_list = ''
            select @include_column_list = ''
            declare @num_keys smallint
            declare @num_include_columns smallint
            select @num_keys = 0
            select @num_include_columns = 0
            while @@fetch_status >= 0
            begin
                -- construct index name, key list and include column list during the loop
                -- Index name is in the format: _MS_Sys_colid1_colid2_..._colidn
                if (@column_usage = 'INCLUDE')
                begin
                    if (@num_include_columns = 0)
                        select @include_column_list = @column_name
                    else
                        select @include_column_list = @include_column_list + ', ' + @column_name
                    select @num_include_columns = @num_include_columns + 1
                end
                else
                begin
                    if (@num_keys = 0)
                        select @key_list = @column_name
                    else
                        select @key_list = @key_list + ', ' + @column_name
                    select @num_keys = @num_keys + 1
                    select @index_name = @index_name + '_' + cast ( @column_id as nvarchar(10))
                end
                fetch next from ms_cri_cnames into @column_id, @column_name, @column_usage
            end
            close ms_cri_cnames
            deallocate ms_cri_cnames
            --print @index_name
            --print @table_name
            --print @key_list
            --print @include_column_list
            -- construct the create index statement:
            -- "CREATE INDEX @INDEX_NAME ON @TABLE_NAME (KEY_NAME1, KEY_NAME2, ...) INCLUDE (INCLUDE_COL_NAME1, INCLUDE_COL_NAME2, ...) WITH (ONLINE = ON)"
            -- (Note: for recommendation mode, we don't use the online option)
            if (@num_include_columns > 0)
                select @exec_stmt = 'CREATE INDEX ' + @index_name + ' ON ' + @table_name + '(' + @key_list + ') INCLUDE (' + @include_column_list + ')' -- WITH (ONLINE = ON)'
            else
                select @exec_stmt = 'CREATE INDEX ' + @index_name + ' ON ' + @table_name + '(' + @key_list + ')' -- WITH (ONLINE = ON)'
            --print @exec_stmt
            declare @id int
            declare @create_date datetime
            BEGIN TRANSACTION xAddCreateIdxRecommendation
            DECLARE @result int;
            EXEC @result = dbo.add_recommendation @exec_stmt, 'CI', @id OUT
            if (@result <> 10)
                EXEC dbo.add_recommendation_details_index @id, @database_id, @object_id
            DECLARE @Error int
            SET @Error = @@ERROR
            IF @Error <> 0
            BEGIN
                ROLLBACK TRANSACTION xAddCreateIdxRecommendation
                RETURN @Error
            END
            COMMIT TRANSACTION xAddCreateIdxRecommendation
            --EXEC (@exec_stmt)
            -- insert the table name into #tablenametab
            insert into #tablenametab values (@table_name)
        end
        fetch next from ms_cri_tnames into @database_id, @object_id, @table_name, @index_handle
    end
    deallocate ms_cri_tnames
    return(0) -- auto_create_index
    go

    ---------------------------- auto_drop_index ------------------------------
    IF OBJECT_ID (N'dbo.auto_drop_index', N'P') IS NOT NULL
        DROP PROC [dbo].[auto_drop_index];
    GO

    create procedure [dbo].[auto_drop_index]
    as
    -- NOTE: This sp will drop indexes that were automatically created and
    -- are no longer very useful, in a cost-efficient manner, based on feedback
    -- from the index usage DMVs.
    set nocount on

    declare @database_id int
    declare @object_id int
    declare @index_id int

    declare ms_drpi_iids cursor local static for
        Select Top 3 database_id, object_id, index_id
        from sys.dm_db_index_usage_stats
        where user_updates > 10 * (user_seeks + user_scans)
            and index_id > 1
        order by user_updates / (user_seeks + user_scans + 1) DESC

    open ms_drpi_iids
    fetch next from ms_drpi_iids into @database_id, @object_id, @index_id

    -- create temporary table to store the table name and index name
    create table #tabIdxnametab
    (
        table_name nvarchar(1000) collate database_default,
        index_name nvarchar(521) collate database_default
    )

    while (@@fetch_status >= 0)
    begin
        declare @exec_stmt nvarchar(4000)
        declare @database_name sysname
        select @database_name = db_name(@database_id)
        truncate table #tabIdxnametab
        -- insert the table name and index name into the temp table
        select @exec_stmt =
            'use ' + @database_name + ';' +
            'insert #tabIdxnametab select quotename(''' + @database_name + ''', ''['')+ ''.'' +quotename(schema_name(o.schema_id), ''['')+''.''+quotename(o.name,''['') , i.name
            from sys.objects o, sys.indexes i where o.type = ''U'' and o.is_ms_shipped = 0 and i.is_primary_key = 0 and i.is_unique_constraint = 0 and o.object_id =' + cast ( @object_id as nvarchar(10)) + ' and o.object_id = i.object_id and index_id = ' + cast ( @index_id as nvarchar(10))
        --print @exec_stmt
        EXEC (@exec_stmt)
        -- get the table_name and index_name
        declare @table_name nvarchar(1000)
        declare @index_name sysname
        select @table_name = table_name, @index_name = index_name from #tabIdxnametab
        -- use a name convention to recognize auto-created indexes for now;
        -- in the future, we will add a special bit inside metadata to distinguish them
        --if (substring(@index_name, 1, 8) = '_MS_Sys_')
        --begin
        --    construct drop index statement: "DROP INDEX @TABLE_NAME.@INDEX_NAME"
        --    select @exec_stmt = 'drop index '+@index_name+' on '+@table_name
        --    print @exec_stmt
        --    EXEC (@exec_stmt)
        --end
        --else
        --    print 'User Index: '+@table_name + '.'+ @index_name
        IF (@index_name IS NOT NULL)
        begin
            select @exec_stmt = 'drop index ' + @index_name + ' on ' + @table_name
            declare @id int
            declare @create_date datetime
            BEGIN TRANSACTION xAddDropIdxRecommendation
            DECLARE @result int;
            EXEC @result = dbo.add_recommendation @exec_stmt, 'DI', @id out
            if (@result <> 10)
                EXEC dbo.add_recommendation_details_index @id, @database_id, @object_id
            DECLARE @Error int
            SET @Error = @@ERROR
            IF @Error <> 0
            BEGIN
                ROLLBACK TRANSACTION xAddDropIdxRecommendation
                RETURN @Error
            END
            COMMIT TRANSACTION xAddDropIdxRecommendation
        end
        fetch next from ms_drpi_iids into @database_id, @object_id, @index_id
    end
    if (object_id('[#tabIdxnametab]') is not null)
    begin
        drop table [#tabIdxnametab]
    end
    deallocate ms_drpi_iids
    return(0) -- auto_drop_index
    go

    --
    -- Jobs for executing [auto_create_index] and [auto_drop_index]
    --
    DECLARE @jobId BINARY(16)
    EXEC msdb.dbo.sp_add_job
        @job_name = N'SQL MDW: Auto Index Management',
        @job_id = @jobId OUTPUT
    GO

    EXEC msdb.dbo.sp_add_jobstep
        @job_name = N'SQL MDW: Auto Index Management',
        @step_name = N'Auto Create Index',
        @step_id = 1,
        @subsystem = N'TSQL',
        @command = N'EXECUTE [dbo].[auto_create_index]',
        @on_success_action = 3, -- on success, go to next step
        @database_name = N'AUTOINDEXRECS'
    GO

    EXEC msdb.dbo.sp_add_jobstep
        @job_name = N'SQL MDW: Auto Index Management',
        @step_name = N'Auto Drop Index',
        @step_id = 2,
        @subsystem = N'TSQL',
        @command = N'EXECUTE [dbo].[auto_drop_index]',
        @database_name = N'AUTOINDEXRECS'
    GO

    EXEC msdb.dbo.sp_add_jobserver
        @job_name = N'SQL MDW: Auto Index Management'
    GO

    DECLARE @schedule_id int
    EXEC msdb.dbo.sp_add_schedule
        @schedule_name = N'SQL MDW: Auto Index Management',
        @freq_type = 4,             -- daily
        @freq_interval = 1,         -- every day
        @freq_subday_type = 4,      -- subday interval in minutes
        @freq_subday_interval = 30, -- every 30 minutes
        @schedule_id = @schedule_id OUTPUT
    EXEC msdb.dbo.sp_attach_schedule
        @job_name = N'SQL MDW: Auto Index Management',
        @schedule_id = @schedule_id
    go

  • Hi Jay,

    After running a similarly sized data warehouse (without cubes), the following is the main advice I can give you:

    1) Business rules for validating and cleaning the data take the bulk of the ETL time.

    2) Pay particular attention to your disk configuration: make good use of filegroups and multiple disk arrays (we did not have a SAN but inherited 100 disks with 6 RAID controllers), and make sure your NTFS allocation unit size is correct; I found 64 KB gave the best performance for our configuration. I would recommend using mount points for your disk arrays, as they make for easier configuration and easier restores to other systems, i.e. dev and test.

    3) Backup compression software was a must in our case to meet backup windows and reduce disk space (we used SQL LiteSpeed with no problems).

    4) The warehouse was based around Kimball's dimensional model. We found that we had to add back some of the natural keys to some of the very large fact and dimension tables for performance reasons; the joins between the large tables were killing performance.

    5) Appropriate indexing and up-to-date statistics (we updated nightly for all but the largest table) make a huge performance difference.

    6) Setting the warehouse to read-only after the ETL made a huge difference to reporting speed (see the sketch below).
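    A minimal sketch of that last step, assuming a hypothetical database name; the ETL job flips the database back to read-write before the next load:

    -- After the ETL completes: make the warehouse read-only for reporting.
    ALTER DATABASE MyWarehouse SET READ_ONLY WITH ROLLBACK IMMEDIATE;

    -- Before the next ETL window: make it writable again.
    ALTER DATABASE MyWarehouse SET READ_WRITE WITH ROLLBACK IMMEDIATE;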

    I trust this is helpful; I can give further details if you would like.

    Cheers

    Brandon


