Database design

Jayanth_Kurup

SSC-Insane

Points: 22967
January 29, 2008 at 6:31 am

#183421

Hi
We are planning to implement a database that ETLs about 2 GB of data daily.
We estimate the overall size on the server will be close to 1 TB per year.
We are going to use an ETL database and a data warehouse, from which we will load the data into cubes. Considering the ETL and the overall size of the database, are there any specifics that I need to consider when designing the database? I am sorry I can't provide much detail right now, but we are looking for alternatives in the database design so that we don't trade off too much on space or performance.
I am thinking that I don't want to normalize the database too far. Am I right?
Thanks
Jay
Jayanth Kurup

Loner · SSC-Insane · Points: 21279 · Answer 1

What about table partitioning? Will it work in your case?

Grant Fritchey · SSC Guru · Points: 398679 · Answer 2

You want to normalize the data as much as you need to. There really isn't a "too far" or a "not far enough." Meet the business requirements in the best way possible. Remember that normalization not only increases data accuracy, but it reduces the amount of data stored. For example, you can create all the address information with 50 million customers, repeating addresses over and over again, or you can link to an address table and radically reduce the amount of data stored. That two table join is not going to seriously impact performance. Three, four, and 15 table joins won't seriously impact performance either if you've got good indexes, especially good clustered indexes. Flattening the structure reduces joins and simplifies queries, but it could make for poorer performance (you'll need to index more columns on the table and maintain that data on that one table with more page splits, more index rebuilds...). If flat files were better, we'd never have gone to relational databases in the first place.
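The storage point about repeated addresses can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table names, column names, and sample data are all hypothetical: 1,000 customers sharing 10 distinct addresses, stored once flat and once through an address table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_flat (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE address  (id INTEGER PRIMARY KEY, address TEXT UNIQUE);
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT,
                       address_id INTEGER REFERENCES address(id));
""")

# Hypothetical sample data: 1,000 customers, 10 distinct addresses.
addresses = [f"{n} Main Street, Springfield" for n in range(10)]
rows = [(i, f"customer{i}", addresses[i % 10]) for i in range(1000)]

# Flat design: the address text is repeated on every customer row.
conn.executemany("INSERT INTO customer_flat VALUES (?, ?, ?)", rows)

# Normalized design: each address is stored once and referenced by key.
conn.executemany("INSERT INTO address (address) VALUES (?)",
                 [(a,) for a in addresses])
conn.execute("""
INSERT INTO customer (id, name, address_id)
SELECT f.id, f.name, a.id
FROM customer_flat f JOIN address a ON a.address = f.address
""")

flat_chars = conn.execute(
    "SELECT SUM(LENGTH(address)) FROM customer_flat").fetchone()[0]
norm_chars = conn.execute(
    "SELECT SUM(LENGTH(address)) FROM address").fetchone()[0]
joined = conn.execute(
    "SELECT COUNT(*) FROM customer c JOIN address a ON a.id = c.address_id"
).fetchone()[0]

print(f"address characters stored flat: {flat_chars}, normalized: {norm_chars}")
print(f"rows returned by the two-table join: {joined}")
```

The two-table join still returns every customer, while the address text is stored once per distinct address. The real saving depends on how often values actually repeat in your data.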

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jayanth_Kurup · SSC-Insane · Points: 22967 · Answer 3

Yes, we will be implementing horizontal partitioning on the main tables, with one partition for each month.

Regarding the indexes: we will be loading data once every 5-10 minutes, so I was wondering whether I should use indexes at all, since the tables will have frequent inserts. I also need to implement full-text indexing.

I need the ETL to complete within 5-10 minutes, before it starts all over again.

I think indexes could become a performance bottleneck during ETL. Am I right?

Thanks again for your help.

Grant Fritchey · SSC Guru · Points: 398679 · Answer 4

Indexes can be, but aren't always, a performance problem when performing ETL. Best answer to that question is for you to test your load both ways, running with the indexes on and then running with a drop & recreate on the indexes. The one thing you can do to speed up either load is to, where possible, ensure that the data being loaded is in the same order as your clustered index. That helps regardless of whether you recreate the indexes or not.
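A minimal sketch of the "test it both ways" approach, using Python's built-in sqlite3 module as a stand-in for SQL Server. The table name, index name, and generated rows are hypothetical, and timings from SQLite say nothing about how your own server will behave. The point is only the shape of the experiment: the same ordered batch is loaded once with the index in place and once with the index created after the load.

```python
import sqlite3
import time

def load(rows, keep_index):
    """Load rows into a fresh table and return (seconds, row_count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
    if keep_index:
        # Variant 1: the secondary index exists during the load.
        conn.execute("CREATE INDEX ix_fact_customer ON fact (customer_id)")
    start = time.perf_counter()
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO fact VALUES (?, ?, ?)", rows)
    if not keep_index:
        # Variant 2: load first, then build the index once at the end.
        conn.execute("CREATE INDEX ix_fact_customer ON fact (customer_id)")
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]
    conn.close()
    return elapsed, count

# Rows are generated in primary-key order, mirroring the advice to load
# data in the same order as the clustered index.
rows = [(i, (i * 7919) % 1000, float(i)) for i in range(50_000)]

t_with, n_with = load(rows, keep_index=True)
t_without, n_without = load(rows, keep_index=False)
print(f"index kept during load:   {t_with:.3f}s for {n_with} rows")
print(f"index created after load: {t_without:.3f}s for {n_without} rows")
```

Run each variant several times against a realistic batch size before drawing conclusions, since a single timing can be noisy.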

dtevlin · Ten Centuries · Points: 1084 · Answer 5

If you have an ETL database for staging the data, you would not have indexes on its tables. The point of the ETL database is to pull the information out and prep it for loading into your data warehouse, not to query against it. If you use the data warehouse as the collection of everything for historical purposes, then you should think about implementing data marts, or small subsets of the data using snapshots, for your different internal business clients, and leave the data warehouse for your BI folks. By that I don't mean marketing; I mean your internal person who understands data mining algorithms and who can perform predictive analytics. The reduced number of columns, tailored to the specific needs of each internal client group, could save you from index problems.

The thing to keep in mind with indexes is that, as a rule of thumb, if a query returns more than about 1% of your total data the optimizer will not use the index (this applies to non-covering nonclustered indexes, where every matching row needs an extra lookup). This tidbit comes from Kimberly Tripp at the SQLConnections conference (her site is excellent, BTW: http://www.sqlskills.com/). Your indexes on a DW will add roughly 3-5 times the storage space of the data, and keep in mind that to do online index rebuilds you need free space equal to the size of the index being rebuilt.
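As a rough back-of-the-envelope sketch, the 2 GB per day from the original question can be combined with the 3-5 times index rule of thumb above. The function and constant names are made up for illustration, and the calculation ignores compression, staging copies, and growth in the daily volume.

```python
DAILY_LOAD_GB = 2           # daily ETL volume from the original question
DAYS_PER_YEAR = 365
INDEX_MULTIPLIERS = (3, 5)  # the "3-5 times" rule of thumb quoted above

def yearly_footprint_gb(daily_gb, index_multiplier):
    """Return (data, indexes, total) in GB for one year of loads."""
    data = daily_gb * DAYS_PER_YEAR
    indexes = data * index_multiplier
    return data, indexes, data + indexes

for multiplier in INDEX_MULTIPLIERS:
    data, indexes, total = yearly_footprint_gb(DAILY_LOAD_GB, multiplier)
    print(f"x{multiplier}: data {data} GB + indexes {indexes} GB = {total} GB per year")
```

If the multiplier held for your workload, the yearly footprint would land well above the 1 TB estimate in the question, so it is worth measuring the actual index-to-data ratio on a test load before sizing disks.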

A couple of things to keep in mind for planning purposes. First, the 5-10 minute ETL may not be workable as a Service Level Agreement: 80% of the effort in building a data warehouse and doing data mining is getting the data clean and in the format you need. Only you are the expert on the quality of your data, but until you have run through this process I would not commit to 5-10 minutes. Additionally, only you can realistically assess whether a 5-10 minute refresh makes business sense. Marketing does just fine on a 12- or 24-hour-old view of the data, and inventory management does fine with 30-minute views of the data, so long as you have a method to notify customers that you are out of stock when they have already placed an order.

Also, keep in mind that in order to data mine the data you need to use nvarchar and ntext as data types; varchar and varchar(max) won't work. This could double your initial storage requirements. You want to store the data in these types rather than converting on the fly (which you can do in BI Studio), because the conversion carries additional memory overhead and leaves you with less memory for the predictive analytics, which use a lot of memory. This alone is often reason enough to move to a 64-bit version for DW needs.
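The "could double your storage" remark follows from the encodings: nvarchar stores two bytes per character for ordinary text, while varchar stores one byte per character in a single-byte code page. A small sketch of that arithmetic, using Python's latin-1 and UTF-16 codecs as stand-ins for the two storage formats (the helper name and sample string are hypothetical, and row and page overhead is ignored):

```python
def storage_bytes(text):
    """Return (single_byte_size, utf16_size) for the character data only."""
    single_byte = len(text.encode("latin-1"))  # varchar-style: 1 byte per character
    utf16 = len(text.encode("utf-16-le"))      # nvarchar-style: 2 bytes per character
    return single_byte, utf16

sample = "Customer response text"
narrow, wide = storage_bytes(sample)
print(f"{len(sample)} characters: {narrow} bytes single-byte, {wide} bytes UTF-16")
```

Only character columns are affected, so the overall growth depends on how much of each row is text.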

Based on the size of your data, normalizing to 3NF should work fine. The advantage of a star schema is that it is more intuitive for end-user clients, so if you are looking to use the data mining plug-in for Excel to let end users query the data themselves, it may work to your advantage: it shifts the report-monkey duties to them and away from you, leaving you to do the heavy lifting such as predictive analytics and multichannel analysis.

My last recommendation is to use the following tool, which I seem to be recommending a lot lately, to automatically figure out your index needs based on usage patterns. This is from an earlier post but holds true here too.

What you want to do is pull the data from the missing index DMVs, and the script below will do this for you. It creates a database called AUTOINDEXRECS that polls the missing index DMVs and stores the results, which lets you come back later and look at the table to determine which indexes you need to create, which can be dropped, and on what tables. You need sa permission to do this. This comes from the query team at Microsoft, and you should download the .zip here: http://blogs.msdn.com/queryoptteam/archive/2006/06/01/613516.aspx.

I found it on Paul's former storage engine blog.

When you query the recommendation table the results will look like the following:

CREATE INDEX _MS_Sys_1 ON [Database_name].[dbo].[tbl_name]([ResponseID]) INCLUDE ([ResponseText])

This is without a doubt the best tuning tool for a database server. It works wonders in OLAP environments where you don't know what the reports are going to be beforehand, and I am baffled why it is not more widely known.
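For readers who want to see how the script decides what to recommend, here is a sketch of its two heuristics, re-expressed in Python. The formulas and thresholds are taken from the auto_create_index and auto_drop_index procedures in the script; the class, function names, and sample numbers are hypothetical. The sketch leaves out two details of the script: it skips a table that already received a recommendation in the same run, and the drop check only considers indexes with index_id > 1.

```python
from dataclasses import dataclass

@dataclass
class MissingIndexStat:
    table: str
    user_seeks: int
    user_scans: int
    avg_total_user_cost: float
    avg_user_impact: float  # percentage, as reported by the DMV

def index_advantage(s):
    # Same expression the script computes over the missing index group stats.
    return (s.user_seeks + s.user_scans) * s.avg_total_user_cost * (s.avg_user_impact * 0.01)

def top_candidates(stats, threshold=10, limit=5):
    """Keep entries whose advantage exceeds the threshold, best first."""
    scored = [(index_advantage(s), s) for s in stats]
    kept = [pair for pair in scored if pair[0] > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:limit]

def is_drop_candidate(user_updates, user_seeks, user_scans):
    # An index is flagged when it is written far more often than it is read.
    return user_updates > 10 * (user_seeks + user_scans)

# Hypothetical numbers for illustration only.
stats = [
    MissingIndexStat("dbo.fact_sales", 100, 0, 2.0, 50.0),
    MissingIndexStat("dbo.dim_date", 10, 0, 1.0, 50.0),
    MissingIndexStat("dbo.dim_customer", 20, 20, 2.0, 50.0),
]
for advantage, stat in top_candidates(stats):
    print(f"{stat.table}: advantage {advantage:.1f}")
```

Because the score multiplies usage counts by estimated cost and impact, a rarely run but very expensive query can outrank a cheap, frequent one, which is one reason to review the recommendations table rather than apply it blindly.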

Lastly, make sure to use upsert statements if you will be updating as well as inserting data, and wrap everything in one transaction. This will save you from the overhead of a row-by-agonizing-row commit for all inserts. Also make sure your tempdb has one data file for each CPU core on the box to prevent contention issues.
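A minimal sketch of the upsert-in-one-transaction idea, using Python's built-in sqlite3 module as a stand-in for SQL Server. The table and column names are hypothetical. The update-then-insert pattern shown here is one common way to write an upsert; it is not necessarily the form you would use in your own ETL.

```python
import sqlite3

def upsert_batch(conn, rows):
    """Update existing keys and insert new ones in a single transaction."""
    with conn:  # commits once on success, rolls back if any statement fails
        for sku, price in rows:
            cur = conn.execute(
                "UPDATE dim_product SET price = ? WHERE sku = ?", (price, sku))
            if cur.rowcount == 0:
                # No existing row matched the key, so insert a new one.
                conn.execute(
                    "INSERT INTO dim_product (sku, price) VALUES (?, ?)", (sku, price))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_product (sku TEXT PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO dim_product VALUES ('A1', 9.99)")
conn.commit()

# One batch containing an update for an existing key and a brand-new key.
upsert_batch(conn, [("A1", 10.49), ("B2", 4.25)])
result = dict(conn.execute("SELECT sku, price FROM dim_product ORDER BY sku"))
print(result)
```

Committing once per batch rather than once per row is what removes the per-row commit overhead. The trade-off is that a failure rolls back the whole batch, so batch size should match how much work you are willing to redo.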

The partitioning by month is a good plan, but take a look here at a better design methodology to use, from Kim Tripp:

(Entire post here http://www.sqlskills.com/blogs/kimberly/2007/10/03/SQLServer2008OffersPartitionlevelLockEscalationExcellentBut.aspx)

"As a result, I would suggest a slightly different architecture. Instead of using only a single partitioned table for both read-only and read-write data, use at least two tables. One table for read-only data and another for read-write data. If you think this might be defeating the purpose of partitioning... then look at these benefits:

* the read-only portion of the table (which is typically the *much* larger portion of the table - can still be managed with partitioning)

* the read-only portion - once separated from the read-write - can have additional indexes for better [range] query performance

* the read-only portion of the table can actually be partitioned into multiple partitioned tables - to give better per-table statistics (statistics are still at the table-level only so even if your partitioning scheme is "monthly" you might want to have tables that represent a year's worth of data...especially if your trends seem to change year to year)

* large range queries against the read-only portion of the data will only escalate to the "table" (which is now separated from the read-write data)

* the read-write portion of the data can have fewer indexes

* the read-write portion of the data can be placed on different disks (MORE fault tolerant disks) due to the importance/volatility of the data

* finally, and most importantly, the read-write portion of the data can be maintained completely separately from the read-only portion with regard to index rebuilds

"

Hope this helps,

--Dave

/****************************************************************************

//

// @File: AutoIndex.sql

//

// @test-2:

//

// Purpose:

// Auto create or drop indexes

//

// Notes:

//

// @EndHeader@

*****************************************************************************/

CREATE DATABASE AUTOINDEXRECS

go

USE AUTOINDEXRECS

go

-- Table to store recommendations

IF object_id(N'dbo.recommendations', N'U') IS NOT NULL

DROP table [dbo].[recommendations]

GO

create table [dbo].[recommendations]

(

id int IDENTITY primary key,

recommendation nvarchar(400),

type char(2),

initial_time datetime,

latest_time datetime,

[count] int,

status nvarchar(20)

)

GO

-- Table to store recommendation history

IF object_id(N'dbo.recommendations_history', N'U') IS NOT NULL

DROP table [dbo].[recommendations_history]

GO

create table [dbo].[recommendations_history]

(

id int,

operation nvarchar(20),

time datetime,

db_user_name sysname,

login_name sysname

)

GO

-- Table to store index recommendations details

IF object_id(N'dbo.recommendations_details_index', N'U') IS NOT NULL

DROP table [dbo].[recommendations_details_index]

GO

create table [dbo].[recommendations_details_index]

(

id int,

database_id int,

table_id int,

table_modify_time datetime

)

GO

------------------------- add_recommendation_history ----------------------------------------------------

------ SP for adding a recommendation into the recommendations_history table.

IF OBJECT_ID (N'dbo.add_recommendation_history', N'P') IS NOT NULL

DROP PROC [dbo].[add_recommendation_history];

GO

create procedure [dbo].[add_recommendation_history]

@id int,

@operation nvarchar(20),

@time datetime

AS

BEGIN

declare @db_user_name sysname

select @db_user_name = CURRENT_USER

declare @login_name sysname

select @login_name = SUSER_SNAME()

insert into recommendations_history values (@id, @operation, @time, @db_user_name, @login_name)

END

go

------------------------- add_recommendation----------------------------------------------------

------ SP for inserting a new recommendation into the dbo.RECOMMENDATIONS table.

------ If the same entry already exists, it just changes latest_create_date to the latest time

------ and increase the count by one

IF OBJECT_ID (N'dbo.add_recommendation', N'P') IS NOT NULL

DROP PROC [dbo].[add_recommendation];

GO

create procedure [dbo].[add_recommendation]

@recommendation nvarchar(max),

@type_desc char(2),

@id int OUTPUT

AS

BEGIN

declare @create_date datetime

set @create_date = getdate()

IF ( @recommendation not in

(select recommendation from dbo.recommendations))

BEGIN

insert into dbo.recommendations values

(@recommendation, @type_desc, @create_date, @create_date, 1, N'Active')

-- scope_identity() is not affected by identity values generated in triggers

select @id = scope_identity()

-- add it into the recommendation history

exec [dbo].[add_recommendation_history] @id, N'ADD', @create_date

return 0

END

ELSE

BEGIN

select @id = id

from dbo.recommendations

where @recommendation = recommendation

update dbo.recommendations

set latest_time = @create_date,

[count] = [count] +1

where id = @id

-- add it into the recommendation history

exec [dbo].[add_recommendation_history] @id, N'UPDATE', @create_date

return 10

END

END -- closes the procedure body opened after AS

go

------------------------- disable_recommendation----------------------------------------------------

------ SP for disabling a recommendation in the RECOMMENDATIONS table.

IF OBJECT_ID (N'dbo.disable_recommendation', N'P') IS NOT NULL

DROP PROC [dbo].[disable_recommendation];

GO

create procedure [dbo].[disable_recommendation]

@id int

AS

BEGIN

BEGIN TRANSACTION xDisableRecommendation

declare @create_date datetime

set @create_date = getdate()

update recommendations

set status = N'Inactive'

where id = @id

-- add it into the recommendation history

exec [dbo].[add_recommendation_history] @id, N'DISABLE', @create_date

DECLARE @Error int

SET @Error = @@ERROR

IF @Error <> 0

BEGIN

ROLLBACK TRANSACTION xDisableRecommendation

RETURN @Error

END

COMMIT TRANSACTION xDisableRecommendation

END

go

------------------------- enable_recommendation----------------------------------------------------

------ SP for enabling a recommendation in the RECOMMENDATIONS table.

IF OBJECT_ID (N'dbo.enable_recommendation', N'P') IS NOT NULL

DROP PROC [dbo].[enable_recommendation];

GO

create procedure [dbo].[enable_recommendation]

@id int

AS

BEGIN

BEGIN TRANSACTION xEnableRecommendation

declare @create_date datetime

set @create_date = getdate()

update recommendations

set status = N'Active'

where id = @id

-- add it into the recommendation history

exec [dbo].[add_recommendation_history] @id, N'ENABLE', @create_date

DECLARE @Error int

SET @Error = @@ERROR

IF @Error <> 0

BEGIN

ROLLBACK TRANSACTION xEnableRecommendation

RETURN @Error

END

COMMIT TRANSACTION xEnableRecommendation

END

go

------------------------- execute_recommendation----------------------------------------------------

------ SP for executing a recommendation in the RECOMMENDATIONS table.

IF OBJECT_ID (N'dbo.execute_recommendation', N'P') IS NOT NULL

DROP PROC [dbo].[execute_recommendation];

GO

create procedure [dbo].[execute_recommendation]

@id int

AS

BEGIN

declare @recommendation nvarchar(max)

declare @status nvarchar(20)

-- exec the recommendation

select @recommendation = recommendation, @status = status

from [recommendations]

where id = @id

-- check recommendation status

if (@status = 'Inactive')

begin

print N'Error: Recommendation ' + cast ( @id as nvarchar(10)) + ' is Inactive. Change the status to Active before execution'

return 1

end

-- check whether the schema has changed for the table

declare @database_id int

declare @object_id int

declare @stored_modify_date datetime

select @database_id = database_id, @object_id = table_id, @stored_modify_date = table_modify_time

from [dbo].[recommendations_details_index]

where id = @id

declare @database_name sysname

select @database_name = db_name(@database_id)

-- create temporary table to store the current table schema version

create table [#tabSchema] ( modify_date datetime)

truncate table [#tabSchema]

declare @exec_stmt nvarchar(4000)

select @exec_stmt =

'use '+ quotename(@database_name) +

'; insert [#tabSchema] select modify_date from sys.objects where object_id = ' + cast ( @object_id as nvarchar(10))

--print @exec_stmt

EXEC (@exec_stmt)

declare @modify_date datetime

select @modify_date = modify_date from #tabSchema

if (object_id('[#tabSchema]') is not null)

begin

drop table [#tabSchema]

end

if (@modify_date > @stored_modify_date)

begin

print N'Error: Recommendation ' + cast ( @id as nvarchar(10)) + ' might be invalid since the schema on the table has changed since the recommendation was made'

return 1

end

declare @create_date datetime

set @create_date = getdate()

BEGIN TRANSACTION xExecuteRecommendation

exec (@recommendation)

-- add it into the recommendation history

exec [dbo].[add_recommendation_history] @id, N'EXECUTE', @create_date

DECLARE @Error int

SET @Error = @@ERROR

IF @Error <> 0

BEGIN

ROLLBACK TRANSACTION xExecuteRecommendation

RETURN @Error

END

COMMIT TRANSACTION xExecuteRecommendation

END

go

------------------------- add_recommendation_details_index ----------------------------------------------------

------ SP for adding index recommendation details into the recommendations_details_index table.

IF OBJECT_ID (N'dbo.add_recommendation_details_index', N'P') IS NOT NULL

DROP PROC [dbo].[add_recommendation_details_index];

GO

create procedure [dbo].[add_recommendation_details_index]

@id int,

@database_id int,

@table_id int

AS

BEGIN

declare @database_name sysname

select @database_name = db_name(@database_id)

-- create temporary table to store the current table schema version

create table [#tabSchemaVer] ( modify_date datetime)

truncate table [#tabSchemaVer]

declare @exec_stmt nvarchar(4000)

select @exec_stmt =

'use '+ quotename(@database_name) +

'; insert [#tabSchemaVer] select modify_date from sys.objects where object_id = ' + cast ( @table_id as nvarchar(10))

--print @exec_stmt

EXEC (@exec_stmt)

declare @tabVer datetime

select @tabVer = modify_date from #tabSchemaVer

insert into recommendations_details_index values (@id,@database_id, @table_id, @tabVer)

if (object_id('[#tabSchemaVer]') is not null)

begin

drop table [#tabSchemaVer]

end

END

go

---------------------------- auto_create_index ------------------------------

IF OBJECT_ID (N'dbo.auto_create_index', N'P') IS NOT NULL

DROP PROC [dbo].[auto_create_index];

GO

create procedure [dbo].[auto_create_index]

as

-- NOTE: This sp will create indexes recommended by the Missing Index DMVs.

--

set nocount on

-- required for creating index on ICC/IVs

set ansi_warnings on

set ansi_padding on

set arithabort on

set concat_null_yields_null on

set numeric_roundabort off

declare @exec_stmt nvarchar(4000)

declare @table_name nvarchar(521)

declare @column_name sysname

declare @column_usage varchar(20)

declare @column_id smallint

declare @index_handle int

declare @database_id int

declare @object_id int

-- find the top 5 indexes with maximum total improvement

declare ms_cri_tnames cursor local static for

Select Top 5 mid.database_id, mid.object_id, mid.statement as table_name, mig.index_handle as index_handle

from

(

select

(user_seeks+user_scans) * avg_total_user_cost * (avg_user_impact * 0.01) as index_advantage, migs.*

from sys.dm_db_missing_index_group_stats migs

) as migs_adv,

sys.dm_db_missing_index_groups mig,

sys.dm_db_missing_index_details mid

where

migs_adv.group_handle = mig.index_group_handle and

mig.index_handle = mid.index_handle

and migs_adv.index_advantage > 10

order by migs_adv.index_advantage DESC

-- create temporary table to store the table names on which we just auto created indexes

create table #tablenametab

( table_name nvarchar(521) collate database_default

)

truncate table #tablenametab

open ms_cri_tnames

fetch next from ms_cri_tnames into @database_id, @object_id, @table_name, @index_handle

--print @table_name

while (@@fetch_status <> -1)

begin

-- don't auto create index on same table again

-- UNDONE: we may try to filter out local temp table in the future

if (@table_name not in (select table_name from #tablenametab ))

begin

-- these are all columns on which we are going to auto create indexes

declare ms_cri_cnames cursor local for

select column_id, quotename(column_name,'['), column_usage

from sys.dm_db_missing_index_columns(@index_handle)

-- now go over all columns for the index to-be-created and

-- construct the create index statement

open ms_cri_cnames

fetch next from ms_cri_cnames into @column_id, @column_name, @column_usage

declare @index_name sysname

declare @include_column_list nvarchar(517)

declare @key_list nvarchar(517)

select @index_name = '_MS_Sys'

select @key_list = ''

select @include_column_list = ''

declare @num_keys smallint

declare @num_include_columns smallint

select @num_keys = 0

select @num_include_columns = 0

while @@fetch_status >= 0

begin

-- construct index name, key list and include column list during the loop

-- Index Name in the format: _MS_Sys_colid1_colid2_..._colidn

if (@column_usage = 'INCLUDE')

begin

if (@num_include_columns = 0)

select @include_column_list = @column_name

else

select @include_column_list = @include_column_list + ', ' +@column_name

select @num_include_columns = @num_include_columns + 1

end

else

begin

if (@num_keys = 0)

select @key_list = @column_name

else

select @key_list = @key_list + ', ' +@column_name

select @num_keys = @num_keys + 1

select @index_name = @index_name + '_'+cast ( @column_id as nvarchar(10))

end

fetch next from ms_cri_cnames into @column_id, @column_name, @column_usage

end

close ms_cri_cnames

deallocate ms_cri_cnames

--print @index_name

--print @table_name

--print @key_list

--print @include_column_list

-- construct create index statement

-- "CREATE INDEX @INDEX_NAME ON @TABLE_NAME (KEY_NAME1, KEY_NAME2, ...) INCLUDE (INCLUDE_COL_NAME1, INCLUDE_COL_NAME2, ...) WITH (ONLINE = ON)" (Note: for recommendation mode, we don't use online option)

if (@num_include_columns > 0)

select @exec_stmt = 'CREATE INDEX ' + @index_name + ' ON ' + @table_name + '(' + @key_list + ') INCLUDE ('+ @include_column_list + ')'-- WITH (ONLINE = ON)'

else

select @exec_stmt = 'CREATE INDEX ' + @index_name + ' ON ' + @table_name + '(' + @key_list + ')'-- WITH (ONLINE = ON)'

--print @exec_stmt

declare @id int

declare @create_date datetime

BEGIN TRANSACTION xAddCreateIdxRecommendation

DECLARE @result int;

EXEC @result = dbo.add_recommendation @exec_stmt, 'CI', @id OUT

if (@result <> 10)

EXEC dbo.add_recommendation_details_index @id, @database_id, @object_id

DECLARE @Error int

SET @Error = @@ERROR

IF @Error <> 0

BEGIN

ROLLBACK TRANSACTION xAddCreateIdxRecommendation

RETURN @Error

END

COMMIT TRANSACTION xAddCreateIdxRecommendation

--EXEC (@exec_stmt)

-- insert the table name into #tablenametab

insert into #tablenametab values (@table_name)

end

fetch next from ms_cri_tnames into @database_id, @object_id, @table_name, @index_handle

end

deallocate ms_cri_tnames

return(0) -- auto_create_index

go

---------------------------- auto_drop_index ------------------------------

IF OBJECT_ID (N'dbo.auto_drop_index', N'P') IS NOT NULL

DROP PROC [dbo].[auto_drop_index];

GO

create procedure [dbo].[auto_drop_index]

as

-- NOTE: This sp will drop indexes that are automatically created and

-- are no longer very useful in a cost efficient manner based on feedbacks

-- from index usage DMVs.

set nocount on

declare @database_id int

declare @object_id int

declare @index_id int

declare ms_drpi_iids cursor local static for

Select Top 3 database_id, object_id, index_id

from sys.dm_db_index_usage_stats

where user_updates > 10 * (user_seeks+user_scans)

and index_id > 1

order by user_updates / (user_seeks+user_scans+1) DESC

open ms_drpi_iids

fetch next from ms_drpi_iids into @database_id, @object_id, @index_id

-- create temporary table to store the table name and index name

create table #tabIdxnametab

(

table_name nvarchar(1000) collate database_default,

index_name nvarchar(521) collate database_default

)

while (@@fetch_status >= 0)

begin

declare @exec_stmt nvarchar(4000)

declare @database_name sysname

select @database_name = db_name(@database_id)

truncate table #tabIdxnametab

-- insert the table name and index name into the temp table

select @exec_stmt =

'use '+ quotename(@database_name) + ';'+

'insert #tabIdxnametab select quotename(''' + @database_name+''', ''['')+ ''.'' +quotename(schema_name(o.schema_id), ''['')+''.''+quotename(o.name,''['') , i.name

from sys.objects o, sys.indexes i where o.type = ''U'' and o.is_ms_shipped = 0 and i.is_primary_key = 0 and i.is_unique_constraint = 0 and o.object_id =' + cast ( @object_id as nvarchar(10))+' and o.object_id = i.object_id and index_id = '+ cast ( @index_id as nvarchar(10))

--print @exec_stmt

EXEC (@exec_stmt)

-- get the table_name and index_name

declare @table_name nvarchar(1000)

declare @index_name sysname

select @table_name = table_name, @index_name = index_name from #tabIdxnametab

--use name convention to recognize auto-created indexes for now

--in the future, we will add a special bit inside metadata to distinguish

--if (substring(@index_name, 1, 8) = '_MS_Sys_')

--begin

-- construct drop index statement

-- "DROP INDEX @TABLE_NAME.@INDEX_NAME"

--select @exec_stmt = 'drop index '+@index_name+' on '+@table_name

--print @exec_stmt

--EXEC (@exec_stmt)

--end

--else

--print 'User Index: '+@table_name + '.'+ @index_name

IF (@index_name IS NOT NULL)

begin

select @exec_stmt = 'drop index '+quotename(@index_name)+' on '+@table_name

declare @id int

declare @create_date datetime

BEGIN TRANSACTION xAddDropIdxRecommendation

DECLARE @result int;

EXEC @result = dbo.add_recommendation @exec_stmt, 'DI', @id out

if (@result <> 10)

EXEC dbo.add_recommendation_details_index @id, @database_id, @object_id

DECLARE @Error int

SET @Error = @@ERROR

IF @Error <> 0

BEGIN

ROLLBACK TRANSACTION xAddDropIdxRecommendation

RETURN @Error

END

COMMIT TRANSACTION xAddDropIdxRecommendation

end

fetch next from ms_drpi_iids into @database_id, @object_id, @index_id

end

if (object_id('[#tabIdxnametab]') is not null)

begin

drop table [#tabIdxnametab]

end

deallocate ms_drpi_iids

return(0) -- auto_drop_index

go

--

-- JOBs for Executing [auto_create_index] and [auto_drop_index]

--

DECLARE @jobId BINARY(16)

EXEC msdb.dbo.sp_add_job

@job_name=N'SQL MDW: Auto Index Management',

@job_id = @jobId OUTPUT

GO

EXEC msdb.dbo.sp_add_jobstep

@job_name=N'SQL MDW: Auto Index Management',

@step_name=N'Auto Create Index',

@step_id=1,

@subsystem=N'TSQL',

@command=N'EXECUTE [dbo].[auto_create_index]',

@on_success_action = 3, -- on success, go to next step

@database_name=N'AUTOINDEXRECS'

GO

EXEC msdb.dbo.sp_add_jobstep

@job_name=N'SQL MDW: Auto Index Management',

@step_name=N'Auto Drop Index',

@step_id=2,

@subsystem=N'TSQL',

@command=N'EXECUTE [dbo].[auto_drop_index]',

@database_name=N'AUTOINDEXRECS'

GO

EXEC msdb.dbo.sp_add_jobserver

@job_name=N'SQL MDW: Auto Index Management'

GO

DECLARE @schedule_id int

EXEC msdb.dbo.sp_add_schedule

@schedule_name = N'SQL MDW: Auto Index Management' ,

@freq_type = 4, -- daily

@freq_interval = 1, -- every day

@freq_subday_type = 4, -- subday interval in minutes

@freq_subday_interval = 30, -- every 30 minutes

@schedule_id = @schedule_id OUTPUT

EXEC msdb.dbo.sp_attach_schedule

@job_name=N'SQL MDW: Auto Index Management',

@schedule_id = @schedule_id

go

fruitloop · SSC Enthusiast · Points: 156 · Answer 6

Hi Jay,

After running a similarly sized data warehouse (without cubes), the following is the main advice I can give you.

1) Business rules for validating and cleaning the data take the bulk of the ETL time.

2) Pay particular attention to your disk configuration. Make good use of filegroups and multiple disk arrays (we did not have a SAN but inherited 100 disks with 6 RAID controllers), and make sure your NTFS allocation unit sizes are correct; I found 64K gave the best performance for our configuration. I would recommend using mount points for your disk arrays, as it makes for easier configuration and easier restores to other systems, i.e. dev and test.

3) Backup compression software was a must in our case, to meet backup windows and reduce disk space (we used SQL LiteSpeed with no problems).

4) The warehouse was based around Kimball's dimensional model. We found that we had to add back some of the natural keys to some of the very large fact and dimension tables for performance reasons; the joins between the large tables were killing performance.

5) Appropriate indexing and up-to-date statistics (we updated nightly for all but the largest table) make a huge performance impact.

6) Setting the warehouse to read-only after the ETL made a huge difference to reporting speed.

I trust this is helpful; I can give further details if you would like.

Cheers

Brandon
