|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, May 07, 2013 6:43 AM
Points: 1,785,
Visits: 1,008
|
|
Hi
We are planning to implement a database which etl's about 2GB of data daily. Estimating the overall size on the server to be close to 1 TB on a yearly basis. We are going to be using an ETL database and a Datawarehouse , from which we will load the data into cubes. Considering the ETL and the overall size of the database are there any specifics that i need to consider when designing the database. I am sorry i cant provide much detail right now but we are looking for alternatives in the database design so that we dont tradeoff too much on space or performance.
I am thinking that I dont want to normalize the database too far. Am i right ???
Thanks Jay
Jayanth Kurup
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Yesterday @ 7:05 AM
Points: 2,726,
Visits: 2,925
|
|
| What about table partitioning, will it work in your case?
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 4:50 AM
Points: 13,371,
Visits: 25,143
|
|
You want to normalize the data as much as you need to. There really isn't a "too far" or a "not far enough." Meet the business requirements in the best way possible. Remember that normalization not only increases data accuracy, but it reduces the amount of data stored. For example, you can create all the address information with 50 million customers, repeating addresses over and over again, or you can link to an address table and radically reduce the amount of data stored. That two table join is not going to seriously impact performance. Three, four, and 15 table joins won't seriously impact performance either if you've got good indexes, especially good clustered indexes. Flattening the structure reduces joins and simplifies queries, but it could make for poorer performance (you'll need to index more columns on the table and maintain that data on that one table with more page splits, more index rebuilds...). If flat files were better, we'd never have gone to relational databases in the first place.
---------------------------------------------------- "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt The Scary DBA Author of: SQL Server 2012 Query Performance Tuning SQL Server 2008 Query Performance Tuning Distilled and SQL Server Execution Plans
Product Evangelist for Red Gate Software
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, May 07, 2013 6:43 AM
Points: 1,785,
Visits: 1,008
|
|
Yes , we will be implementing a horizontal partition on the main tables , with a partition for each month.
regarding the indexes , we will be loading data once every 5 - 10 minutes, so i was wondering if i should use the indexes since the tables will have frequent inserts happening . I also need to implement full text indexing as well.
I need the ETL to complete in 5 -10 minutes before it starts all over again.
Indexes i think could become a performance bottle neck during etl , am i right ?
Thanks again for your help.
Jayanth Kurup
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 4:50 AM
Points: 13,371,
Visits: 25,143
|
|
Indexes can be, but aren't always, a performance problem when performing ETL. Best answer to that question is for you to test your load both ways, running with the indexes on and then running with a drop & recreate on the indexes. The one thing you can do to speed up either load is to, where possible, ensure that the data being loaded is in the same order as your clustered index. That helps regardless of whether you recreate the indexes or not.
---------------------------------------------------- "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt The Scary DBA Author of: SQL Server 2012 Query Performance Tuning SQL Server 2008 Query Performance Tuning Distilled and SQL Server Execution Plans
Product Evangelist for Red Gate Software
|
|
|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Thursday, October 29, 2009 2:34 PM
Points: 56,
Visits: 77
|
|
If you have an ETL database for staging the data you would not have indexes on tables. The point of the ETL is to pull the information out and prep it for loading into your datawarehouse not to query against. If you use the datawarehouse as the collection of everything for historic purposes then you should think about implementing data marts or small subsets of the data using snapshots for your different internal business clients and leave the datawarehouse for your BI folks, and I don’t mean marketing, I mean your internal person who understands data mining algorithms and who can perform predictive analytics. The reduced number of data columns tailored to the specific needs of the internal client group could save you from index problems. The thing to keep in mind with indexes is that if the query returns more then 1% of your total data the optimizer will not use the index < this tidbit comes from Kimberly Trip at the SQLConnections conference (her site is excellent BTW http://www.sqlskills.com/). Your indexes on a DW will add roughly 3-5 times the storage space of the data and keep in-mind that to do online index rebuilds you need to have space equal to the size of the index being rebuilt.
A couple of things to keep in mind for planning purposes, first the 5-10 min ETL as a Service Level Agreement, may not be workable, 80% of the effort in the build of datawarehouse and data mining is getting the data clean and in the format you need. Only you are the expert as to the quality of your data but until you run through this process I would not commit to 5-10 mins. Additionally, only you can realistic assess if a 5-10 minute refresh makes business sense too. Marketing does just fine on a 12 or 24 hour old view of the data and inventory managment does fine with 30 min views of the data so long as you have a method to notify customers that you are out of stock when they have already placed an order.
Also, keep in mind that in order to data mine the data you need to use nvarchar and ntext as data types, varchar and varchar(max) won’t work. This could double your initial storage requirements and you want to store data in this type instead of converting on the fly which you can in BI-Studio because of the additional memory overheard associated with the conversion which leaves you with less memory for the predictive analytics which use a lot of memory; this alone is often reason enough to move to a 64 bit version for DW needs.
Based on the size of your data normalizing of 3NF should work fine the advantage to a star schema is that it is more intuitive for the end user clients so if you are looking to use the data mining plug-in for Excel to allow end-user to query data themselves this may work to your advantage > the advantage being that it shifts the report monkey duties to them and away from you leave you to do the heavy lifting such as predictive analytics and multichannel analysis.
Last recommendation is to use the following tool which I seem to be recommending a lot lately to automatically figure out your index needs based on usage patterns. This is from an earlier post but holds true here too.
What you want to do is pull the data from the missing index dmv and the script below will do this for you by creating a database called AUTOINDEXRECS that polls the missing index dmv and sucks in the info which leaves you to come back at a later time and look at the table to determine what indexes you need to create, and which can be dropped, and on what tables. You need to have sa permission to do this. This comes from the query team at microsoft and you should download the .zip here http://blogs.msdn.com/queryoptteam/archive/2006/06/01/613516.aspx. I found it on Paul's former storage engine blog.
When you query the recommendation table the results will look like the following: CREATE INDEX _MS_Sys_1 ON [Database_name].[dbo].[tbl_name]([ResponseID]) INCLUDE ([ResponseText])
This is without a doubt the best tuning tool for a database server, works wonders in OLAP environments where you don't know what the reports are going to be before hand, and I am baffled why this is not more widely known.
Lastly, make sure to use upsert statements if you will be updataing as well as inserting data and wrap everything in one transaction statement this will save you from the overhead of a row by agonizing row committ for all inserts. And make sure your temp DB has one file for each CPU core on the box to prevent contention issues.
The partitioning by month is a good plan but take a look here at a better design methodology to use >From Kim Tripp (Entire post here http://www.sqlskills.com/blogs/kimberly/2007/10/03/SQLServer2008OffersPartitionlevelLockEscalationExcellentBut.aspx)
"As a result, I would suggest a slightly different architecture. Instead of using only a single partitioned table for both read-only and read-write data, use at least two tables. One table for read-only data and another for read-write data. If you think this might be defeating the purpose of partitioning... then look at these benefits:
* the read-only portion of the table (which is typically the *much* larger portion of the table - can still be managed with partitioning) * the read-only portion - once separated from the read-write - can have additional indexes for better [range] query performance * the read-only portion of the table can actually be partitioned into multiple partitioned tables - to give better per-table statistics (statistics are still at the table-level only so even if your partitioning scheme is "monthly" you might want to have tables that represent a year's worth of data...especially if your trends seem to change year to year) * large range queries against the read-only portion of the data will only escalate to the "table" (which is now separated from the read-write data) * the read-write portion of the data can have fewer indexes * the read-write portion of the data can be placed on different disks (MORE fault tolerant disks) due to the importance/volatility of the data * finally, and most importantly, the read-write portion of the data can be maintained completely separately from the read-only portion with regard to index rebuilds
"
Hope this helps, --Dave
/**************************************************************************** // Copyright (c) 2005 Microsoft Corporation. // // @File: AutoIndex.sql // // @test: // // Purpose: // Auto create or drop indexes // // Notes: // // // @EndHeader@ *****************************************************************************/
CREATE DATABASE AUTOINDEXRECS go
USE AUTOINDEXRECS go
-- Table to store recommendations IF object_id(N'dbo.recommendations', N'U') IS NOT NULL DROP table [dbo].[recommendations] GO
create table [dbo].[recommendations] ( id int IDENTITY primary key, recommendation nvarchar(400), type char(2), initial_time datetime, latest_time datetime, [count] int, status nvarchar(20) ) GO
-- Table to store recommendation history IF object_id(N'dbo.recommendations_history', N'U') IS NOT NULL DROP table [dbo].[recommendations_history] GO
create table [dbo].[recommendations_history] ( id int, operation nvarchar(20), time datetime, db_user_name sysname, login_name sysname ) GO
-- Table to store index recommendations details IF object_id(N'dbo.recommendations_details_index', N'U') IS NOT NULL DROP table [dbo].[recommendations_details_index] GO
create table [dbo].[recommendations_details_index] ( id int, database_id int, table_id int, table_modify_time datetime ) GO
------------------------- add_recommendation_history ---------------------------------------------------- ------ SP for adding a recommendation into the recommendations_history table. IF OBJECT_ID (N'dbo.add_recommendation_history', N'P') IS NOT NULL DROP PROC [dbo].[add_recommendation_history]; GO
create procedure [dbo].[add_recommendation_history] @id int, @operation nvarchar(20), @time datetime AS BEGIN declare @db_user_name sysname select @db_user_name = CURRENT_USER declare @login_name sysname select @login_name = SUSER_SNAME() insert into recommendations_history values (@id, @operation, @time, @db_user_name, @login_name) END go
------------------------- add_recommendation---------------------------------------------------- ------ SP for inserting a new recommendation into the dbo.RECOMMENDATIONS table. ------ If the same entry already exists, it just changes latest_create_date to the latest time ------ and increase the count by one IF OBJECT_ID (N'dbo.add_recommendation', N'P') IS NOT NULL DROP PROC [dbo].[add_recommendation]; GO
create procedure [dbo].[add_recommendation] @recommendation nvarchar(max), @type_desc char(2), @id int OUTPUT AS BEGIN declare @create_date datetime set @create_date = getdate()
IF ( @recommendation not in (select recommendation from dbo.recommendations)) BEGIN insert into dbo.recommendations values (@recommendation, @type_desc, @create_date, @create_date, 1, N'Active') select @id = @@identity -- add it into the recommendation history exec [dbo].[add_recommendation_history] @id, N'ADD', @create_date return 0 END ELSE BEGIN select @id = id from dbo.recommendations where @recommendation = recommendation
update dbo.recommendations set latest_time = @create_date, [count] = [count] +1 where id = @id
-- add it into the recommendation history exec [dbo].[add_recommendation_history] @id, N'UPDATE', @create_date return 10 END END go
------------------------- disable_recommendation---------------------------------------------------- ------ SP for disabling a recommendation in the RECOMMENDATIONS table. IF OBJECT_ID (N'dbo.disable_recommendation', N'P') IS NOT NULL DROP PROC [dbo].[disable_recommendation]; GO
create procedure [dbo].[disable_recommendation] @id int AS BEGIN BEGIN TRANSACTION xDisableRecommendation
declare @create_date datetime set @create_date = getdate()
update recommendations set status = N'Inactive' where id = @id
-- add it into the recommendation history exec [dbo].[add_recommendation_history] @id, N'DISABLE', @create_date
DECLARE @Error int SET @Error = @@ERROR IF @Error <> 0 BEGIN ROLLBACK TRANSACTION xDisableRecommendation RETURN @Error END COMMIT TRANSACTION xDisableRecommendation END go
------------------------- enable_recommendation---------------------------------------------------- ------ SP for enabling a recommendation in the RECOMMENDATIONS table. IF OBJECT_ID (N'dbo.enable_recommendation', N'P') IS NOT NULL DROP PROC [dbo].[enable_recommendation]; GO create procedure [dbo].[enable_recommendation] @id int AS BEGIN BEGIN TRANSACTION xEnableRecommendation
declare @create_date datetime set @create_date = getdate()
update recommendations set status = N'Active' where id = @id
-- add it into the recommendation history exec [dbo].[add_recommendation_history] @id, N'ENABLE', @create_date
DECLARE @Error int SET @Error = @@ERROR IF @Error <> 0 BEGIN ROLLBACK TRANSACTION xEnableRecommendation RETURN @Error END COMMIT TRANSACTION xEnableRecommendation END go
------------------------- execute_recommendation---------------------------------------------------- ------ SP for executing a recommendation in the RECOMMENDATIONS table. IF OBJECT_ID (N'dbo.execute_recommendation', N'P') IS NOT NULL DROP PROC [dbo].[execute_recommendation]; GO create procedure [dbo].[execute_recommendation] @id int AS BEGIN declare @recommendation nvarchar(max) declare @status nvarchar(20)
-- exec the recommendation select @recommendation = recommendation, @status = status from [recommendations] where id = @id
-- check recommendation status if (@status = 'Inactive') begin print N'Error: Recommendation ' + cast ( @id as nvarchar(10)) + ' is Inactive. Change the status to Active before execution' return 1 end
-- check whether the schema has changed for the table declare @database_id int declare @object_id int declare @stored_modify_date datetime select @database_id = database_id, @object_id = table_id, @stored_modify_date = table_modify_time from [dbo].[recommendations_details_index] where id = @id
declare @database_name sysname select @database_name = db_name(@database_id)
-- create temporary table to store the current table schema version create table [#tabSchema] ( modify_date datetime) truncate table [#tabSchema]
declare @exec_stmt nvarchar(4000) select @exec_stmt = 'use '+ @database_name + '; insert [#tabSchema] select modify_date from sys.objects where object_id = ' + cast ( @object_id as nvarchar(10)) --print @exec_stmt EXEC (@exec_stmt)
declare @modify_date datetime select @modify_date = modify_date from #tabSchema
if (object_id('[#tabSchema]') is not null) begin drop table [#tabSchema] end if (@modify_date > @stored_modify_date) begin print N'Error: Recommendation ' + cast ( @id as nvarchar(10)) + ' might be invalid since the schema on the table has changed since the recommendation was made' return 1 end
declare @create_date datetime set @create_date = getdate()
BEGIN TRANSACTION xExecuteRecommendation exec (@recommendation)
-- add it into the recommendation history exec [dbo].[add_recommendation_history] @id, N'EXECUTE', @create_date
DECLARE @Error int SET @Error = @@ERROR IF @Error <> 0 BEGIN ROLLBACK TRANSACTION xExecuteRecommendation RETURN @Error END COMMIT TRANSACTION xExecuteRecommendation END go
------------------------- add_recommendation_details_index ---------------------------------------------------- ------ SP for adding index recommendation details into the recommendations_details_index table. IF OBJECT_ID (N'dbo.add_recommendation_details_index', N'P') IS NOT NULL DROP PROC [dbo].[add_recommendation_details_index]; GO create procedure [dbo].[add_recommendation_details_index] @id int, @database_id int, @table_id int AS BEGIN declare @database_name sysname select @database_name = db_name(@database_id)
-- create temporary table to store the current table schema version create table [#tabSchemaVer] ( modify_date datetime) truncate table [#tabSchemaVer]
declare @exec_stmt nvarchar(4000) select @exec_stmt = 'use '+ @database_name + '; insert [#tabSchemaVer] select modify_date from sys.objects where object_id = ' + cast ( @table_id as nvarchar(10)) --print @exec_stmt EXEC (@exec_stmt)
declare @tabVer datetime select @tabVer = modify_date from #tabSchemaVer insert into recommendations_details_index values (@id,@database_id, @table_id, @tabVer) if (object_id('[#tabSchemaVer]') is not null) begin drop table [#tabSchemaVer] end END go
---------------------------- auto_create_index ------------------------------ IF OBJECT_ID (N'dbo.auto_create_index', N'P') IS NOT NULL DROP PROC [dbo].[auto_create_index]; GO create procedure [dbo].[auto_create_index] as -- NOTE: This sp will create indexes recommended by the Missing Index DMVs. -- set nocount on
-- required for creating index on ICC/IVs set ansi_warnings on set ansi_padding on set arithabort on set concat_null_yields_null on set numeric_roundabort off
declare @exec_stmt nvarchar(4000) declare @table_name nvarchar(521) declare @column_name sysname declare @column_usage varchar(20) declare @column_id smallint declare @index_handle int declare @database_id int declare @object_id int
-- find the top 5 indexes with maximum total improvent declare ms_cri_tnames cursor local static for Select Top 5 mid.database_id, mid.object_id, mid.statement as table_name, mig.index_handle as index_handle from ( select (user_seeks+user_scans) * avg_total_user_cost * (avg_user_impact * 0.01) as index_advantage, migs.* from sys.dm_db_missing_index_group_stats migs ) as migs_adv, sys.dm_db_missing_index_groups mig, sys.dm_db_missing_index_details mid where migs_adv.group_handle = mig.index_group_handle and mig.index_handle = mid.index_handle and migs_adv.index_advantage > 10 order by migs_adv.index_advantage DESC
-- create temporary table to store the table names on which we just auto created indexes create table #tablenametab ( table_name nvarchar(521) collate database_default )
truncate table #tablenametab
open ms_cri_tnames fetch next from ms_cri_tnames into @database_id, @object_id, @table_name, @index_handle
--print @table_name
while (@@fetch_status <> -1) begin -- don't auto create index on same table again -- UNDONE: we may try to filter out local temp table in the future if (@table_name not in (select table_name from #tablenametab )) begin -- these are all columns on which we are going to auto create indexes declare ms_cri_cnames cursor local for select column_id, quotename(column_name,'['), column_usage from sys.dm_db_missing_index_columns(@index_handle)
-- now go over all columns for the index to-be-created and -- construct the create index statement open ms_cri_cnames fetch next from ms_cri_cnames into @column_id, @column_name, @column_usage
declare @index_name sysname declare @include_column_list nvarchar(517) declare @key_list nvarchar(517) select @index_name = '_MS_Sys' select @key_list = '' select @include_column_list = '' declare @num_keys smallint declare @num_include_columns smallint select @num_keys = 0 select @num_include_columns = 0
while @@fetch_status >= 0 begin -- construct index name, key list and include column list during the loop -- Index Name in the format: _MS_Sys_colid1_colid2_..._colidn
if (@column_usage = 'INCLUDE') begin if (@num_include_columns = 0) select @include_column_list = @column_name else select @include_column_list = @include_column_list + ', ' +@column_name select @num_include_columns = @num_include_columns + 1 end else begin if (@num_keys = 0) select @key_list = @column_name else select @key_list = @key_list + ', ' +@column_name
select @num_keys = @num_keys + 1 select @index_name = @index_name + '_'+cast ( @column_id as nvarchar(10)) end
fetch next from ms_cri_cnames into @column_id, @column_name, @column_usage end close ms_cri_cnames deallocate ms_cri_cnames --print @index_name --print @table_name --print @key_list --print @include_column_list -- construct create index statement -- "CREATE INDEX @INDEX_NAME ON @TABLE_NAME (KEY_NAME1, KEY_NAME2, ...) INCLUDE (INCLUDE_COL_NAME1, INCLUDE_COL_NAME2, ...) WITH (ONLINE = ON)" (Note: for recommendation mode, we don't use online option) if (@num_include_columns > 0) select @exec_stmt = 'CREATE INDEX ' + @index_name + ' ON ' + @table_name + '(' + @key_list + ') INCLUDE ('+ @include_column_list + ')'-- WITH (ONLINE = ON)' else select @exec_stmt = 'CREATE INDEX ' + @index_name + ' ON ' + @table_name + '(' + @key_list + ')'-- WITH (ONLINE = ON)' --print @exec_stmt declare @id int declare @create_date datetime BEGIN TRANSACTION xAddCreateIdxRecommendation DECLARE @result int;
EXEC @result = dbo.add_recommendation @exec_stmt, 'CI', @id OUT if (@result <> 10) EXEC dbo.add_recommendation_details_index @id, @database_id, @object_id
DECLARE @Error int SET @Error = @@ERROR IF @Error <> 0 BEGIN ROLLBACK TRANSACTION xAddCreateIdxRecommendation RETURN @Error END COMMIT TRANSACTION xAddCreateIdxRecommendation --EXEC (@exec_stmt)
-- insert the table name into #tablenametab insert into #tablenametab values (@table_name) end
fetch next from ms_cri_tnames into @database_id, @object_id, @table_name, @index_handle
end deallocate ms_cri_tnames
return(0) -- auto_create_index go
---------------------------- sp_autodropindex ------------------------------ IF OBJECT_ID (N'dbo.auto_drop_index', N'P') IS NOT NULL DROP PROC [dbo].[auto_drop_index]; GO create procedure [dbo].[auto_drop_index] as -- NOTE: This sp will drop indexes that are automatically created and -- are no longer very useful in a cost efficient manner based on feedbacks -- from index usage DMVs.
set nocount on
declare @database_id int declare @object_id int declare @index_id int
declare ms_drpi_iids cursor local static for Select Top 3 database_id, object_id, index_id from sys.dm_db_index_usage_stats where user_updates > 10 * (user_seeks+user_scans) and index_id > 1 order by user_updates / (user_seeks+user_scans+1) DESC
open ms_drpi_iids fetch next from ms_drpi_iids into @database_id, @object_id, @index_id
-- create temporary table to store the table name and index name create table #tabIdxnametab ( table_name nvarchar(1000) collate database_default, index_name nvarchar(521) collate database_default )
while (@@fetch_status >= 0) begin
declare @exec_stmt nvarchar(4000) declare @database_name sysname select @database_name = db_name(@database_id)
truncate table #tabIdxnametab
-- insert the table name and index name into the temp table select @exec_stmt = 'use '+ @database_name + ';'+ 'insert #tabIdxnametab select quotename(''' + @database_name+''', ''['')+ ''.'' +quotename(schema_name(o.schema_id), ''['')+''.''+quotename(o.name,''['') , i.name from sys.objects o, sys.indexes i where o.type = ''U'' and o.is_ms_shipped = 0 and i.is_primary_key = 0 and i.is_unique_constraint = 0 and o.object_id =' + cast ( @object_id as nvarchar(10))+' and o.object_id = i.object_id and index_id = '+ cast ( @index_id as nvarchar(10)) --print @exec_stmt EXEC (@exec_stmt)
-- get the table_name and index_name declare @table_name nvarchar(1000) declare @index_name sysname select @table_name = table_name, @index_name = index_name from #tabIdxnametab --use name convention to recognize auto-created indexes for now --in the future, we will add a special bit inside metadata to distinguish --if (substring(@index_name, 1, 8) = '_MS_Sys_') --begin -- construct drop index statement -- "DROP INDEX @TABLE_NAME.@INDEX_NAME" --select @exec_stmt = 'drop index '+@index_name+' on '+@table_name --print @exec_stmt --EXEC (@exec_stmt) --end --else --print 'User Index: '+@table_name + '.'+ @index_name IF (@index_name IS NOT NULL) begin select @exec_stmt = 'drop index '+@index_name+' on '+@table_name
declare @id int declare @create_date datetime
BEGIN TRANSACTION xAddDropIdxRecommendation DECLARE @result int; EXEC @result = dbo.add_recommendation @exec_stmt, 'DI', @id out if (@result <> 10) EXEC dbo.add_recommendation_details_index @id, @database_id, @object_id
DECLARE @Error int SET @Error = @@ERROR IF @Error <> 0 BEGIN ROLLBACK TRANSACTION xAddDropIdxRecommendation RETURN @Error END COMMIT TRANSACTION xAddDropIdxRecommendation end
fetch next from ms_drpi_iids into @database_id, @object_id, @index_id end
if (object_id('[#tabIdxnametab]') is not null) begin drop table [#tabIdxnametab] end deallocate ms_drpi_iids return(0) -- auto_drop_index go
-- -- JOBs for Executing [auto_create_index] and [auto_drop_index] --
DECLARE @jobId BINARY(16) EXEC msdb.dbo.sp_add_job @job_name=N'SQL MDW: Auto Index Management', @job_id = @jobId OUTPUT GO
EXEC msdb.dbo.sp_add_jobstep @job_name=N'SQL MDW: Auto Index Management', @step_name=N'Auto Create Index', @step_id=1, @subsystem=N'TSQL', @command=N'EXECUTE [dbo].[auto_create_index]', @on_success_action = 3, -- on success, go to next step @database_name=N'AUTOINDEXRECS' GO
EXEC msdb.dbo.sp_add_jobstep @job_name=N'SQL MDW: Auto Index Management', @step_name=N'Auto Drop Index', @step_id=2, @subsystem=N'TSQL', @command=N'EXECUTE [dbo].[auto_drop_index]', @database_name=N'AUTOINDEXRECS' GO
EXEC msdb.dbo.sp_add_jobserver @job_name=N'SQL MDW: Auto Index Management' GO
DECLARE @schedule_id int EXEC msdb.dbo.sp_add_schedule @schedule_name = N'SQL MDW: Auto Index Management' , @freq_type = 4, -- daily @freq_interval = 1, -- every day @freq_subday_type = 4, -- subday interval in minutes @freq_subday_interval = 30, -- every 30 minutes @schedule_id = @schedule_id OUTPUT
EXEC msdb.dbo.sp_attach_schedule @job_name=N'SQL MDW: Auto Index Management', @schedule_id = @schedule_id go
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Sunday, February 17, 2013 4:35 PM
Points: 46,
Visits: 132
|
|
Hi Jay,
After Running a similar sized Datawarehouse (without Cubes) the following is the main advice I can give you.
1) Business rules for validating and cleaning the Data take the bulk of the ETL time. 2) Pay particular attention to your disk configuration, make good use of filegroups and multiple disk arrays (we did not have a SAN but inherited 100 disks with 6 Raid controllers), make sure your NTFS allocations are correct, I found 64K was best performance for our configuration. Would recommend using Mount points for you disk arrays as it makes for easier configuration and easier restores to other systems i.e. dev, test. 3) Backup Compression software in our case was a must for to meet backup windows and reduce disk space. (used SQL Lite Speed with no problems) 4) The Warehouse was based around Kimballs dimensional model, found that we had to add back some of the Natural keys to some of the very large Fact and dimensional tables for performance reasons, the joins between the large tables was killing performance. 5) Appropriate indexing and Up to date statistic's (we updated nightly for all but the largest table) makes a huge performance impact. 6) Setting the Warehouse to Read only after the ETL made a huge difference to reporting speed
I trust this is helpful, can give further details if you would like. Cheers Brandon
Kindest Regards,
Brandon
|
|
|
|