Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

partitioning a large dimension? Expand / Collapse
Author
Message
Posted Friday, May 9, 2014 7:46 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 7:05 AM
Points: 129, Visits: 435
The SQL CAT team's recommendation is to avoid partitioning dimension tables: http://blogs.msdn.com/b/sqlcat/archive/2013/09/16/top-10-best-practices-for-building-a-large-scale-relational-data-warehous.aspx

I have inherited a dimension table that has almost 3 billion rows and is 1TB and been asked to look at partitioning and putting maintenance in place, etc. (first step is to compress)

I'm not a DW expert so was wondering what are the reasons to not partition dimensions? I can find articles that reference the above article but not much that explains why.

Thanks in advance.
Post #1569272
Posted Friday, May 9, 2014 8:14 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:23 PM
Points: 35,550, Visits: 32,147
GavD (5/9/2014)
The SQL CAT team's recommendation is to avoid partitioning dimension tables: http://blogs.msdn.com/b/sqlcat/archive/2013/09/16/top-10-best-practices-for-building-a-large-scale-relational-data-warehous.aspx

I have inherited a dimension table that has almost 3 billion rows and is 1TB and been asked to look at partitioning and putting maintenance in place, etc. (first step is to compress)

I'm not a DW expert so was wondering what are the reasons to not partition dimensions? I can find articles that reference the above article but not much that explains why.

Thanks in advance.


Are you sure that you're not mixing things up a bit? Dimension tables should be relatively small when compared to Fact tables and, as a result, shouldn't need to be partitioned.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1569296
Posted Friday, May 9, 2014 8:28 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 7:05 AM
Points: 129, Visits: 435
That was my understanding but this is a dimension. The large fact tables are all partitioned, they are a similar size to this one.
Post #1569305
Posted Friday, May 9, 2014 8:36 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:23 PM
Points: 35,550, Visits: 32,147
Can you post the CREATE TABLE statement (including indexes, keys, FKs)? That might help a bit here.

Just so you know, partitioning a table that has other tables pointing to it via FK can be a real pain because, in order for a column to be unique on such a table, either the partitioning column must be added to the otherwise unique column (which would break it as an FK candidate) or the UNIQUE index on the column can't be "aligned" which would kill the ability to do SWITCHes to move legacy data out of the table quickly.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1569311
Posted Friday, May 9, 2014 9:28 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 7:05 AM
Points: 129, Visits: 435
CREATE TABLE [dbo].[dim_dr](
[dr_sk] [bigint] IDENTITY(1,1) NOT NULL,
[dr_nk] [varchar](30) NOT NULL,
[s_sk] [bigint] NULL,
[s_nk] [varchar](30) NULL,
[dr_frdate_hq_sk] [int] NULL,
[dr_frtime_hq_sk] [int] NULL,
[dr_tdate_hq_sk] [int] NULL,
[dr_ttime_hq_sk] [int] NULL,
[dr_frdate_local_sk] [int] NULL,
[dr_frtime_local_sk] [int] NULL,
[dr_tdate_local_sk] [int] NULL,
[dr_ttime_local_sk] [int] NULL,
[dr_frdate_utc_sk] [int] NULL,
[dr_frtime_utc_sk] [int] NULL,
[dr_tdate_utc_sk] [int] NULL,
[dr_ttime_utc_sk] [int] NULL,
[p_dr_id] [int] NULL,
[p_s_id] [int] NULL,
[p_ita_dr_id] [int] NULL,
[p_ita_s_id] [int] NULL,
[e_idfodr] [numeric](28, 12) NULL,
[e_idfos] [numeric](28, 12) NULL,
[c_datetime_hq] [datetime] NULL,
[c_datetime_local] [datetime] NULL,
[c_datetime_utc] [datetime] NULL,
[dr_decimal] [decimal](28, 12) NULL,
[dr_numerator] [int] NULL,
[dr_denominator] [numeric](9, 0) NULL,
[dr_fraction] [varchar](20) NULL,
[ibr_dr_flag] [bit] NULL,
[dr_frdatetime_hq] [datetime] NULL,
[dr_frdatetime_local] [datetime] NULL,
[dr_frdatetime_utc] [datetime] NULL,
[dr_tdatetime_hq] [datetime] NULL,
[dr_tdatetime_local] [datetime] NULL,
[dr_tdatetime_utc] [datetime] NULL,
[current_dr_flag] [bit] NULL,
[inferred_m_flag] [bit] NULL,
[inserted_exec_log_key] [bigint] NULL,
[updated_exec_log_key] [bigint] NULL,
CONSTRAINT [PK_dim_dr] PRIMARY KEY CLUSTERED
(
[dr_sk] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_dr]
) ON [DIM_dr]

GO

SET ANSI_PADDING ON
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frdate_hq_sk_dim_d] FOREIGN KEY([dr_frdate_hq_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frdate_hq_sk_dim_d]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frdate_local_sk_dim_d] FOREIGN KEY([dr_frdate_local_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frdate_local_sk_dim_d]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frdate_utc_sk_dim_d] FOREIGN KEY([dr_frdate_utc_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frdate_utc_sk_dim_d]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frtime_hq_sk_dim_t] FOREIGN KEY([dr_frtime_hq_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frtime_hq_sk_dim_t]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frtime_local_sk_dim_t] FOREIGN KEY([dr_frtime_local_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frtime_local_sk_dim_t]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frtime_utc_sk_dim_t] FOREIGN KEY([dr_frtime_utc_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frtime_utc_sk_dim_t]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_tdate_hq_sk_dim_d] FOREIGN KEY([dr_tdate_hq_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_tdate_hq_sk_dim_d]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_tdate_local_sk_dim_d] FOREIGN KEY([dr_tdate_local_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_tdate_local_sk_dim_d]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_tdate_utc_sk_dim_d] FOREIGN KEY([dr_tdate_utc_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_tdate_utc_sk_dim_d]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_ttime_hq_sk_dim_t] FOREIGN KEY([dr_ttime_hq_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_ttime_hq_sk_dim_t]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_ttime_local_sk_dim_t] FOREIGN KEY([dr_ttime_local_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_ttime_local_sk_dim_t]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_ttime_utc_sk_dim_t] FOREIGN KEY([dr_ttime_utc_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_ttime_utc_sk_dim_t]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_s_sk_dim_s] FOREIGN KEY([s_sk])
REFERENCES [dbo].[dim_s] ([s_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_s_sk_dim_s]
GO



CREATE NONCLUSTERED INDEX [IX_dim_dr_dr_nk] ON [dbo].[dim_dr]
(
[dr_nk] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_DR]
GO

CREATE NONCLUSTERED INDEX [IX_dim_dr_s_nk] ON [dbo].[dim_dr]
(
[s_nk] ASC,
[dr_frdatetime_hq] ASC,
[dr_tdatetime_hq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_DR]
GO

CREATE NONCLUSTERED INDEX [IX_dim_dr_s_sk] ON [dbo].[dim_dr]
(
[s_sk] ASC,
[dr_frdatetime_hq] ASC,
[dr_tdatetime_hq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_DR]
GO


Post #1569342
Posted Friday, May 9, 2014 10:24 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:23 PM
Points: 35,550, Visits: 32,147
Except for the dr_sk column, I don't see any other unique indexes on this table and that's a good thing for paritioning-sake.

What's the nature of the dr_frdatetime_hq column?
1. Is it assigned a value as part of the insert of the row? If not, is there such a date column in the table (we need one to partition)?
2. Is it (that column) ever updated?

The reason I ask is because (especially because of the indexes on this table) it looks like it might make a decent column to base monthly partitions on.

Also, (and I have to double-check when I get home) the existing constraints with the NO CHECK hint might be a problem for paritioning. IIRC, no constraints in a table can have the NO CHECK option if you want fully aligned partitions that can someday take advantage of SWITCH (out or in) but that might only apply to the parititioning column. Like I said, I have to double-echeck on that.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1569372
Posted Monday, May 12, 2014 5:39 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 7:05 AM
Points: 129, Visits: 435
Some of the int fields are loaded as YYYYMMDD so these may be good partitioning columns also but I see what you say about the indexes (looking at the dmv's it seems one index is not being used)

I am approaching the partitioning mainly from a maintenance perspective, but will also be looking at queries that may hit the table.

I'm trying to find out if there are any big pitfalls to partitioning a dimension, from a performance, load, querying perspective etc or what are the reasons to avoid doing it?
Post #1569783
Posted Monday, May 12, 2014 7:50 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:23 PM
Points: 35,550, Visits: 32,147
If the rows eventually become static, there are some huge pitfalls with individual FileGroups having mounds of wasted space.... and individual FileGroups by month is definitely the way to go for maintenance especially where backups and piece-meal restores are concerned.

I've solved those pitfalls but it's not easy because of the problems associated with extra space generated during index rebuilds and the fact that DBCC SHRINKFILE hammers indexes for fragmentation.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1569851
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse