Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


partitioning a large dimension?


partitioning a large dimension?

Author
Message
GavD
GavD
SSC-Enthusiastic
SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)

Group: General Forum Members
Points: 130 Visits: 557
The SQL CAT team's recommendation is to avoid partitioning dimension tables: http://blogs.msdn.com/b/sqlcat/archive/2013/09/16/top-10-best-practices-for-building-a-large-scale-relational-data-warehous.aspx

I have inherited a dimension table that has almost 3 billion rows and is 1TB and been asked to look at partitioning and putting maintenance in place, etc. (first step is to compress)

I'm not a DW expert so was wondering what are the reasons to not partition dimensions? I can find articles that reference the above article but not much that explains why.

Thanks in advance.
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45119 Visits: 39918
GavD (5/9/2014)
The SQL CAT team's recommendation is to avoid partitioning dimension tables: http://blogs.msdn.com/b/sqlcat/archive/2013/09/16/top-10-best-practices-for-building-a-large-scale-relational-data-warehous.aspx

I have inherited a dimension table that has almost 3 billion rows and is 1TB and been asked to look at partitioning and putting maintenance in place, etc. (first step is to compress)

I'm not a DW expert so was wondering what are the reasons to not partition dimensions? I can find articles that reference the above article but not much that explains why.

Thanks in advance.


Are you sure that you're not mixing things up a bit? Dimension tables should be relatively small when compared to Fact tables and, as a result, shouldn't need to be partitioned.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
GavD
GavD
SSC-Enthusiastic
SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)

Group: General Forum Members
Points: 130 Visits: 557
That was my understanding but this is a dimension. The large fact tables are all partitioned, they are a similar size to this one.
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45119 Visits: 39918
Can you post the CREATE TABLE statement (including indexes, keys, FKs)? That might help a bit here.

Just so you know, partitioning a table that has other tables pointing to it via FK can be a real pain because, in order for a column to be unique on such a table, either the partitioning column must be added to the otherwise unique column (which would break it as an FK candidate) or the UNIQUE index on the column can't be "aligned" which would kill the ability to do SWITCHes to move legacy data out of the table quickly.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
GavD
GavD
SSC-Enthusiastic
SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)

Group: General Forum Members
Points: 130 Visits: 557
CREATE TABLE [dbo].[dim_dr](
[dr_sk] [bigint] IDENTITY(1,1) NOT NULL,
[dr_nk] [varchar](30) NOT NULL,
[s_sk] [bigint] NULL,
[s_nk] [varchar](30) NULL,
[dr_frdate_hq_sk] [int] NULL,
[dr_frtime_hq_sk] [int] NULL,
[dr_tdate_hq_sk] [int] NULL,
[dr_ttime_hq_sk] [int] NULL,
[dr_frdate_local_sk] [int] NULL,
[dr_frtime_local_sk] [int] NULL,
[dr_tdate_local_sk] [int] NULL,
[dr_ttime_local_sk] [int] NULL,
[dr_frdate_utc_sk] [int] NULL,
[dr_frtime_utc_sk] [int] NULL,
[dr_tdate_utc_sk] [int] NULL,
[dr_ttime_utc_sk] [int] NULL,
[p_dr_id] [int] NULL,
[p_s_id] [int] NULL,
[p_ita_dr_id] [int] NULL,
[p_ita_s_id] [int] NULL,
[e_idfodr] [numeric](28, 12) NULL,
[e_idfos] [numeric](28, 12) NULL,
[c_datetime_hq] [datetime] NULL,
[c_datetime_local] [datetime] NULL,
[c_datetime_utc] [datetime] NULL,
[dr_decimal] [decimal](28, 12) NULL,
[dr_numerator] [int] NULL,
[dr_denominator] [numeric](9, 0) NULL,
[dr_fraction] [varchar](20) NULL,
[ibr_dr_flag] [bit] NULL,
[dr_frdatetime_hq] [datetime] NULL,
[dr_frdatetime_local] [datetime] NULL,
[dr_frdatetime_utc] [datetime] NULL,
[dr_tdatetime_hq] [datetime] NULL,
[dr_tdatetime_local] [datetime] NULL,
[dr_tdatetime_utc] [datetime] NULL,
[current_dr_flag] [bit] NULL,
[inferred_m_flag] [bit] NULL,
[inserted_exec_log_key] [bigint] NULL,
[updated_exec_log_key] [bigint] NULL,
CONSTRAINT [PK_dim_dr] PRIMARY KEY CLUSTERED
(
[dr_sk] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_dr]
) ON [DIM_dr]

GO

SET ANSI_PADDING ON
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frdate_hq_sk_dim_d] FOREIGN KEY([dr_frdate_hq_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frdate_hq_sk_dim_d]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frdate_local_sk_dim_d] FOREIGN KEY([dr_frdate_local_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frdate_local_sk_dim_d]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frdate_utc_sk_dim_d] FOREIGN KEY([dr_frdate_utc_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frdate_utc_sk_dim_d]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frtime_hq_sk_dim_t] FOREIGN KEY([dr_frtime_hq_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frtime_hq_sk_dim_t]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frtime_local_sk_dim_t] FOREIGN KEY([dr_frtime_local_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frtime_local_sk_dim_t]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frtime_utc_sk_dim_t] FOREIGN KEY([dr_frtime_utc_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frtime_utc_sk_dim_t]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_tdate_hq_sk_dim_d] FOREIGN KEY([dr_tdate_hq_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_tdate_hq_sk_dim_d]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_tdate_local_sk_dim_d] FOREIGN KEY([dr_tdate_local_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_tdate_local_sk_dim_d]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_tdate_utc_sk_dim_d] FOREIGN KEY([dr_tdate_utc_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_tdate_utc_sk_dim_d]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_ttime_hq_sk_dim_t] FOREIGN KEY([dr_ttime_hq_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_ttime_hq_sk_dim_t]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_ttime_local_sk_dim_t] FOREIGN KEY([dr_ttime_local_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_ttime_local_sk_dim_t]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_ttime_utc_sk_dim_t] FOREIGN KEY([dr_ttime_utc_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_ttime_utc_sk_dim_t]
GO

ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_s_sk_dim_s] FOREIGN KEY([s_sk])
REFERENCES [dbo].[dim_s] ([s_sk])
NOT FOR REPLICATION
GO

ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_s_sk_dim_s]
GO



CREATE NONCLUSTERED INDEX [IX_dim_dr_dr_nk] ON [dbo].[dim_dr]
(
[dr_nk] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_DR]
GO

CREATE NONCLUSTERED INDEX [IX_dim_dr_s_nk] ON [dbo].[dim_dr]
(
[s_nk] ASC,
[dr_frdatetime_hq] ASC,
[dr_tdatetime_hq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_DR]
GO

CREATE NONCLUSTERED INDEX [IX_dim_dr_s_sk] ON [dbo].[dim_dr]
(
[s_sk] ASC,
[dr_frdatetime_hq] ASC,
[dr_tdatetime_hq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_DR]
GO
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45119 Visits: 39918
Except for the dr_sk column, I don't see any other unique indexes on this table and that's a good thing for paritioning-sake.

What's the nature of the dr_frdatetime_hq column?
1. Is it assigned a value as part of the insert of the row? If not, is there such a date column in the table (we need one to partition)?
2. Is it (that column) ever updated?

The reason I ask is because (especially because of the indexes on this table) it looks like it might make a decent column to base monthly partitions on.

Also, (and I have to double-check when I get home) the existing constraints with the NO CHECK hint might be a problem for paritioning. IIRC, no constraints in a table can have the NO CHECK option if you want fully aligned partitions that can someday take advantage of SWITCH (out or in) but that might only apply to the parititioning column. Like I said, I have to double-echeck on that.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
GavD
GavD
SSC-Enthusiastic
SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)

Group: General Forum Members
Points: 130 Visits: 557
Some of the int fields are loaded as YYYYMMDD so these may be good partitioning columns also but I see what you say about the indexes (looking at the dmv's it seems one index is not being used)

I am approaching the partitioning mainly from a maintenance perspective, but will also be looking at queries that may hit the table.

I'm trying to find out if there are any big pitfalls to partitioning a dimension, from a performance, load, querying perspective etc or what are the reasons to avoid doing it?
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45119 Visits: 39918
If the rows eventually become static, there are some huge pitfalls with individual FileGroups having mounds of wasted space.... and individual FileGroups by month is definitely the way to go for maintenance especially where backups and piece-meal restores are concerned.

I've solved those pitfalls but it's not easy because of the problems associated with extra space generated during index rebuilds and the fact that DBCC SHRINKFILE hammers indexes for fragmentation.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search