partitioning a large dimension?

  • The SQL CAT team's recommendation is to avoid partitioning dimension tables: http://blogs.msdn.com/b/sqlcat/archive/2013/09/16/top-10-best-practices-for-building-a-large-scale-relational-data-warehous.aspx

    I have inherited a dimension table that has almost 3 billion rows and is 1TB and been asked to look at partitioning and putting maintenance in place, etc. (first step is to compress)

    I'm not a DW expert so was wondering what are the reasons to not partition dimensions? I can find articles that reference the above article but not much that explains why.

    Thanks in advance.

  • GavD (5/9/2014)


    The SQL CAT team's recommendation is to [font="Arial Black"]avoid partitioning dimension tables[/font]: http://blogs.msdn.com/b/sqlcat/archive/2013/09/16/top-10-best-practices-for-building-a-large-scale-relational-data-warehous.aspx

    I have inherited a dimension table that has almost 3 billion rows and is 1TB and been asked to look at partitioning and putting maintenance in place, etc. (first step is to compress)

    I'm not a DW expert so was wondering what are the reasons to not partition dimensions? I can find articles that reference the above article but not much that explains why.

    Thanks in advance.

    Are you sure that you're not mixing things up a bit? Dimension tables should be relatively small when compared to Fact tables and, as a result, shouldn't need to be partitioned.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • That was my understanding but this is a dimension. The large fact tables are all partitioned, they are a similar size to this one.

  • Can you post the CREATE TABLE statement (including indexes, keys, FKs)? That might help a bit here.

    Just so you know, partitioning a table that has other tables pointing to it via FK can be a real pain because, in order for a column to be unique on such a table, either the partitioning column must be added to the otherwise unique column (which would break it as an FK candidate) or the UNIQUE index on the column can't be "aligned" which would kill the ability to do SWITCHes to move legacy data out of the table quickly.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CREATE TABLE [dbo].[dim_dr](

    [dr_sk] [bigint] IDENTITY(1,1) NOT NULL,

    [dr_nk] [varchar](30) NOT NULL,

    [s_sk] [bigint] NULL,

    [s_nk] [varchar](30) NULL,

    [dr_frdate_hq_sk] [int] NULL,

    [dr_frtime_hq_sk] [int] NULL,

    [dr_tdate_hq_sk] [int] NULL,

    [dr_ttime_hq_sk] [int] NULL,

    [dr_frdate_local_sk] [int] NULL,

    [dr_frtime_local_sk] [int] NULL,

    [dr_tdate_local_sk] [int] NULL,

    [dr_ttime_local_sk] [int] NULL,

    [dr_frdate_utc_sk] [int] NULL,

    [dr_frtime_utc_sk] [int] NULL,

    [dr_tdate_utc_sk] [int] NULL,

    [dr_ttime_utc_sk] [int] NULL,

    [p_dr_id] [int] NULL,

    [p_s_id] [int] NULL,

    [p_ita_dr_id] [int] NULL,

    [p_ita_s_id] [int] NULL,

    [e_idfodr] [numeric](28, 12) NULL,

    [e_idfos] [numeric](28, 12) NULL,

    [c_datetime_hq] [datetime] NULL,

    [c_datetime_local] [datetime] NULL,

    [c_datetime_utc] [datetime] NULL,

    [dr_decimal] [decimal](28, 12) NULL,

    [dr_numerator] [int] NULL,

    [dr_denominator] [numeric](9, 0) NULL,

    [dr_fraction] [varchar](20) NULL,

    [ibr_dr_flag] [bit] NULL,

    [dr_frdatetime_hq] [datetime] NULL,

    [dr_frdatetime_local] [datetime] NULL,

    [dr_frdatetime_utc] [datetime] NULL,

    [dr_tdatetime_hq] [datetime] NULL,

    [dr_tdatetime_local] [datetime] NULL,

    [dr_tdatetime_utc] [datetime] NULL,

    [current_dr_flag] [bit] NULL,

    [inferred_m_flag] [bit] NULL,

    [inserted_exec_log_key] [bigint] NULL,

    [updated_exec_log_key] [bigint] NULL,

    CONSTRAINT [PK_dim_dr] PRIMARY KEY CLUSTERED

    (

    [dr_sk] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_dr]

    ) ON [DIM_dr]

    GO

    SET ANSI_PADDING ON

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frdate_hq_sk_dim_d] FOREIGN KEY([dr_frdate_hq_sk])

    REFERENCES [dbo].[dim_d] ([d_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frdate_hq_sk_dim_d]

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frdate_local_sk_dim_d] FOREIGN KEY([dr_frdate_local_sk])

    REFERENCES [dbo].[dim_d] ([d_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frdate_local_sk_dim_d]

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frdate_utc_sk_dim_d] FOREIGN KEY([dr_frdate_utc_sk])

    REFERENCES [dbo].[dim_d] ([d_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frdate_utc_sk_dim_d]

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frtime_hq_sk_dim_t] FOREIGN KEY([dr_frtime_hq_sk])

    REFERENCES [dbo].[dim_t] ([t_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frtime_hq_sk_dim_t]

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frtime_local_sk_dim_t] FOREIGN KEY([dr_frtime_local_sk])

    REFERENCES [dbo].[dim_t] ([t_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frtime_local_sk_dim_t]

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frtime_utc_sk_dim_t] FOREIGN KEY([dr_frtime_utc_sk])

    REFERENCES [dbo].[dim_t] ([t_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frtime_utc_sk_dim_t]

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_tdate_hq_sk_dim_d] FOREIGN KEY([dr_tdate_hq_sk])

    REFERENCES [dbo].[dim_d] ([d_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_tdate_hq_sk_dim_d]

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_tdate_local_sk_dim_d] FOREIGN KEY([dr_tdate_local_sk])

    REFERENCES [dbo].[dim_d] ([d_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_tdate_local_sk_dim_d]

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_tdate_utc_sk_dim_d] FOREIGN KEY([dr_tdate_utc_sk])

    REFERENCES [dbo].[dim_d] ([d_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_tdate_utc_sk_dim_d]

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_ttime_hq_sk_dim_t] FOREIGN KEY([dr_ttime_hq_sk])

    REFERENCES [dbo].[dim_t] ([t_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_ttime_hq_sk_dim_t]

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_ttime_local_sk_dim_t] FOREIGN KEY([dr_ttime_local_sk])

    REFERENCES [dbo].[dim_t] ([t_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_ttime_local_sk_dim_t]

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_ttime_utc_sk_dim_t] FOREIGN KEY([dr_ttime_utc_sk])

    REFERENCES [dbo].[dim_t] ([t_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_ttime_utc_sk_dim_t]

    GO

    ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_s_sk_dim_s] FOREIGN KEY([s_sk])

    REFERENCES [dbo].[dim_s] ([s_sk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_s_sk_dim_s]

    GO

    CREATE NONCLUSTERED INDEX [IX_dim_dr_dr_nk] ON [dbo].[dim_dr]

    (

    [dr_nk] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_DR]

    GO

    CREATE NONCLUSTERED INDEX [IX_dim_dr_s_nk] ON [dbo].[dim_dr]

    (

    [s_nk] ASC,

    [dr_frdatetime_hq] ASC,

    [dr_tdatetime_hq] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_DR]

    GO

    CREATE NONCLUSTERED INDEX [IX_dim_dr_s_sk] ON [dbo].[dim_dr]

    (

    [s_sk] ASC,

    [dr_frdatetime_hq] ASC,

    [dr_tdatetime_hq] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_DR]

    GO

  • Except for the dr_sk column, I don't see any other unique indexes on this table and that's a good thing for paritioning-sake.

    What's the nature of the dr_frdatetime_hq column?

    1. Is it assigned a value as part of the insert of the row? If not, is there such a date column in the table (we need one to partition)?

    2. Is it (that column) ever updated?

    The reason I ask is because (especially because of the indexes on this table) it looks like it might make a decent column to base monthly partitions on.

    Also, (and I have to double-check when I get home) the existing constraints with the NO CHECK hint might be a problem for paritioning. IIRC, no constraints in a table can have the NO CHECK option if you want fully aligned partitions that can someday take advantage of SWITCH (out or in) but that might only apply to the parititioning column. Like I said, I have to double-echeck on that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Some of the int fields are loaded as YYYYMMDD so these may be good partitioning columns also but I see what you say about the indexes (looking at the dmv's it seems one index is not being used)

    I am approaching the partitioning mainly from a maintenance perspective, but will also be looking at queries that may hit the table.

    I'm trying to find out if there are any big pitfalls to partitioning a dimension, from a performance, load, querying perspective etc or what are the reasons to avoid doing it?

  • If the rows eventually become static, there are some huge pitfalls with individual FileGroups having mounds of wasted space.... and individual FileGroups by month is definitely the way to go for maintenance especially where backups and piece-meal restores are concerned.

    I've solved those pitfalls but it's not easy because of the problems associated with extra space generated during index rebuilds and the fact that DBCC SHRINKFILE hammers indexes for fragmentation.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply