SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Best way to implement partitioning


Best way to implement partitioning

Author
Message
OmegaZero
OmegaZero
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 258
I have my own idea on how to handle this, but want to se if there is a different perspective on this. The problem I have with this is I have the following tables (sample tables)


-------------------------------------------------------------------------------------------------------------------------
CREATE TABLE Account
(
AccountId int identity(1,1),
CreationDate datetime,
CONSTRAINT [PK_Account] PRIMARY KEY CLUSTERED
(
[AccountId] ASC
)WITH (FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE AccountSales
(
AccountSalesId int identity(1,1),
AccountId int
CONSTRAINT [PK_AccountSales] PRIMARY KEY CLUSTERED
(
[AccountSalesId] ASC
)WITH (FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE [dbo].[AccountSales] WITH CHECK ADD CONSTRAINT [FK_AccountSales_Account] FOREIGN KEY([AccountId])
REFERENCES [dbo].[Account] ([AccountID])
GO

ALTER TABLE [dbo].[AccountSales] CHECK CONSTRAINT [FK_AccountSales_Account]
GO
------------------------------------------------------------------------------------------------------------------------------



Let's say the Account table is 500GB and the AccountSales is 1TB. I would like to partition these tables for obvious reasons. The way the current logic works is that data is dealt with on a day by day basis, so it makes sense to partition on the date field for the Account table, however the AccountSales table has no date field. Also the data in both tables is no sequential when it comes to the id and date as data is loaded from different servers at different times. So like accounts may have the following data


Id Date Server loaded from (not a column - just for display purposes)
--------------------------
1 1/1/2000 00:00 1
2 1/1/2000 01:00 1
3 1/1/2000 02:00 1
4 1/1/2000 00:00 2
5 1/1/2000 01:00 2
6 1/1/2000 0:300 1



I was thinking about for the Accounts table just making the clustered index on CreationDate and setting the AccountId as a PK with Unique NC index. I would then partition on the date.

I'm not 100% sure of how to handle the AccountSales table though - since if I did it by id the dates would not match up correctly if joined with the Account table, however I don't see a way to fix this.

Any ideas? What is the best way to handle this? If any more information is needed please let me know. Thanks if advance!
OmegaZero
OmegaZero
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 258
bump
Jared Karney
Jared Karney
SSCertifiable
SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)

Group: General Forum Members
Points: 5164 Visits: 3694
Add a created date column to the accountsales table.

Thanks,
Jared
PFE - Microsoft
SQL Know-It-All
How to post data/code on a forum to get the best help - Jeff Moden
Gullimeel
Gullimeel
Old Hand
Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)

Group: General Forum Members
Points: 327 Visits: 506
Just a small thing.Why do you have a fill factor of 80 for increasing identity field?Shouldnt it be 100%?

GulliMeel

Finding top n Worst Performing queries
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw
Gullimeel
Gullimeel
Old Hand
Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)

Group: General Forum Members
Points: 327 Visits: 506
Will you work on the date alone or you will have lots of queries using account as well?

I would suggest If you are going to query the full day's data the use the date column in accountsales table as suggested in earlier post in account sales table.

if you are going to query say like hourly or like that then you can use a derived column like diff in minutes since say Jan 01.01.1900 or some other date and put that columns in both the tables.Then use that column for join along with the accountid.if you need to do partitioning then you can have on that column as well.

GulliMeel

Finding top n Worst Performing queries
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw
Jared Karney
Jared Karney
SSCertifiable
SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)

Group: General Forum Members
Points: 5164 Visits: 3694
Gullimeel (7/9/2012)
I would suggest a derived column like diff in minutes since say Jan 01.01.1900 and put that columns in both the tables.Then use that column. for join along with the accountid.
Umm... Why? What relevance does the diff in minutes from 1900.01.01 have? To me, partitioning on date makes sense because the value is relevant and you can make a filegroup read only. Difference in minutes to an arbitrary date makes absolutely no sense to me.

Thanks,
Jared
PFE - Microsoft
SQL Know-It-All
How to post data/code on a forum to get the best help - Jeff Moden
Jared Karney
Jared Karney
SSCertifiable
SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)

Group: General Forum Members
Points: 5164 Visits: 3694
To the OP, what difference does it make if the ids in the accountsales table don't match up with dates in the accounts table? They will never match up, presumably. However, it seems counter-intuitive to me to not have a date in the AccountSales table (This assumes that the Account table holds a customer's account info and gives them an id when they are first created and the AccountSales table attributes a sale to that Account, don't you need the date of the sale?).

Thanks,
Jared
PFE - Microsoft
SQL Know-It-All
How to post data/code on a forum to get the best help - Jeff Moden
Gullimeel
Gullimeel
Old Hand
Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)

Group: General Forum Members
Points: 327 Visits: 506


Gullimeel (7/9/2012)
--------------------------------------------------------------------------------
I would suggest a derived column like diff in minutes since say Jan 01.01.1900 and put that columns in both the tables.Then use that column. for join along with the accountid.
Umm... Why? What relevance does the diff in minutes from 1900.01.01 have? To me, partitioning on date makes sense because the value is relevant and you can make a filegroup read only. Difference in minutes to an arbitrary date makes absolutely no sense to me.

Thanks,



This depends on what kind of query you would like to go against these tables. if he is going to write all the queries like date >=01/01/2012 and dat < 01/02/2012. Then just having a date column(just 3 bytes) makes much more sense.

But if he is quering say based on hours or minutes the adding that kind of column(int) makes much more sense. But it all depends on query...

GulliMeel

Finding top n Worst Performing queries
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw
Jared Karney
Jared Karney
SSCertifiable
SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)

Group: General Forum Members
Points: 5164 Visits: 3694
Gullimeel (7/9/2012)


Gullimeel (7/9/2012)
--------------------------------------------------------------------------------
I would suggest a derived column like diff in minutes since say Jan 01.01.1900 and put that columns in both the tables.Then use that column. for join along with the accountid.
Umm... Why? What relevance does the diff in minutes from 1900.01.01 have? To me, partitioning on date makes sense because the value is relevant and you can make a filegroup read only. Difference in minutes to an arbitrary date makes absolutely no sense to me.

Thanks,



This depends on what kind of query you would like to go against these tables. if he is going to write all the queries like date >=01/01/2012 and dat < 01/02/2012. Then just having a date column(just 3 bytes) makes much more sense.

But if he is quering say based on hours or minutes the adding that kind of column(int) makes much more sense. But it all depends on query...
I completely disagree. If I know I want rows where the createdDate was between '2012-05-01 13:32:00.000' AND '2012-05-01 13:35:00.000' why on earth would I want to convert those to integers first? I don't believe there is any evidence that it would be faster to scan or seek based on an integer or bigint versus a datetime. Logically, it makes no sense to me and theoretically it makes no sense to me. Based on the way that SQL Server stores this information, I would prefer not to do the calculation first.

Thanks,
Jared
PFE - Microsoft
SQL Know-It-All
How to post data/code on a forum to get the best help - Jeff Moden
Gullimeel
Gullimeel
Old Hand
Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)Old Hand (327 reputation)

Group: General Forum Members
Points: 327 Visits: 506
I completely disagree. If I know I want rows where the createdDate was between '2012-05-01 13:32:00.000' AND '2012-05-01 13:35:00.000' why on earth would I want to convert those to integers first? I don't believe there is any evidence that it would be faster to scan or seek based on an integer or bigint versus a datetime. Logically, it makes no sense to me and theoretically it makes no sense to me. Based on the way that SQL Server stores this information, I would prefer not to do the calculation first.


I am not trying to say that int is faster than date or like that. It is simply to save the space both on table and any index. This doesnt matter on small table but when you are talking about the 500GB table then 4 extra bytes in a table also makes diff to the size of the table as well as index..

There will be just two calculations based on start and end..

GulliMeel

Finding top n Worst Performing queries
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search