Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 123»»»

Best way to implement partitioning Expand / Collapse
Author
Message
Posted Friday, July 06, 2012 2:27 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, August 29, 2013 2:59 PM
Points: 6, Visits: 251
I have my own idea on how to handle this, but want to se if there is a different perspective on this. The problem I have with this is I have the following tables (sample tables)

-------------------------------------------------------------------------------------------------------------------------
CREATE TABLE Account
(
AccountId int identity(1,1),
CreationDate datetime,
CONSTRAINT [PK_Account] PRIMARY KEY CLUSTERED
(
[AccountId] ASC
)WITH (FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE AccountSales
(
AccountSalesId int identity(1,1),
AccountId int
CONSTRAINT [PK_AccountSales] PRIMARY KEY CLUSTERED
(
[AccountSalesId] ASC
)WITH (FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE [dbo].[AccountSales] WITH CHECK ADD CONSTRAINT [FK_AccountSales_Account] FOREIGN KEY([AccountId])
REFERENCES [dbo].[Account] ([AccountID])
GO

ALTER TABLE [dbo].[AccountSales] CHECK CONSTRAINT [FK_AccountSales_Account]
GO
------------------------------------------------------------------------------------------------------------------------------

Let's say the Account table is 500GB and the AccountSales is 1TB. I would like to partition these tables for obvious reasons. The way the current logic works is that data is dealt with on a day by day basis, so it makes sense to partition on the date field for the Account table, however the AccountSales table has no date field. Also the data in both tables is no sequential when it comes to the id and date as data is loaded from different servers at different times. So like accounts may have the following data

Id     Date             Server loaded from (not a column - just for display purposes)
--------------------------
1 1/1/2000 00:00 1
2 1/1/2000 01:00 1
3 1/1/2000 02:00 1
4 1/1/2000 00:00 2
5 1/1/2000 01:00 2
6 1/1/2000 0:300 1

I was thinking about for the Accounts table just making the clustered index on CreationDate and setting the AccountId as a PK with Unique NC index. I would then partition on the date.

I'm not 100% sure of how to handle the AccountSales table though - since if I did it by id the dates would not match up correctly if joined with the Account table, however I don't see a way to fix this.

Any ideas? What is the best way to handle this? If any more information is needed please let me know. Thanks if advance!
Post #1326319
Posted Monday, July 09, 2012 12:10 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, August 29, 2013 2:59 PM
Points: 6, Visits: 251
bump
Post #1327033
Posted Monday, July 09, 2012 1:20 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, April 11, 2014 7:37 AM
Points: 2,673, Visits: 3,325
Add a created date column to the accountsales table.

Thanks,

Jared
SQL Know-It-All

How to post data/code on a forum to get the best help - Jeff Moden
Post #1327092
Posted Monday, July 09, 2012 1:23 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Sunday, June 23, 2013 10:39 PM
Points: 231, Visits: 506
Just a small thing.Why do you have a fill factor of 80 for increasing identity field?Shouldnt it be 100%?

GulliMeel

Finding top n Worst Performing queries
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw
Post #1327094
Posted Monday, July 09, 2012 1:29 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Sunday, June 23, 2013 10:39 PM
Points: 231, Visits: 506
Will you work on the date alone or you will have lots of queries using account as well?

I would suggest If you are going to query the full day's data the use the date column in accountsales table as suggested in earlier post in account sales table.

if you are going to query say like hourly or like that then you can use a derived column like diff in minutes since say Jan 01.01.1900 or some other date and put that columns in both the tables.Then use that column for join along with the accountid.if you need to do partitioning then you can have on that column as well.


GulliMeel

Finding top n Worst Performing queries
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw
Post #1327097
Posted Monday, July 09, 2012 1:33 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, April 11, 2014 7:37 AM
Points: 2,673, Visits: 3,325
Gullimeel (7/9/2012)
I would suggest a derived column like diff in minutes since say Jan 01.01.1900 and put that columns in both the tables.Then use that column. for join along with the accountid.
Umm... Why? What relevance does the diff in minutes from 1900.01.01 have? To me, partitioning on date makes sense because the value is relevant and you can make a filegroup read only. Difference in minutes to an arbitrary date makes absolutely no sense to me.


Thanks,

Jared
SQL Know-It-All

How to post data/code on a forum to get the best help - Jeff Moden
Post #1327100
Posted Monday, July 09, 2012 1:37 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, April 11, 2014 7:37 AM
Points: 2,673, Visits: 3,325
To the OP, what difference does it make if the ids in the accountsales table don't match up with dates in the accounts table? They will never match up, presumably. However, it seems counter-intuitive to me to not have a date in the AccountSales table (This assumes that the Account table holds a customer's account info and gives them an id when they are first created and the AccountSales table attributes a sale to that Account, don't you need the date of the sale?).

Thanks,

Jared
SQL Know-It-All

How to post data/code on a forum to get the best help - Jeff Moden
Post #1327105
Posted Monday, July 09, 2012 1:41 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Sunday, June 23, 2013 10:39 PM
Points: 231, Visits: 506


Gullimeel (7/9/2012)
--------------------------------------------------------------------------------
I would suggest a derived column like diff in minutes since say Jan 01.01.1900 and put that columns in both the tables.Then use that column. for join along with the accountid.
Umm... Why? What relevance does the diff in minutes from 1900.01.01 have? To me, partitioning on date makes sense because the value is relevant and you can make a filegroup read only. Difference in minutes to an arbitrary date makes absolutely no sense to me.

Thanks,



This depends on what kind of query you would like to go against these tables. if he is going to write all the queries like date >=01/01/2012 and dat < 01/02/2012. Then just having a date column(just 3 bytes) makes much more sense.

But if he is quering say based on hours or minutes the adding that kind of column(int) makes much more sense. But it all depends on query...


GulliMeel

Finding top n Worst Performing queries
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw
Post #1327107
Posted Monday, July 09, 2012 1:46 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, April 11, 2014 7:37 AM
Points: 2,673, Visits: 3,325
Gullimeel (7/9/2012)


Gullimeel (7/9/2012)
--------------------------------------------------------------------------------
I would suggest a derived column like diff in minutes since say Jan 01.01.1900 and put that columns in both the tables.Then use that column. for join along with the accountid.
Umm... Why? What relevance does the diff in minutes from 1900.01.01 have? To me, partitioning on date makes sense because the value is relevant and you can make a filegroup read only. Difference in minutes to an arbitrary date makes absolutely no sense to me.

Thanks,



This depends on what kind of query you would like to go against these tables. if he is going to write all the queries like date >=01/01/2012 and dat < 01/02/2012. Then just having a date column(just 3 bytes) makes much more sense.

But if he is quering say based on hours or minutes the adding that kind of column(int) makes much more sense. But it all depends on query...
I completely disagree. If I know I want rows where the createdDate was between '2012-05-01 13:32:00.000' AND '2012-05-01 13:35:00.000' why on earth would I want to convert those to integers first? I don't believe there is any evidence that it would be faster to scan or seek based on an integer or bigint versus a datetime. Logically, it makes no sense to me and theoretically it makes no sense to me. Based on the way that SQL Server stores this information, I would prefer not to do the calculation first.


Thanks,

Jared
SQL Know-It-All

How to post data/code on a forum to get the best help - Jeff Moden
Post #1327109
Posted Monday, July 09, 2012 1:51 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Sunday, June 23, 2013 10:39 PM
Points: 231, Visits: 506
I completely disagree. If I know I want rows where the createdDate was between '2012-05-01 13:32:00.000' AND '2012-05-01 13:35:00.000' why on earth would I want to convert those to integers first? I don't believe there is any evidence that it would be faster to scan or seek based on an integer or bigint versus a datetime. Logically, it makes no sense to me and theoretically it makes no sense to me. Based on the way that SQL Server stores this information, I would prefer not to do the calculation first.


I am not trying to say that int is faster than date or like that. It is simply to save the space both on table and any index. This doesnt matter on small table but when you are talking about the 500GB table then 4 extra bytes in a table also makes diff to the size of the table as well as index..

There will be just two calculations based on start and end..


GulliMeel

Finding top n Worst Performing queries
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw
Post #1327113
« Prev Topic | Next Topic »

Add to briefcase 123»»»

Permissions Expand / Collapse