Field Sizes in Staging database : all varchar(2000)?


devereauxj
NEWBIE to DW : the IT Manager wants to use varchar2(2000) for all fields in all tables.

I thought we should be more realistic. Sure, use varchar2, but if it is a state, use 2 or even 4.

Are there any norms in this area? Is there anything "bad" about using varchar2(2000) for everything?

Manager's idea is to make sure we capture the data in staging exactly like it is in the flat file.

Thanks!
Joe



waqaszubairy
I think it depends on a number of factors, such as disk space and ETL processing time. You might also face problems converting those values back to their original types later. Converting all data to varchar(2000) is not a good idea.
Orlando Colamatteo
varchar2 ? Is this an Oracle question? This is a Microsoft SQL Server Forum.

In general, it is proper practice to choose the appropriate data type from the outset. It is fundamental to the success of a database in terms of performance and longevity, and it feeds into maintenance costs as well.

I am not sure about Oracle internals, but in SQL Server there is a penalty for declaring a column far wider than the data it will ever hold: memory gets allocated for space where data will never reside.

Again, in general, anyone proposing to use a wider datatype than is necessary "just in case we receive wider data later" should not be influencing data modeling decisions.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
PaulB-TheOneAndOnly
devereauxj (4/13/2012)
NEWBIE to DW : the IT Manager wants to use varchar2(2000) for all fields in all tables.


In general, DWH staging tables are modeled after the OLTP tables they source their data from.

Is it wrong to use varchar2(2000) for all non-numeric columns on staging tables? Well, it is not elegant, but it will work. You may want to let the IT Manager know, very politely, that the maximum size for varchar2() moved from 2,000 to 4,000 around Ora8i.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Steve Jones
Here's how I see it.

If you use varchar(2k) in all fields, it's usually easier to import the data, especially if you have quality issues. Once it's in there, you have to rely on your conversion to get it set for the final tables, but you do have it in a database, and you can manipulate it there. I like this if I have flat files, web services, etc, where I might have some flaky connection or a loss of the file after some time.

If I use proper fields, then I need a solid import process that can handle problem data and clean it before it's staged. The movement to the OLTP database (or warehouse) is then easier.

Which is better? It depends on where in the process you want to spend your time.

I like importing into generic tables, then moving and cleaning to a 2nd staging table with valid datatypes (if I have space) and then moving with some MERGE process to the final tables.
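
The flow described above (generic staging, then a typed second staging table, then a merge) can be sketched roughly as follows. This is only an illustration in Python, not T-SQL or anyone's actual process: the column names, sample rows, and helper functions are all made up, and the plain dictionaries stand in for real tables.

```python
from datetime import date, datetime

# Hypothetical raw staging rows: every field is kept as text, exactly as read
# from the flat file (the "all varchar" stage).
raw_stage = [
    {"id": "1", "state": "TX", "amount": "19.99", "order_date": "2012-04-13"},
    {"id": "2", "state": "Texas", "amount": "abc", "order_date": "2012-04-14"},
    {"id": "1", "state": "CA", "amount": "25.00", "order_date": "2012-04-15"},
]


def to_state(text):
    """Accept only two-character state codes."""
    value = text.strip().upper()
    if len(value) != 2:
        raise ValueError(f"state must be 2 characters: {text!r}")
    return value


# One converter per column; these play the role of the typed staging table.
# float is used for brevity; a real load would likely use a decimal type.
CONVERTERS = {
    "id": int,
    "state": to_state,
    "amount": float,
    "order_date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
}


def clean(rows):
    """Split raw rows into typed rows and rejects with a reason."""
    typed, rejects = [], []
    for row in rows:
        try:
            typed.append({col: conv(row[col]) for col, conv in CONVERTERS.items()})
        except (ValueError, KeyError) as exc:
            rejects.append((row, str(exc)))
    return typed, rejects


def merge(final, typed_rows):
    """Upsert typed rows into the final table, keyed by id (later rows win)."""
    for row in typed_rows:
        final[row["id"]] = row
    return final


typed_stage, rejects = clean(raw_stage)
final_table = merge({}, typed_stage)
```

The point of the sketch is that bad rows never block the load: they land in `rejects` where they can be inspected, while the clean rows move on to the final table.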

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
Lokesh Vij
Short and simple answer:

If your flat file is fixed-width, use the CHAR data type; if it is delimited, use VARCHAR.

~ Lokesh Vij

Guidelines for quicker answers on T-SQL question
Guidelines for answers on Performance questions

Link to my Blog Post --> www.SQLPathy.com

Follow me @Twitter


Orlando Colamatteo
lokeshvij (7/19/2012)
Short and simple answer:

If your flat file is fixed-width, use the CHAR data type; if it is delimited, use VARCHAR.

I am not sure I agree with that. First, what if your fixed-width data file contains lines more than 8000 bytes wide? Second, it could be considered a waste of space to have a staging table use CHAR columns when most of the data values have trailing blanks. I am all for using the right data type for the data when discussing destination tables, but in staging tables all bets are off. I am not sure we need to match CHAR to fixed-width files. VARCHAR would be my default choice for a staging table.

PaulB-TheOneAndOnly
devereauxj (4/13/2012)
the IT Manager wants to use varchar2(2000) for all fields in all tables...
...Manager's idea is to make sure we capture the data in staging exactly like it is in the flat file.


Ask the manager whether he/she plans to use varchar2(2000) on the core FACT and DIM tables as well. If not, how is he/she planning to ensure that staging varchar2(2000) values fit into properly defined columns on FACT and DIM?

Again: staging column datatypes and lengths should be modeled after the source system and should never, ever be larger than the definitions in the core FACT/DIM tables.
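
As a rough illustration of that rule, a check like the one below could flag staging columns declared wider than their targets. It is a sketch in Python with made-up column names and widths; in SQL Server the declared widths could be read from `INFORMATION_SCHEMA.COLUMNS` (`CHARACTER_MAXIMUM_LENGTH`), which is not shown here.

```python
# Hypothetical declared widths for the same columns in staging and in the
# core FACT/DIM tables.
staging_widths = {"state": 2000, "customer_name": 2000, "notes": 2000}
target_widths = {"state": 2, "customer_name": 100, "notes": 2000}


def oversized_columns(staging, target):
    """Return {column: (staging_width, target_width)} where staging is wider."""
    return {
        col: (width, target[col])
        for col, width in staging.items()
        if col in target and width > target[col]
    }


oversized = oversized_columns(staging_widths, target_widths)
```

Every column this reports is a place where a value can be staged successfully and then fail or be truncated on the way into FACT/DIM.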

jerry-621596
I agree with PaulB.

When a flat file is presented, the layout/structure is typically well known. That is part of the initial discovery.

If there are freeform notes or something, I can totally see a reason to make that column in the staging table larger.

In general, best practice is to start bringing the data in using a table definition as close to the final destination table as possible. There will be some tradeoffs depending upon how well you can perform an initial scrub/cleanse from the flat file. But never grab numeric or date data and place it in a varchar without good reason.

I typically use SSIS to import flat files. I always run through the flat file and define it with the proper datatypes up front. This not only makes life easier in subsequent stages, but it lets me know up front if the data structure I was given for the flat file is correct, or even if there is errant data in any of the file's fields.
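
To make the idea of defining the layout up front concrete, here is a minimal sketch in Python rather than SSIS. The layout, the sample file contents, and the `profile` function are hypothetical; it only shows how declaring a width and type per field surfaces layout and data problems before anything is staged.

```python
import csv
import io
from datetime import datetime

# Hypothetical layout agreed for the flat file: (column name, max width, parser).
LAYOUT = [
    ("customer_id", 10, int),
    ("state", 2, str),
    ("signup_date", 10, lambda s: datetime.strptime(s, "%Y-%m-%d")),
]


def profile(lines):
    """Return (line number, column, message) for every field that breaks the layout."""
    problems = []
    for line_no, fields in enumerate(csv.reader(lines), start=1):
        if len(fields) != len(LAYOUT):
            problems.append(
                (line_no, None, f"expected {len(LAYOUT)} fields, got {len(fields)}")
            )
            continue
        for (name, width, parse), text in zip(LAYOUT, fields):
            if len(text) > width:
                problems.append((line_no, name, f"length {len(text)} exceeds {width}"))
                continue
            try:
                parse(text)
            except ValueError:
                problems.append((line_no, name, f"cannot parse {text!r}"))
    return problems


# Made-up file contents with one clean line and three problem lines.
sample = io.StringIO(
    "1001,TX,2012-04-13\n"
    "1002,Texas,2012-04-14\n"
    "abc,CA,13/04/2012\n"
    "1004,NY\n"
)
problems = profile(sample)
```

If the documented structure of the file is wrong, the first pass over the data produces a long problem list right away, rather than conversion failures later in the process.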