SQLServerCentral is supported by Red Gate Software Ltd.
 
Field Sizes in Staging database: all varchar(2000)?
Posted Friday, April 13, 2012 5:52 AM
Newbie to DW: the IT Manager wants to use varchar2(2000) for all fields in all tables.

I thought we should be more realistic. Sure, use varchar2, but if the field is a state, use a length of 2 or even 4.

Are there any norms in this area? Is there anything "bad" about using varchar2(2000) for everything?

The manager's idea is to make sure we capture the data in staging exactly as it is in the flat file.

Thanks!
Joe



Post #1283053
Posted Friday, April 13, 2012 6:07 AM
I think it depends on a number of factors, such as disk space and ETL processing time. You might also face problems converting those values back to their original types later. Storing all the data as varchar(2000) is not a good idea.
Post #1283066
Posted Friday, April 13, 2012 10:34 AM


varchar2? Is this an Oracle question? This is a Microsoft SQL Server forum.

In general, it is proper practice to choose the appropriate data type from the outset. It is fundamental to the success of a database in terms of performance and longevity, and it feeds into maintenance costs as well.

I am not sure about Oracle internals, but in SQL Server a column declared wider than necessary carries a penalty: memory is allocated for space where data will never reside, because the declared width is always far larger than the maximum expected data length.

Again, in general, anyone proposing to use a wider datatype than is necessary "just in case we receive wider data later" should not be influencing data modeling decisions.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1283262
Posted Friday, April 13, 2012 4:30 PM


devereauxj (4/13/2012)
NEWBIE to DW : the IT Manager wants to use varchar2(2000) for all fields in all tables.


In general, DWH staging tables are modeled after the OLTP tables from which they source their data.

Is it wrong to use varchar2(2000) for all non-numeric columns on staging tables? Well, it is not elegant, but it will work. You may want to let the IT Manager know, very politely, that the maximum size for varchar2() moved from 2,000 to 4,000 around Ora8i.


_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
Post #1283536
Posted Friday, April 13, 2012 5:48 PM


Here's how I see it.

If you use varchar(2k) in all fields, it's usually easier to import the data, especially if you have quality issues. Once it's in there, you have to rely on your conversion to get it set for the final tables, but you do have it in a database, and you can manipulate it there. I like this if I have flat files, web services, etc, where I might have some flaky connection or a loss of the file after some time.

If I use proper fields, then I need a solid import process that can handle problem data and clean it before it's staged. The movement to the OLTP database (or warehouse) is then easier.

Which is better? It depends on where in the process you want to spend your time.

I like importing into generic tables, then moving and cleaning to a 2nd staging table with valid datatypes (if I have space) and then moving with some MERGE process to the final tables.
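To make that raw-then-typed flow concrete, here is a minimal sketch. It is only an illustration, not anyone's production process: it is written in Python rather than T-SQL for brevity, and the sample rows, the column names, and the promote, to_state and to_date helpers are all made up. Every field arrives as text, the second step converts it to a proper type, and anything that fails conversion is set aside with the reason instead of stopping the load.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical "generic" staging rows: every field is text, exactly as it arrived.
RAW_STAGING = [
    {"order_id": "1001", "state": "TX", "order_date": "2012-04-13", "amount": "19.99"},
    {"order_id": "1002", "state": "Texas", "order_date": "2012-04-14", "amount": "5.00"},
    {"order_id": "10x3", "state": "CA", "order_date": "13/04/2012", "amount": "n/a"},
]


def to_state(text):
    """Accept only a two-letter code (an assumed rule for this example)."""
    value = text.strip().upper()
    if len(value) != 2 or not value.isalpha():
        raise ValueError(f"not a 2-letter state code: {text!r}")
    return value


def to_date(text):
    """Parse an ISO-style date; any other layout is treated as bad data."""
    return datetime.strptime(text.strip(), "%Y-%m-%d").date()


# One converter per column of the (assumed) typed second staging table.
CONVERTERS = {"order_id": int, "state": to_state, "order_date": to_date, "amount": Decimal}


def promote(raw_rows):
    """Split raw rows into typed rows and rejects (raw row plus per-column errors)."""
    typed, rejects = [], []
    for row in raw_rows:
        converted, errors = {}, {}
        for column, convert in CONVERTERS.items():
            try:
                converted[column] = convert(row[column])
            except (ValueError, InvalidOperation) as exc:
                errors[column] = str(exc)
        if errors:
            rejects.append((row, errors))
        else:
            typed.append(converted)
    return typed, rejects


typed_rows, rejected_rows = promote(RAW_STAGING)
print(len(typed_rows), "typed;", len(rejected_rows), "rejected")
```

The point of the split is that the rejected rows are still available to inspect and repair, which is the advantage of landing everything as text first.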







Follow me on Twitter: @way0utwest
Post #1283547
Posted Thursday, July 19, 2012 11:21 AM


Short and simple answer:

If your flat file is fixed width (delimited), use the CHAR data type; otherwise use VARCHAR.


~ Lokesh Vij


Post #1332407
Posted Thursday, July 19, 2012 12:58 PM


lokeshvij (7/19/2012)
Short and simple answer:

If your flat file is fixed width (delimited), use the CHAR data type; otherwise use VARCHAR.

I am not sure I agree with that. First, what if your fixed-width data file contains lines more than 8,000 bytes wide? Second, it could be considered a waste of space for a staging table to use CHAR columns when most of the data values have trailing blank spaces.

I am all for using the right data type for the data when discussing destination tables, but in staging tables all bets are off. I am not sure we need to match CHAR to fixed-width files. VARCHAR would be my default choice for a staging table.
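A rough back-of-the-envelope comparison shows the trailing-blank cost. This is a sketch under stated assumptions, in Python for brevity: the 30-character field, the sample city names, and the helper names are invented, characters are assumed to be single-byte, and the 2 bytes of per-value overhead for a variable-length column is an approximation of the bookkeeping cost rather than an exact storage model.

```python
# Hypothetical fixed-width field, 30 characters wide, padded with blanks.
FIELD_WIDTH = 30
city_field = [name.ljust(FIELD_WIDTH) for name in ("Austin", "San Francisco", "Rome")]

# Assumption: roughly 2 bytes of bookkeeping per variable-length value.
VARCHAR_OVERHEAD = 2


def char_bytes(values, width):
    """A CHAR(width) column stores the full width for every row."""
    return len(values) * width


def varchar_bytes(values, overhead=VARCHAR_OVERHEAD):
    """A VARCHAR column stores the trimmed value plus a small overhead."""
    return sum(len(value.rstrip(" ")) + overhead for value in values)


fixed_total = char_bytes(city_field, FIELD_WIDTH)
variable_total = varchar_bytes(city_field)
print("CHAR:", fixed_total, "bytes; VARCHAR:", variable_total, "bytes")
```

The gap grows with the number of rows and with how far the declared width exceeds the typical value, which is the case for most padded flat-file fields.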


Post #1332491
Posted Monday, July 23, 2012 3:45 PM


devereauxj (4/13/2012)
the IT Manager wants to use varchar2(2000) for all fields in all tables...
...Manager's idea is to make sure we capture the data in staging exactly like it is in the flat file.


Ask the manager whether he/she also plans to use varchar2(2000) on the core FACT and DIM tables. If not, how is he/she planning to ensure that staging varchar2(2000) values fit into properly defined columns on FACT and DIM?

Again: staging column datatypes and lengths should be modeled after the source system and should never be larger than the definitions in the core FACT/DIM tables.
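The fit problem is easy to see with a small check. This is a hedged sketch only, in Python for brevity: the target widths, the staged rows, and the find_overflows helper are hypothetical, and lengths are counted in characters. It lists every staged value that is longer than the column it is headed for.

```python
# Hypothetical declared widths of the destination DIM columns.
TARGET_WIDTHS = {"state": 2, "city": 30}

# Hypothetical rows sitting in an oversized staging table.
STAGED_ROWS = [
    {"state": "TX", "city": "Austin"},
    {"state": "Texas", "city": "Houston"},
]


def find_overflows(rows, widths):
    """Return (row_number, column, actual_length, allowed_width) for each value that will not fit."""
    overflows = []
    for row_number, row in enumerate(rows, start=1):
        for column, width in widths.items():
            length = len(row[column])
            if length > width:
                overflows.append((row_number, column, length, width))
    return overflows


overflows = find_overflows(STAGED_ROWS, TARGET_WIDTHS)
for row_number, column, length, width in overflows:
    print(f"row {row_number}: {column} has {length} characters, target allows {width}")
```

With staging columns no wider than the target columns, values like the second row are caught at load time rather than on the way into FACT and DIM.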


_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Post #1334109
Posted Tuesday, November 20, 2012 12:03 PM


I agree with PaulB.

When a flat file is presented, the layout/structure is typically well known. That is part of the initial discovery.

If there are freeform notes or something, I can totally see a reason to make that column in the staging table larger.

In general, best practice is to start bringing the data in using a table definition as close to the final destination table as possible. There will be some tradeoffs depending upon how well you can perform an initial scrub/cleanse from the flat file. But never grab numeric or date data and place it in a varchar without good reason.

I typically use SSIS to import flat files. I always run through the flat file and define it with the proper datatypes up front. This not only makes life easier in subsequent stages, but it lets me know up front if the data structure I was given for the flat file is correct, or even if there is errant data in any of the file's fields.
Post #1387093