Data import

  • Greetings friends,

    I have a .dta file that's formatted in a strange way and I would like to discuss ways of importing into a staging database.

    The first field of each record consists of a record type. For each entity there will be one record of type 01. Also, for each record type 01 there will be one or more record type 02. There maybe zero or more record type 03, 04, 05 and 06

    For example

    01*9517469000*117844165*0335*The Occupier* CRICKET CLUB***

    02*1*Ground*Tea Room*35.40*9.50*336

    02*2*Ground*Changing*28.00*9.50*266

    02*3*Ground*Kitchen*16.40*9.50*156

    02*4*Ground*Scorer*3.80*9.50*36

    02*5*Ground*Mower Shed*15.00*2.38*36

    03*Cricket Square*2.0*100.00*+200

    03*Cricket Ground*2.0*850.00*+1726

    01*9517470000*200027165*0335*The Occupier* CRICKET CLUB***

    02*1*Ground*Club Room*32.60*19.00*619

    02*2*Ground*Changing Room*21.60*12.67*274

    02*3*Ground*Internal Storage*2.80*9.50*27

    02*4*Ground*Bar*13.90*19.00*264

    03*Cricket Square*1.0*100.00*+100

    03*Cricket Pitch*2.8*500.00*+1420

    01*9517471000*203509165*0335*The Occupier*CRICKET & FOOTBALL CLUB***

    02*1*Ground*Pavilion*86.20*20.00*1724

    02*2*Ground*Garage*23.40*10.00*234

    02*3*Ground*Changing Room*25.00*13.33*333

    03*Cricket Square*1.0*100.00*+100

    03*Playing Field*1.8*500.00*+910

    01*9517472000*200736165*0335*The Occupier*PT GNDOUSE****ROWOSE*

    02*1*Ground*Kitchen*37.14*16.67*619

    I would like to create 6 staging tables to hold the different types but I'm not sure if it's best to load all the data into SQL first then split it out or go down the SSIS route for this kind of thing? Obviously once the data is split there must be a unique ID which links the records together. Any suggestions?

    Thanks in advance.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • I would like to create 6 staging tables to hold the different types but I'm not sure if it's best to load all the data into SQL first then split it out or go down the SSIS route for this kind of thing? Obviously once the data is split there must be a unique ID which links the records together. Any suggestions?

    Thanks in advance.

    Looks like there is no natural "linkage" beween ros in the data file provided. The only thing what links your records is their natural sequencial position.

    I guess in order to ensure the link/rleationship between your data rows, you need to preprocess your file and ad something which will identify the group. It needs to be done row by row, otherwise there will be no guarantee that you will link rows properly.

    Or, ask your data provider to do a bit better job than this sh*t.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • I can see that records of type 1 have something looking like ID: *9517469000*

    Basically you need this one in each related row.

    It should not take a lot of time for provider to add it into their extract.

    Otherwise, create a script (vb, pearl or anything else for working with text files), which will grab this bit of data from line with type "01" and copy it into all subsequent lines of text, until next line which starts with "01"

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • In addition to what Eugene has already pointed out, it looks like the type 01 and type 02 rows contain different # of fields. This is going to cause you issues if you try to import with bcp or with a standard flat file source component in SSIS. This looks to me like a script source in the making. Once in the script source, you can read the rows in, determine the row type, and re-route them in the pipeline on different routes based on type. The script task will also let you tag the type 02 rows with the previous type 01 identifier to maintain the relationship.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Yea I've already started with the script task in SSIS route. I will post another update over in the next 24 hours.

    Thanks to you both for your contribution.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • To keep it simple, I've used streamreader to get each line and output the records into the relevant failes of Type01 to Typ07

    public void Main()

    {

    String inputstring;

    String strUARN;

    String[] strRT1;

    strUARN = "999999";

    StreamReader sr = new StreamReader(@"C:\Users00bie\TestData.txt");

    StreamWriter sw_type01 = new StreamWriter(@"C:\Users00bie\Type01.txt");

    StreamWriter sw_type02 = new StreamWriter(@"C:\Users00bie\Type02.txt");

    StreamWriter sw_type03 = new StreamWriter(@"C:\Users00bie\Type03.txt");

    StreamWriter sw_type04 = new StreamWriter(@"C:\Users00bie\Type04.txt");

    StreamWriter sw_type05 = new StreamWriter(@"C:\Users00bie\Type05.txt");

    StreamWriter sw_type06 = new StreamWriter(@"C:\Users00bie\Type06.txt");

    StreamWriter sw_type07 = new StreamWriter(@"C:\Users00bie\Type07.txt");

    while ((inputstring = sr.ReadLine()) != null)

    {

    strRT1 = inputstring.Split('*');

    if (strRT1[0] == "01")

    {

    strUARN = strRT1[2].ToString();

    sw_type01.WriteLine(strUARN + '*' + inputstring);

    }

    if (strRT1[0] == "02")

    {

    sw_type02.WriteLine(strUARN + '*' + inputstring);

    }

    if (strRT1[0] == "03")

    {

    sw_type03.WriteLine(strUARN + '*' + inputstring);

    }

    if (strRT1[0] == "04")

    {

    sw_type04.WriteLine(strUARN + '*' + inputstring);

    }

    if (strRT1[0] == "05")

    {

    sw_type05.WriteLine(strUARN + '*' + inputstring);

    }

    if (strRT1[0] == "06")

    {

    sw_type06.WriteLine(strUARN + '*' + inputstring);

    }

    if (strRT1[0] == "07")

    {

    sw_type07.WriteLine(strUARN + '*' + inputstring);

    }

    }

    sw_type01.Close();

    sw_type02.Close();

    sw_type03.Close();

    sw_type04.Close();

    sw_type05.Close();

    sw_type06.Close();

    sw_type07.Close();

    Dts.TaskResult = (int)ScriptResults.Success;

    }

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply