How do they do that?

  • Wondering if someone could shed some light. I regularly imoprt data fom a flat file to a table. Pretty simple improt of a tab delimited file into 3 columns. One thing i noticed is in one column I get the same string but the other 2 columns are different.

    I have sen some DB's where a vednor has created a "refernece table". They have 2 columns first column is a ref number and the second is a value. when importing data into a table it looks in the reference column to see if a value already exists (same value of course) and then inserts the ref number that coresponds to the value. If the value doesnt exist it creates it in the "reference table" and then inserts the new reference number into the original table.

    Im sure this makes sense to those more seasoned SQL people. I was thinking I could use this in my DB to reduce the amout of data in my table. Can someone explin how this works or show me?

    Thanks.

  • Hi Lance,

    It sounds as you want to normalize the data while importing.

    Here is an example on how this can be done:

    -- Reference table

    DECLARE @cities TABLE (

    idintIDENTITY(1,1),

    namevarchar(50)

    )

    INSERT INTO @cities (name) VALUES ('Tokyo')

    INSERT INTO @cities (name) VALUES ('Stockholm')

    INSERT INTO @cities (name) VALUES ('Jerusalem')

    -- Temp table, containing all data from the file

    DECLARE @temp TABLE (

    cityvarchar(50),

    namevarchar(50),

    numberint

    )

    INSERT INTO @temp VALUES ('Stockholm', 'Sven', 142)

    INSERT INTO @temp VALUES ('New York', 'John', 87)

    INSERT INTO @temp VALUES ('Stockholm', 'Arne', 93)

    INSERT INTO @temp VALUES ('Berlin', 'Hans', 134)

    INSERT INTO @temp VALUES ('New York', 'Jack', 72)

    -- Data table (to import to)

    DECLARE @persons TABLE (

    idintIDENTITY(1,1),

    namevarchar(50),

    numberint,

    cityIDint

    )

    -- Add new items to reference table

    INSERT INTO @cities (name)

    SELECT DISTINCT city

    FROM @temp

    WHERE city not in (SELECT name FROM @cities)

    -- insert the data in the target table, with ref number to the reference table

    INSERT INTO @persons (name, number, cityID)

    SELECT T.name, T.number, C.id

    FROM @temp T inner join

    @cities C on T.city = C.name

    SELECT *

    FROM @persons

    Good luck!

    /Markus

  • Hi Hunterwood

    I see that Lance is importing data from a flat file - something we regularly do. Can the references be set up during the import as in a package or is it done after getting the data into the database?

    Thanks in Advance.

  • >>Can the references be set up during the import as in a package or is it done after getting the data into the database?

    Either way is possible. You can have an SSIS package do it all for you or you can import the data into a staging table and then do the processing in TSQL before the final import.

    I prefer the latter and I do tend to discourage elaborate SSIS packages. This is based on my experience and the experience of the folks I work with, we have more TSQL experience than SSIS experience. I also find troubleshooting a stored procedure easier than a package SSIS.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply