Number of column mismatch in source file and staging table

  • Hi All,

    I am having one store procedure which use to load data from flat file to staging table dynamically.

    everything is working fine.Staging_temp table have single column.All the data stored in that single column

    below is the sample Data.

    AB¯ALBERTA ¯93¯AI

    AI¯ALBERTA INDIRECT ¯94¯AI

    AL¯ALABAMA ¯30¯

    After the Staging_temp data gets inserted into main table.

    There are 2 scenario which I want to handle

    1-suppose some flat files where number of columns are more than the actual table.

    if you see the sample rows there are 4 column separated by "¯".but actual I am having only 3 columns in my main table.

    output should be like below.

    AB¯ALBERTA ¯93

    AI¯ALBERTA INDIRECT ¯94

    AL¯ALABAMA ¯30

    I am able to achieve above scenarios using below function.

    ALTER FUNCTION [dbo].[fnGetFirstOcurrences](

    @Text nvarchar(max),

    @delimiter char(1),

    @StopAt smallint)

    ReturnS nvarchar(max)

    BEGIN

    Declare @Return nvarchar(max)

    ,@Finalint = 0

    ,@times smallint = 1

    ,@Delimiters smallint

    select @Delimiters = len(@Text) - len(replace(@Text, @Delimiter, ''));

    If (@Delimiters >= @StopAt)

    begin

    while @times <= @StopAt

    Begin

    SET @final = CHARINDEX(@delimiter,@text + @delimiter,@final+1)

    Set @times = @times + 1

    END

    Set @Final = @Final - 1

    END

    Else

    Set @Final = len(@Text)

    select @Return = left(@Text,@Final);

    Return isnull(@Return,@Text)

    END

    2-Second scenarios is what I want to handle is if some column of the table have lesscolumn than actual table in that case how to handle it

    for example below is the sample Data.

    AB¯ALBERTA

    AI¯ALBERTA INDIRECT ¯94¯AI

    AL¯ALABAMA ¯30¯

    but I am actually having 3 column in my main table.

    Please help me to achieve above scenario

    Thanks in advance.

    Regards,

    Vipin Jha

  • I've answered this question in the first post you made about this here --> http://www.sqlservercentral.com/Forums/Topic1676026-3077-1.aspx%5B/url%5D.

    So, quoting myself: -

    Cadavre (4/10/2015)


    I knocked up a bit more sample data: -

    IF OBJECT_ID('tempdb..#stagingTable') IS NOT NULL

    BEGIN

    DROP TABLE #stagingTable;

    END;

    -- 1,000,000 rows of sample data

    SELECT IDENTITY( INT,1,1 ) AS [ID],

    SUBSTRING(REPLACE(NEWID(), '-', '¯'), 1, CASE WHEN [bool] = 1 THEN 18

    ELSE 23

    END) AS [DATA]

    INTO #stagingTable

    FROM ( SELECT TOP 1000000

    CRYPT_GEN_RANDOM(1) % 2

    FROM master.dbo.syscolumns sc1

    CROSS JOIN master.dbo.syscolumns sc2

    CROSS JOIN master.dbo.syscolumns sc3

    ) a ( [bool] );

    Solution using the 8K splitter: -

    IF EXISTS ( SELECT 1

    FROM sys.[objects]

    WHERE [name] = 'DelimitedSplit8K'

    AND type = 'IF' )

    BEGIN;

    DROP FUNCTION [dbo].[DelimitedSplit8K];

    END;

    GO

    CREATE FUNCTION [dbo].[DelimitedSplit8K]

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter CHAR(1))

    --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1 UNION ALL

    SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter

    ),

    cteLen(N1,L1) AS(--==== Return start and length (for use in substring)

    SELECT s.N1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)

    FROM cteStart s

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),

    Item = SUBSTRING(@pString, l.N1, l.L1)

    FROM cteLen l

    ;

    GO

    Actual solution: -

    SELECT [DATA],

    MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE '' END) AS [1],

    MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE '' END) AS [2],

    MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE '' END) AS [3]

    FROM #stagingTable

    CROSS APPLY [dbo].[DelimitedSplit8K]([DATA], '¯') AS de

    WHERE ItemNumber < 4

    GROUP BY [DATA];

    Produces: -

    DATA 1 2 3

    ----------------------- ----------------------- ----------------------- -----------------------

    00772BD2¯9396¯49ED 00772BD2 9396 49ED

    00AB3FDE¯4DE1¯4030 00AB3FDE 4DE1 4030

    01476877¯6DCC¯4B06 01476877 6DCC 4B06

    014C256A¯930E¯4071¯B70E 014C256A 930E 4071

    01F145BF¯371A¯4754 01F145BF 371A 4754

    02D4C463¯47C2¯4B75¯863B 02D4C463 47C2 4B75

    035C3548¯F07E¯4ED4¯81DB 035C3548 F07E 4ED4[/quote-0]


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • The solution posted by Cadavre looks very complete.

    One possible change, when looking at the situation where there are fewer columns, in the SELECT clause, I'd probably use COALESCE instead of a CASE statement. But that's a pretty trivial change.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply