Concatenate contents of column if it exists - 'invalid column name' error

  • I'm scraping data that sometimes has several footnotes per record.  If so, the csv file which is my source material has additional columns named Footnote, Footnote2, Footnote3 etc.   The number of such Footnote columns varies depending on the particular records scraped.  Sometimes there is only 1, sometimes there are 3 or 4.

    After ingesting the CSV into a table in my database, I'm trying to write a select script that will concatenate all of these footnotes into a single column.

    The script with my attempt follows.  As you can see it attempts to check for existence of the columns I am concatenating.  If footnote2 doesn't exist, then the first option should be returned.  However what is happening is that SQL appears to be checking for existence of all of the columns I've mentioned in subsequent 'WHEN' clauses before execution and it returns an 'Invalid column name' error if for example footnote2 does not exist.

    Is there a way I can re-write this to accomplish what I am trying to do?

     

    SELECT 'Footnotes' = CASE 
    WHEN
    -- There is only 1 footnote column
    NOT EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote2' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
    AND EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
    THEN h.Footnote

    WHEN
    -- There are 2 footnote columns
    NOT EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote3' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
    AND EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
    AND EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote2' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
    THEN 'Footnotes for all transactions in this filing: ' + ISNULL(h.Footnote,'') + ISNULL(h.Footnote2,'')
    WHEN
    -- There are 3 footnote columns
    NOT EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote4' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
    AND EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
    AND EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote2' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
    AND EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote3' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
    THEN 'Footnotes for all transactions in this filing: ' + ISNULL(h.Footnote,'') + ISNULL(h.Footnote2,'')+ ISNULL(h.Footnote3,'')
    ELSE NULL
    END

    FROM TempparsedHeaderCSVData h

     

    Edit: adding sample data:

    Here's sample data.  Since as I mentioned the source table sometimes has more than 1 footnote column, I'm creating 2 tables here; one for a situation where there is just 1 footnote column, a second for a situation where there are 2 footnote columns.  As I mentioned I'm trying to create a script that will take data from both footnote columns if both exists (like in the second table here) and concatenate it into a single column.  Obviously if only 1 exists like the first table, I just want to take the text from that 1 footnote column.

     

     

    -- Create table. This is an example where there is only 1 footnote column in the source table (that comes in from an external source)

    CREATE TABLE TempparsedHeaderCSVData

    (Footnote nvarchar(1000) NULL)

    -- Insert test data

    INSERT INTO TempparsedHeaderCSVData

    (Footnote)

    VALUES

    ('This is footnote 1')

    , ('Helllloooooo')

    , ('Size 9')

    , ('Are these my legs?')

    Second table:

    -- Create table. This is an example where there are 2 footnote columns in the source table (that comes in from an external source)

    CREATE TABLE TempparsedHeaderCSVData

    (Footnote nvarchar(1000) NULL, Footnote1 nvarchar(1000) NULL)

    -- Insert test data

    INSERT INTO TempparsedHeaderCSVData

    (Footnote, Footnote1)

    VALUES

    ('This is a footnote', 'This also is a footnote' )

    , ('Helllloooooo', 'Goodbye')

    , ('Size 9', 'But what was the question?')

    , ('Are these my legs?', 'Size 9')


     

    • This topic was modified 4 years ago by  caspersql.
  • "I'm scraping data ..."

    What do you mean by this?

    Your query is checking for the existence of columns in tables, and has nothing to do with data.

    Is TempparsedHeaderCSVData a table? Maybe you need to be querying that, rather than sys.columns?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • What about if you break that out into IF statements instead?

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • You could build the query using dynamic SQL, that way you wouldn't be referring to columns that don't exist in your query.

    Also, (this isn't going to fix your problem) instead of

    NOT EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote2' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData'))

    You could have

    COLUMNPROPERTY(OBJECT_ID('dbo.TempparsedHeaderCSVData'),'Footnote2','ColumnId') IS NULL

    which is a bit shorter to write.

  • Phil Parkin wrote:

    "I'm scraping data ..."

    What do you mean by this?

    Your query is checking for the existence of columns in tables, and has nothing to do with data.

    The scraping I referred to isn't shown in this script Phil.  I said that by way of background.

     

    Phil Parkin wrote:

    Is TempparsedHeaderCSVData a table? Maybe you need to be querying that, rather than sys.columns?

    Yes TempparsedHeaderCSVData is a table.  The FROM clause at the end queries TempparsedHeaderCSVData.  My understanding is that querying sys.columns as I've attempted to do in the WHEN clauses allows to check for existence of columns in the table TempparsedHeaderCSVData (when done in isolation rather than as I've done here)

    • This reply was modified 4 years ago by  caspersql.
    • This reply was modified 4 years ago by  caspersql.
    • This reply was modified 4 years ago by  caspersql.
  • Mr. Brian Gale wrote:

    What about if you break that out into IF statements instead?

    Hmm interesting...so it would be a series of nested IF statements?

  • Jonathan AC Roberts wrote:

    You could build the query using dynamic SQL, that way you wouldn't be referring to columns that don't exist in your query

    Thank you - I think this gets to the heart of the issue if doing it that way wouldn't result in an invalid column result.  I'll test.

  • I am not 100% sure if IF statements would handle that better or not.  Been a while since I tried it.  Just tried it and it does not work.  So don't go with that idea.

    Dynamic SQL should work though, but you will want to be VERY careful with that.  It would be possible for someone to do SQL injection attack doing that.  I've done similar things with dynamic SQL to get database names and build up my backup scripts (gets database names and their recovery model).  Not quite the same thing you are going for, but a similar idea.

    If you go with the dynamic SQL route, I would test it with some ' characters in the footer on a test system just to make sure you don't get someone being dumb and injecting SQL into that.

    Sorry about the bad suggestion of IF statements; that suggestion will not work.

    A different suggestion (that I tested) is to have multiple stored procedures for this.  Use an IF statement to determine which stored procedure to run, then call the proper stored procedure depending on how many "footer"s there are.

    I personally try to avoid dynamic SQL that can have user-input text in it.  If I don't know or trust the input data, I don't put it in dynamic SQL.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • select
    concat(iif((columnproperty(object_id('dbo.TempparsedHeaderCSVData'),'Footnote','ColumnId') IS NOT NULL) and
    (columnproperty(object_id('dbo.TempparsedHeaderCSVData'),'Footnote2','ColumnId') IS NULL),
    '', 'Footnotes for all transactions in this filing: '),
    isnull(h.Footnote,''), isnull(h.Footnote2,''), isnull(h.Footnote3,''))
    from
    TempparsedHeaderCSVData h;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • scdecade - will that not fail still if columns don't exist (ie footnote3 may not exist)?  I just did a test on that and, like my solution, SQL sees "footnote3" in the query and if it doesn't exist in the table it will throw an error.

    I think the only good solutions are going to be dynamic SQL OR stored procedures.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Well I have no sample data so that was a stab in the dark.  Makes sense though!  Maybe more iif's

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    Well I have no sample data so that was a stab in the dark.  Makes sense though!  Maybe more iif's

    Sorry about that.  Here's sample data.  Since as I mentioned the source table sometimes has more than 1 footnote column, I'm creating 2 tables here; one for a situation where there is just 1 footnote column, a second for a situation where there are 2 footnote columns.  As I mentioned I'm trying to create a script that will take data from both footnote columns if both exists (like in the second table here) and concatenate it into a single column.  Obviously if only 1 exists like the first table, I just want to take the text from that 1 footnote column.

     

     

    -- Create table. This is an example where there is only 1 footnote column in the source table (that comes in from an external source)

    CREATE TABLE TempparsedHeaderCSVData

    (Footnote nvarchar(1000) NULL)

    -- Insert test data

    INSERT INTO TempparsedHeaderCSVData

    (Footnote)

    VALUES

    ('This is footnote 1')

    ,('Helllloooooo')

    ,('Size 9')

    ,('Are these my legs?')

    Second table:

    -- Create table. This is an example where there are 2 footnote columns in the source table (that comes in from an external source)

    CREATE TABLE TempparsedHeaderCSVData

    (Footnote nvarchar(1000) NULL, Footnote1 nvarchar(1000) NULL)

    -- Insert test data

    INSERT INTO TempparsedHeaderCSVData

    (Footnote, Footnote1)

    VALUES

    ('This is a footnote', 'This also is a footnote' )

    ,('Helllloooooo', 'Goodbye')

    ,('Size 9', 'But what was the question?')

    ,('Are these my legs?', 'Size 9')

    • This reply was modified 4 years ago by  caspersql.
  • Mr. Brian Gale wrote:

    Dynamic SQL should work though, but you will want to be VERY careful with that.  It would be possible for someone to do SQL injection attack doing that.  I've done similar things with dynamic SQL to get database names and build up my backup scripts (gets database names and their recovery model).  Not quite the same thing you are going for, but a similar idea.

    If you go with the dynamic SQL route, I would test it with some ' characters in the footer on a test system just to make sure you don't get someone being dumb and injecting SQL into that.

    Very good point, thank you.  I'm reasonably sure the data is solid because it's coming from a reputable source, that being said can't be too careful so I like the idea of your multiple stored procedures solution

    Mr. Brian Gale wrote:

    A different suggestion (that I tested) is to have multiple stored procedures for this.  Use an IF statement to determine which stored procedure to run, then call the proper stored procedure depending on how many "footer"s there are.

    I personally try to avoid dynamic SQL that can have user-input text in it.  If I don't know or trust the input data, I don't put it in dynamic SQL.

    So to make sure I understand:

    • My stored procedures in this case would be the Select statements.
    • I would have one stored procedure for the situation where only 1 footnote column exists.
    • Then I would have a different one for the situation where columns footnote, footnote1 exist, a third for where columns footnote, footnote1 and footnote2 exist etc. right?
    • Then to select the 'correct' stored procedure to use when the data is being proceed, I would test for existence of those columns and choose the appropriate procedure?
  • As a thought, rather than handling this in SQL, could you handle this in your import or export tool?  Instead of having multiple "footer" fields, could you concat them before putting them in the database?

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Mr. Brian Gale wrote:

    As a thought, rather than handling this in SQL, could you handle this in your import or export tool?  Instead of having multiple "footer" fields, could you concat them before putting them in the database?

    At the moment I'm using BULK INSERT  in T-SQL to import the CSV file that comes in from the external source, but I guess I could build something in visual basic that would concatenate...not sure how I'd do that 'dynamically' though to handle situations where the CSV sometimes has multiple columns like footnote, footnote1, footnote2 etc and sometimes only has a single footnote column.

    I guess the ideal way to do it would be with Python or something, not really my area though.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply