pulling out part of a column from a csv

  • I'm wondering if it's easier to put this into a table (or a temp table - never done that before), but what I'm trying to do is this. I have a column in the csv that has an id number embedded in it. I want to pull the id number and check it against a table in my DW. If it's there, it keeps going in the process, if not, I want to dump it into another table.

    The csv has detail and summary records, and the only way you can tell the difference is by looking up the ID number. If it's in the dimresident table, you have a person. If not, it's a summary record.

    The string looks like this - Cushing Matthew(123456)

    I have a piece of SQL that will do the job on a column in a table -

    SUBSTRING([client_name], CHARINDEX('(', [client_name]) + 1, CHARINDEX(')', [client_name]) - CHARINDEX('(', [client_name]) - 1)

    I'm trying to see how I can use a derived column expression to do the same thing, but CHARINDEX is not an SSIS function. Do I drop it into a worktable and go from there, a temp table (how would I do that?), or is there an SSIS function I am just overlooking?

    thanks!

    M@

  • Use Findstring instead of Charindex. Does the same thing.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks, found an article that pointed me at that. So now it's looking like this:

    SUBSTRING(["client_name"],FINDSTRING(["client_name"],"(",1),FINDSTRING(["client_name"],"(",1) - FINDSTRING(["client_name"],"(",1))

    if evaluates in the expression box, but when I try to debug it, I get this:[265]] Error: An error occurred while evaluating the function.

    I'm wondering since I'm pulling it from a csv, do I have to cast it to something else maybe? It's an alphanumeric - cushing matthew (123456). I would think if there was an issue with it, it would balk in the expression builder.

    thanks!

  • Is it possible there are rows that don't have parentheses in them? If so, it'll get an eval error.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Actually yea, that's what I'm testing for. All of the residents have (1234) format. Some of the summary records CAN, but don't have to. Do I need to surround my expression with some kind of test for the (?

    if there is no (, there is no ).

  • Probably. Or some form of error handling on those rows, as per usual "divert to a separate workflow" kind of things.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • hit another snag, but it's unrelated. that works, thanks so much.

  • You're welcome.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Okay, the other snag turns out to be because the expression is evaluating and passing nothing to the next step. I'm looking in the viewer and for that column that I'm adding, it's empty.

    Would this be easier if I put it into a script task and used vb to do all of this stuff? *shudder*

  • Honestly, I'm more used to handling that kind of thing in T-SQL than SSIS or .NET. I usually use SSIS to get data into a staging table, then do the heavy lifting in SQL Server.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • That's what I'm thinking of. I'm wondering if all of this crap wouldn't be much easier if I just dropped it into a staging table and did some magic from there.

    Actually, I think that was my original question - would you drop to a staging table or a temp table? I've never worked with temp tables before.

  • Permanent staging tables will be easier.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Sometimes it's just better to have those intermediary objects. I mostly do it for audit purposes but sometimes it's the path of least resistance to providing the correct and reliable solution.

    I'm currently pulling in 2 new different types of information from 2 different sources to be "weaved into" my existing fact table etl. It simply made sense to store the new info into tables because those records aren't all "used" in the main flow and having that data stick around beyond fact population has proved to be very useful. I can easily validate that the right records hit the right fact table rows when the etl succeeds.

    I do truncate them at teh start of each run, but at least if the fact table fails (error or validation disqualify), I can see what was introduced to the flow that may have caused the fail. Or see that the package didn't make it far enuf to populate them.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply