pulling out part of a column from a csv

Question

pulling out part of a column from a csv

Matthew Cushing

SSCrazy

Points: 2898
More actions
January 13, 2012 at 11:11 am

#266433

I'm wondering if it's easier to put this into a table (or a temp table - never done that before), but what I'm trying to do is this. I have a column in the csv that has an id number embedded in it. I want to pull the id number and check it against a table in my DW. If it's there, it keeps going in the process, if not, I want to dump it into another table.
The csv has detail and summary records, and the only way you can tell the difference is by looking up the ID number. If it's in the dimresident table, you have a person. If not, it's a summary record.
The string looks like this - Cushing Matthew(123456)
I have a piece of SQL that will do the job on a column in a table -
SUBSTRING([client_name], CHARINDEX('(', [client_name]) + 1, CHARINDEX(')', [client_name]) - CHARINDEX('(', [client_name]) - 1)
I'm trying to see how I can use a derived column expression to do the same thing, but CHARINDEX is not an SSIS function. Do I drop it into a worktable and go from there, a temp table (how would I do that?), or is there an SSIS function I am just overlooking?
thanks!
M@

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

Use Findstring instead of Charindex. Does the same thing.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Matthew Cushing SSCrazy Points: 2898 More actions · Answer 2

Thanks, found an article that pointed me at that. So now it's looking like this:

SUBSTRING(["client_name"],FINDSTRING(["client_name"],"(",1),FINDSTRING(["client_name"],"(",1) - FINDSTRING(["client_name"],"(",1))

if evaluates in the expression box, but when I try to debug it, I get this:[265]] Error: An error occurred while evaluating the function.

I'm wondering since I'm pulling it from a csv, do I have to cast it to something else maybe? It's an alphanumeric - cushing matthew (123456). I would think if there was an issue with it, it would balk in the expression builder.

thanks!

GSquared SSC Guru Points: 260824 More actions · Answer 3

Is it possible there are rows that don't have parentheses in them? If so, it'll get an eval error.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Matthew Cushing SSCrazy Points: 2898 More actions · Answer 4

Actually yea, that's what I'm testing for. All of the residents have (1234) format. Some of the summary records CAN, but don't have to. Do I need to surround my expression with some kind of test for the (?

if there is no (, there is no ).

GSquared SSC Guru Points: 260824 More actions · Answer 5

Probably. Or some form of error handling on those rows, as per usual "divert to a separate workflow" kind of things.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Matthew Cushing SSCrazy Points: 2898 More actions · Answer 6

hit another snag, but it's unrelated. that works, thanks so much.

GSquared SSC Guru Points: 260824 More actions · Answer 7

You're welcome.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Matthew Cushing SSCrazy Points: 2898 More actions · Answer 8

Okay, the other snag turns out to be because the expression is evaluating and passing nothing to the next step. I'm looking in the viewer and for that column that I'm adding, it's empty.

Would this be easier if I put it into a script task and used vb to do all of this stuff? *shudder*

GSquared SSC Guru Points: 260824 More actions · Answer 9

Honestly, I'm more used to handling that kind of thing in T-SQL than SSIS or .NET. I usually use SSIS to get data into a staging table, then do the heavy lifting in SQL Server.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Matthew Cushing SSCrazy Points: 2898 More actions · Answer 10

That's what I'm thinking of. I'm wondering if all of this crap wouldn't be much easier if I just dropped it into a staging table and did some magic from there.

Actually, I think that was my original question - would you drop to a staging table or a temp table? I've never worked with temp tables before.

GSquared SSC Guru Points: 260824 More actions · Answer 11

Permanent staging tables will be easier.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

herladygeekedness SSCertifiable Points: 5938 More actions · Answer 12

Sometimes it's just better to have those intermediary objects. I mostly do it for audit purposes but sometimes it's the path of least resistance to providing the correct and reliable solution.

I'm currently pulling in 2 new different types of information from 2 different sources to be "weaved into" my existing fact table etl. It simply made sense to store the new info into tables because those records aren't all "used" in the main flow and having that data stick around beyond fact population has proved to be very useful. I can easily validate that the right records hit the right fact table rows when the etl succeeds.

I do truncate them at teh start of each run, but at least if the fact table fails (error or validation disqualify), I can see what was introduced to the flow that may have caused the fail. Or see that the package didn't make it far enuf to populate them.