Getting ethnicity information - multiple columns to single column

Question

Post reply

Getting ethnicity information - multiple columns to single column

torpkev

SSCarpal Tunnel

Points: 4819
More actions
December 20, 2010 at 9:47 am

#388471

I have some demographic information available to me, and I'm trying to calculate the value for ethnicity and I'm running into some issues
I have data like
[ETH_CAUCASIAN], [ETH_AFRAMER], [ETH_HISP], [ETH_NATIVEAMER], [ETH_NATIVEHAW], [ETH_SOUTHASIAN]....
with 12 different columns for each ethnicity type with the values being Yes or No - each row in the table is a person, and they could have multiple ethnicity types
I need to translate this into a single column, with a value assigned to the ethnicity, so for example, 'Caucasian' would be 1, 'African-American' = 2, 'Hispanic or Latino' = 3 etc. and then i have to be able to allow for mixed ethnicity, and put in the value 'Mixed Ethnicity', but only if the mixed ethnicity matches some criteria such as (1, 8, 9, 10, 11) + Any other except 3 = 'Mixed Ethnicity' - or (2, 8, 9, 10, 11) + Any other except 3 = 'Mixed Ethnicity', (8, 8, 9, 10, 12) = 'Other' etc.
Then finally just put an 'Unknown' as a catch-all if I don't have a match to any
I tried putting a case statement in, despite it being ugly, its a query that will run maybe once every few months, I don't need it to be pretty and i'm not even that worried about it being efficient (though obviously I'd prefer it) - but I got an error 'Case expressions may only be nested to level 10.' (its a single case with multiple WHEN/THEN's)
I also thought I could pivot it and do some summing to try and work it out, but it got weird and overly complicated..
Is there a better way to go about this?
Thanks
--------------------------
I long for a job where my databases dont have any pesky users accessing them 🙂

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

spaghettidba SSC Guru Points: 105732 More actions · Answer 1

Have you tried with UNPIVOT?

Can you post table scripts and some sample data?

-- Gianluca Sartori

torpkev SSCarpal Tunnel Points: 4819 More actions · Answer 2

At one point I did pivot, then unpivoted and then had a duh moment when i realized that the data was back how it started.

This is basically what I have - its actually much more complicated a table, but here's the jist of it:

CREATE TABLE [dbo].[Ethnicity](

[PersonID] INT NULL,

[ETH_CAUCASIAN] [varchar](10) NULL,

[ETH_AFRAMER] [varchar](10) NULL,

[ETH_HISP] [varchar](10) NULL,

[ETH_NATIVEAMER] [varchar](10) NULL,

[ETH_NATIVEHAW] [varchar](10) NULL,

[ETH_SOUTHASIAN] [varchar](10) NULL,

[ETH_EASTASIAN] [varchar](10) NULL,

[ETH_MIDEAST] [varchar](10) NULL,

[ETH_EASTJEW] [varchar](10) NULL,

[ETH_SEPHJEW] [varchar](10) NULL,

[ETH_FRNCAN] [varchar](10) NULL,

[ETH_OTHER] [varchar](10) NULL

) ON [PRIMARY]

GO

INSERT INTO Ethnicity

SELECT 1, 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'

UNION

SELECT 2, 'No', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No'

UNION

SELECT 3, 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'

What I should end up with is (remembering my rules from my earlier post: I need to translate this into a single column, with a value assigned to the ethnicity, so for example, 'Caucasian' would be 1, 'African-American' = 2, 'Hispanic or Latino' = 3 etc. and then i have to be able to allow for mixed ethnicity, and put in the value 'Mixed Ethnicity', but only if the mixed ethnicity matches some criteria such as (1, 8, 9, 10, 11) + Any other except 3 = 'Mixed Ethnicity' - or (2, 8, 9, 10, 11) + Any other except 3 = 'Mixed Ethnicity', (8, 8, 9, 10, 12) = 'Other' etc.)

PersonID, Ethnicity

1, 'Mixed Ethnicity'

2, '2'

3, '1'

--------------------------

I long for a job where my databases dont have any pesky users accessing them 🙂

WayneS SSC Guru Points: 95461 More actions · Answer 3

Something like:

SELECT PersonID,

Ethnicity = CASE WHEN (Expression to determine Caucasion) THEN 'Caucasion'

WHEN (Expression to determine Mixed) THEN 'Mixed'

END

FROM dbo.Ethnicity

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

torpkev SSCarpal Tunnel Points: 4819 More actions · Answer 4

Hi Wayne,

Thats the route I went down originally - but I got the error that nested case statements could only goto 10 levels

I had

CASE

WHEN ETH_CAUCASIAN = 'Yes' THEN '1'

WHEN ETH_AFRAMER = 'Yes' THEN '2'

WHEN 'ETH_HISP = 'Yes' THEN '3'

etc. all the way to 12

Then I'd have to do the expressions to find the mixed after that - and the mixed could be '1' + any other, or, '2' + any other etc all the way to '11' + any other - sort of thing.. the case statement doesn't seem to allow that level of detail..

I could put a cursor on the table - use variables to hold the ethnicity types and various if statements to get the end value and then do an update - but I'm really hoping to avoid that..

--------------------------

I long for a job where my databases dont have any pesky users accessing them 🙂

RichB SSCrazy Eights Points: 9651 More actions · Answer 5

you could pop it into a different structure and use some bit operators on it, just to upset the folk on here!

eg:

create table sources

(sourceval int, name varchar(20))

go

create table folk

(personid int, sourceval int)

go

insert sources

select 1, 'uk'

insert sources

select 2, 'us'

insert sources

select 4, 'germ'

insert sources

select 8, 'franc'

insert sources

select 16, 'ned'

go

insert folk

select 1, 4^2^8

insert folk

select 2, 1^2^4

insert folk

select 3, 16^2

insert folk

select 4, 2

go

select *

from folk f

join sources s

on s.sourceval = f.sourceval & s.sourceval

order by 1,2

Then you just need to ensure you get the config in place to allow the various combinations to return that you want.

spaghettidba SSC Guru Points: 105732 More actions · Answer 6

I did not understand the mixed ethnicity part, but I think you can work on something like this:

DECLARE @Ethnicity TABLE (

[PersonID] INT NULL,

[ETH_CAUCASIAN] [varchar](10) NULL,

[ETH_AFRAMER] [varchar](10) NULL,

[ETH_HISP] [varchar](10) NULL,

[ETH_NATIVEAMER] [varchar](10) NULL,

[ETH_NATIVEHAW] [varchar](10) NULL,

[ETH_SOUTHASIAN] [varchar](10) NULL,

[ETH_EASTASIAN] [varchar](10) NULL,

[ETH_MIDEAST] [varchar](10) NULL,

[ETH_EASTJEW] [varchar](10) NULL,

[ETH_SEPHJEW] [varchar](10) NULL,

[ETH_FRNCAN] [varchar](10) NULL,

[ETH_OTHER] [varchar](10) NULL

)

INSERT INTO @Ethnicity

SELECT 1, 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'

UNION

SELECT 2, 'No', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No'

UNION

SELECT 3, 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'

SELECT PersonId, CASE WHEN COUNT(*) > 1 THEN 'MIXED ETHNICITY' ELSE MIN(ETH) END AS Ethnicity

FROM (

SELECT *

FROM @Ethnicity AS ETH

) AS E

UNPIVOT (

Ethnicity FOR ETH IN (

[ETH_CAUCASIAN]

,[ETH_AFRAMER]

,[ETH_HISP]

,[ETH_NATIVEAMER]

,[ETH_NATIVEHAW]

,[ETH_SOUTHASIAN]

,[ETH_EASTASIAN]

,[ETH_MIDEAST]

,[ETH_EASTJEW]

,[ETH_SEPHJEW]

,[ETH_FRNCAN]

,[ETH_OTHER]

)

) AS UNPVT

WHERE Ethnicity = 'Yes'

GROUP BY PersonId

-- Gianluca Sartori

torpkev SSCarpal Tunnel Points: 4819 More actions · Answer 7

I see where you're going with it - the mixed could be, for example

if PersonID 1 is mixed Caucasian and Hispanic, then they wouldn't be marked as 'Mixed Ethnicity' they need to be marked as (for example) 'Mixed Hispanic' because I would then also need additional information in another column afterwards because of it (this is a research thing - not just random information for say a sales database - the ethnicity has a direct bearing on results) - so just putting 'mixed' won't really work..

though.. i could perhaps use the unpivot to get the non-mixed, and then put a case statement around the mixed and join back to the original table and calculate the mixed ethnicity column's only (so i wouldn't run into the case statement issue (i think))

--------------------------

I long for a job where my databases dont have any pesky users accessing them 🙂

WayneS SSC Guru Points: 95461 More actions · Answer 8

torpkev (12/20/2010)
I could put a cursor on the table - use variables to hold the ethnicity types and various if statements to get the end value and then do an update - but I'm really hoping to avoid that..

Where's that banishment or exorcism Emoticon at when you need it??? 😀

I think that Lutz is on the right track, but to proceed I think that we need what the definitions are of all of the mixed ethnicities that you have to deal with.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Jeff Moden SSC Guru Points: 1004664 More actions · Answer 9

torpkev (12/20/2010)
I need to translate this into a single column...

Heh.. your request has people speaking of BIT maps, Cursors, and other basic "Worlds-of-Pain" . With that in mind, would you mind telling us what the business requirements are that require this information to be in a single column? Perhaps we can find a better way if we understood the need more clearly. Thanks.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004664 More actions · Answer 10

Also, if you do go to a "BIT" map on this, you should really consider a binary numbering scheme to support actualy "Bit Level Math". 1 = CAUCASIAN, 2 = AFRAMER, 4 = HISP, 8 = NATIVEAMER, etc, etc.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

spaghettidba SSC Guru Points: 105732 More actions · Answer 11

I don't want to look like you-know-who, but you really have to fix your design.

Try putting the ethnicity column in a separate 0-N table. This is what the UNPIVOT does (with a dirty trick).

Moreover, I second Jeff's advice: describe in more details the business requirements, maybe you don't even need to do that.

-- Gianluca Sartori

Paul White SSC Guru Points: 150468 More actions · Answer 12

Gianluca Sartori (12/21/2010)
I don't want to look like you-know-who, but you really have to fix your design.

Celko often makes good points, it's just the delivery that is sometimes a bit lacking. I'd go further along the lines of his normal advice and suggest torpkev use one of the many standard ethnicity classification systems out there, and base a good relational design around that.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

spaghettidba SSC Guru Points: 105732 More actions · Answer 13

Now we just need somebody to say SQL is nothing like COBOL...:-D

-- Gianluca Sartori

WayneS SSC Guru Points: 95461 More actions · Answer 14

Gianluca Sartori (12/21/2010)
Now we just need somebody to say SQL is nothing like COBOL...:-D

And maybe relate this coding style to yee ole punch cards?

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2