December 20, 2010 at 9:47 am
I have some demographic information available to me, and I'm trying to calculate the value for ethnicity and I'm running into some issues
I have data like
[ETH_CAUCASIAN], [ETH_AFRAMER], [ETH_HISP], [ETH_NATIVEAMER], [ETH_NATIVEHAW], [ETH_SOUTHASIAN]....
with 12 different columns for each ethnicity type with the values being Yes or No - each row in the table is a person, and they could have multiple ethnicity types
I need to translate this into a single column, with a value assigned to the ethnicity, so for example, 'Caucasian' would be 1, 'African-American' = 2, 'Hispanic or Latino' = 3 etc. and then i have to be able to allow for mixed ethnicity, and put in the value 'Mixed Ethnicity', but only if the mixed ethnicity matches some criteria such as (1, 8, 9, 10, 11) + Any other except 3 = 'Mixed Ethnicity' - or (2, 8, 9, 10, 11) + Any other except 3 = 'Mixed Ethnicity', (8, 8, 9, 10, 12) = 'Other' etc.
Then finally just put an 'Unknown' as a catch-all if I don't have a match to any
I tried putting a case statement in, despite it being ugly, its a query that will run maybe once every few months, I don't need it to be pretty and i'm not even that worried about it being efficient (though obviously I'd prefer it) - but I got an error 'Case expressions may only be nested to level 10.' (its a single case with multiple WHEN/THEN's)
I also thought I could pivot it and do some summing to try and work it out, but it got weird and overly complicated..
Is there a better way to go about this?
Thanks
--------------------------
I long for a job where my databases dont have any pesky users accessing them 🙂
December 20, 2010 at 10:02 am
Have you tried with UNPIVOT?
Can you post table scripts and some sample data?
-- Gianluca Sartori
December 20, 2010 at 10:14 am
At one point I did pivot, then unpivoted and then had a duh moment when i realized that the data was back how it started.
This is basically what I have - its actually much more complicated a table, but here's the jist of it:
CREATE TABLE [dbo].[Ethnicity](
[PersonID] INT NULL,
[ETH_CAUCASIAN] [varchar](10) NULL,
[ETH_AFRAMER] [varchar](10) NULL,
[ETH_HISP] [varchar](10) NULL,
[ETH_NATIVEAMER] [varchar](10) NULL,
[ETH_NATIVEHAW] [varchar](10) NULL,
[ETH_SOUTHASIAN] [varchar](10) NULL,
[ETH_EASTASIAN] [varchar](10) NULL,
[ETH_MIDEAST] [varchar](10) NULL,
[ETH_EASTJEW] [varchar](10) NULL,
[ETH_SEPHJEW] [varchar](10) NULL,
[ETH_FRNCAN] [varchar](10) NULL,
[ETH_OTHER] [varchar](10) NULL
) ON [PRIMARY]
GO
INSERT INTO Ethnicity
SELECT 1, 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'
UNION
SELECT 2, 'No', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No'
UNION
SELECT 3, 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'
What I should end up with is (remembering my rules from my earlier post: I need to translate this into a single column, with a value assigned to the ethnicity, so for example, 'Caucasian' would be 1, 'African-American' = 2, 'Hispanic or Latino' = 3 etc. and then i have to be able to allow for mixed ethnicity, and put in the value 'Mixed Ethnicity', but only if the mixed ethnicity matches some criteria such as (1, 8, 9, 10, 11) + Any other except 3 = 'Mixed Ethnicity' - or (2, 8, 9, 10, 11) + Any other except 3 = 'Mixed Ethnicity', (8, 8, 9, 10, 12) = 'Other' etc.)
PersonID, Ethnicity
1, 'Mixed Ethnicity'
2, '2'
3, '1'
--------------------------
I long for a job where my databases dont have any pesky users accessing them 🙂
December 20, 2010 at 10:21 am
Something like:
SELECT PersonID,
Ethnicity = CASE WHEN (Expression to determine Caucasion) THEN 'Caucasion'
WHEN (Expression to determine Mixed) THEN 'Mixed'
END
FROM dbo.Ethnicity
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
December 20, 2010 at 10:26 am
Hi Wayne,
Thats the route I went down originally - but I got the error that nested case statements could only goto 10 levels
I had
CASE
WHEN ETH_CAUCASIAN = 'Yes' THEN '1'
WHEN ETH_AFRAMER = 'Yes' THEN '2'
WHEN 'ETH_HISP = 'Yes' THEN '3'
etc. all the way to 12
Then I'd have to do the expressions to find the mixed after that - and the mixed could be '1' + any other, or, '2' + any other etc all the way to '11' + any other - sort of thing.. the case statement doesn't seem to allow that level of detail..
I could put a cursor on the table - use variables to hold the ethnicity types and various if statements to get the end value and then do an update - but I'm really hoping to avoid that..
--------------------------
I long for a job where my databases dont have any pesky users accessing them 🙂
December 20, 2010 at 10:31 am
you could pop it into a different structure and use some bit operators on it, just to upset the folk on here!
eg:
create table sources
(sourceval int, name varchar(20))
go
create table folk
(personid int, sourceval int)
go
insert sources
select 1, 'uk'
insert sources
select 2, 'us'
insert sources
select 4, 'germ'
insert sources
select 8, 'franc'
insert sources
select 16, 'ned'
go
insert folk
select 1, 4^2^8
insert folk
select 2, 1^2^4
insert folk
select 3, 16^2
insert folk
select 4, 2
go
select *
from folk f
join sources s
on s.sourceval = f.sourceval & s.sourceval
order by 1,2
Then you just need to ensure you get the config in place to allow the various combinations to return that you want.
December 20, 2010 at 10:40 am
I did not understand the mixed ethnicity part, but I think you can work on something like this:
DECLARE @Ethnicity TABLE (
[PersonID] INT NULL,
[ETH_CAUCASIAN] [varchar](10) NULL,
[ETH_AFRAMER] [varchar](10) NULL,
[ETH_HISP] [varchar](10) NULL,
[ETH_NATIVEAMER] [varchar](10) NULL,
[ETH_NATIVEHAW] [varchar](10) NULL,
[ETH_SOUTHASIAN] [varchar](10) NULL,
[ETH_EASTASIAN] [varchar](10) NULL,
[ETH_MIDEAST] [varchar](10) NULL,
[ETH_EASTJEW] [varchar](10) NULL,
[ETH_SEPHJEW] [varchar](10) NULL,
[ETH_FRNCAN] [varchar](10) NULL,
[ETH_OTHER] [varchar](10) NULL
)
INSERT INTO @Ethnicity
SELECT 1, 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'
UNION
SELECT 2, 'No', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No'
UNION
SELECT 3, 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'
SELECT PersonId, CASE WHEN COUNT(*) > 1 THEN 'MIXED ETHNICITY' ELSE MIN(ETH) END AS Ethnicity
FROM (
SELECT *
FROM @Ethnicity AS ETH
) AS E
UNPIVOT (
Ethnicity FOR ETH IN (
[ETH_CAUCASIAN]
,[ETH_AFRAMER]
,[ETH_HISP]
,[ETH_NATIVEAMER]
,[ETH_NATIVEHAW]
,[ETH_SOUTHASIAN]
,[ETH_EASTASIAN]
,[ETH_MIDEAST]
,[ETH_EASTJEW]
,[ETH_SEPHJEW]
,[ETH_FRNCAN]
,[ETH_OTHER]
)
) AS UNPVT
WHERE Ethnicity = 'Yes'
GROUP BY PersonId
-- Gianluca Sartori
December 20, 2010 at 10:54 am
I see where you're going with it - the mixed could be, for example
if PersonID 1 is mixed Caucasian and Hispanic, then they wouldn't be marked as 'Mixed Ethnicity' they need to be marked as (for example) 'Mixed Hispanic' because I would then also need additional information in another column afterwards because of it (this is a research thing - not just random information for say a sales database - the ethnicity has a direct bearing on results) - so just putting 'mixed' won't really work..
though.. i could perhaps use the unpivot to get the non-mixed, and then put a case statement around the mixed and join back to the original table and calculate the mixed ethnicity column's only (so i wouldn't run into the case statement issue (i think))
--------------------------
I long for a job where my databases dont have any pesky users accessing them 🙂
December 20, 2010 at 12:59 pm
torpkev (12/20/2010)
I could put a cursor on the table - use variables to hold the ethnicity types and various if statements to get the end value and then do an update - but I'm really hoping to avoid that..
Where's that banishment or exorcism Emoticon at when you need it??? 😀
I think that Lutz is on the right track, but to proceed I think that we need what the definitions are of all of the mixed ethnicities that you have to deal with.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
December 20, 2010 at 8:09 pm
torpkev (12/20/2010)
I need to translate this into a single column...
Heh.. your request has people speaking of BIT maps, Cursors, and other basic "Worlds-of-Pain" . With that in mind, would you mind telling us what the business requirements are that require this information to be in a single column? Perhaps we can find a better way if we understood the need more clearly. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 20, 2010 at 8:13 pm
Also, if you do go to a "BIT" map on this, you should really consider a binary numbering scheme to support actualy "Bit Level Math". 1 = CAUCASIAN, 2 = AFRAMER, 4 = HISP, 8 = NATIVEAMER, etc, etc.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 21, 2010 at 12:58 am
I don't want to look like you-know-who, but you really have to fix your design.
Try putting the ethnicity column in a separate 0-N table. This is what the UNPIVOT does (with a dirty trick).
Moreover, I second Jeff's advice: describe in more details the business requirements, maybe you don't even need to do that.
-- Gianluca Sartori
December 21, 2010 at 2:01 am
Gianluca Sartori (12/21/2010)
I don't want to look like you-know-who, but you really have to fix your design.
Celko often makes good points, it's just the delivery that is sometimes a bit lacking. I'd go further along the lines of his normal advice and suggest torpkev use one of the many standard ethnicity classification systems out there, and base a good relational design around that.
December 21, 2010 at 2:05 am
Now we just need somebody to say SQL is nothing like COBOL...:-D
-- Gianluca Sartori
December 21, 2010 at 7:06 am
Gianluca Sartori (12/21/2010)
Now we just need somebody to say SQL is nothing like COBOL...:-D
And maybe relate this coding style to yee ole punch cards?
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply