Getting ethnicity information - multiple columns to single column

  • I have some demographic information available to me, and I'm trying to calculate the value for ethnicity and I'm running into some issues

    I have data like

    [ETH_CAUCASIAN], [ETH_AFRAMER], [ETH_HISP], [ETH_NATIVEAMER], [ETH_NATIVEHAW], [ETH_SOUTHASIAN]....

    with 12 different columns for each ethnicity type with the values being Yes or No - each row in the table is a person, and they could have multiple ethnicity types

    I need to translate this into a single column, with a value assigned to the ethnicity, so for example, 'Caucasian' would be 1, 'African-American' = 2, 'Hispanic or Latino' = 3 etc. and then i have to be able to allow for mixed ethnicity, and put in the value 'Mixed Ethnicity', but only if the mixed ethnicity matches some criteria such as (1, 8, 9, 10, 11) + Any other except 3 = 'Mixed Ethnicity' - or (2, 8, 9, 10, 11) + Any other except 3 = 'Mixed Ethnicity', (8, 8, 9, 10, 12) = 'Other' etc.

    Then finally just put an 'Unknown' as a catch-all if I don't have a match to any

    I tried putting a case statement in, despite it being ugly, its a query that will run maybe once every few months, I don't need it to be pretty and i'm not even that worried about it being efficient (though obviously I'd prefer it) - but I got an error 'Case expressions may only be nested to level 10.' (its a single case with multiple WHEN/THEN's)

    I also thought I could pivot it and do some summing to try and work it out, but it got weird and overly complicated..

    Is there a better way to go about this?

    Thanks

    --------------------------

    I long for a job where my databases dont have any pesky users accessing them 🙂

  • Have you tried with UNPIVOT?

    Can you post table scripts and some sample data?

    -- Gianluca Sartori

  • At one point I did pivot, then unpivoted and then had a duh moment when i realized that the data was back how it started.

    This is basically what I have - its actually much more complicated a table, but here's the jist of it:

    CREATE TABLE [dbo].[Ethnicity](

    [PersonID] INT NULL,

    [ETH_CAUCASIAN] [varchar](10) NULL,

    [ETH_AFRAMER] [varchar](10) NULL,

    [ETH_HISP] [varchar](10) NULL,

    [ETH_NATIVEAMER] [varchar](10) NULL,

    [ETH_NATIVEHAW] [varchar](10) NULL,

    [ETH_SOUTHASIAN] [varchar](10) NULL,

    [ETH_EASTASIAN] [varchar](10) NULL,

    [ETH_MIDEAST] [varchar](10) NULL,

    [ETH_EASTJEW] [varchar](10) NULL,

    [ETH_SEPHJEW] [varchar](10) NULL,

    [ETH_FRNCAN] [varchar](10) NULL,

    [ETH_OTHER] [varchar](10) NULL

    ) ON [PRIMARY]

    GO

    INSERT INTO Ethnicity

    SELECT 1, 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'

    UNION

    SELECT 2, 'No', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No'

    UNION

    SELECT 3, 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'

    What I should end up with is (remembering my rules from my earlier post: I need to translate this into a single column, with a value assigned to the ethnicity, so for example, 'Caucasian' would be 1, 'African-American' = 2, 'Hispanic or Latino' = 3 etc. and then i have to be able to allow for mixed ethnicity, and put in the value 'Mixed Ethnicity', but only if the mixed ethnicity matches some criteria such as (1, 8, 9, 10, 11) + Any other except 3 = 'Mixed Ethnicity' - or (2, 8, 9, 10, 11) + Any other except 3 = 'Mixed Ethnicity', (8, 8, 9, 10, 12) = 'Other' etc.)

    PersonID, Ethnicity

    1, 'Mixed Ethnicity'

    2, '2'

    3, '1'

    --------------------------

    I long for a job where my databases dont have any pesky users accessing them 🙂

  • Something like:

    SELECT PersonID,

    Ethnicity = CASE WHEN (Expression to determine Caucasion) THEN 'Caucasion'

    WHEN (Expression to determine Mixed) THEN 'Mixed'

    END

    FROM dbo.Ethnicity

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Hi Wayne,

    Thats the route I went down originally - but I got the error that nested case statements could only goto 10 levels

    I had

    CASE

    WHEN ETH_CAUCASIAN = 'Yes' THEN '1'

    WHEN ETH_AFRAMER = 'Yes' THEN '2'

    WHEN 'ETH_HISP = 'Yes' THEN '3'

    etc. all the way to 12

    Then I'd have to do the expressions to find the mixed after that - and the mixed could be '1' + any other, or, '2' + any other etc all the way to '11' + any other - sort of thing.. the case statement doesn't seem to allow that level of detail..

    I could put a cursor on the table - use variables to hold the ethnicity types and various if statements to get the end value and then do an update - but I'm really hoping to avoid that..

    --------------------------

    I long for a job where my databases dont have any pesky users accessing them 🙂

  • you could pop it into a different structure and use some bit operators on it, just to upset the folk on here!

    eg:

    create table sources

    (sourceval int, name varchar(20))

    go

    create table folk

    (personid int, sourceval int)

    go

    insert sources

    select 1, 'uk'

    insert sources

    select 2, 'us'

    insert sources

    select 4, 'germ'

    insert sources

    select 8, 'franc'

    insert sources

    select 16, 'ned'

    go

    insert folk

    select 1, 4^2^8

    insert folk

    select 2, 1^2^4

    insert folk

    select 3, 16^2

    insert folk

    select 4, 2

    go

    select *

    from folk f

    join sources s

    on s.sourceval = f.sourceval & s.sourceval

    order by 1,2

    Then you just need to ensure you get the config in place to allow the various combinations to return that you want.

  • I did not understand the mixed ethnicity part, but I think you can work on something like this:

    DECLARE @Ethnicity TABLE (

    [PersonID] INT NULL,

    [ETH_CAUCASIAN] [varchar](10) NULL,

    [ETH_AFRAMER] [varchar](10) NULL,

    [ETH_HISP] [varchar](10) NULL,

    [ETH_NATIVEAMER] [varchar](10) NULL,

    [ETH_NATIVEHAW] [varchar](10) NULL,

    [ETH_SOUTHASIAN] [varchar](10) NULL,

    [ETH_EASTASIAN] [varchar](10) NULL,

    [ETH_MIDEAST] [varchar](10) NULL,

    [ETH_EASTJEW] [varchar](10) NULL,

    [ETH_SEPHJEW] [varchar](10) NULL,

    [ETH_FRNCAN] [varchar](10) NULL,

    [ETH_OTHER] [varchar](10) NULL

    )

    INSERT INTO @Ethnicity

    SELECT 1, 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'

    UNION

    SELECT 2, 'No', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No'

    UNION

    SELECT 3, 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No'

    SELECT PersonId, CASE WHEN COUNT(*) > 1 THEN 'MIXED ETHNICITY' ELSE MIN(ETH) END AS Ethnicity

    FROM (

    SELECT *

    FROM @Ethnicity AS ETH

    ) AS E

    UNPIVOT (

    Ethnicity FOR ETH IN (

    [ETH_CAUCASIAN]

    ,[ETH_AFRAMER]

    ,[ETH_HISP]

    ,[ETH_NATIVEAMER]

    ,[ETH_NATIVEHAW]

    ,[ETH_SOUTHASIAN]

    ,[ETH_EASTASIAN]

    ,[ETH_MIDEAST]

    ,[ETH_EASTJEW]

    ,[ETH_SEPHJEW]

    ,[ETH_FRNCAN]

    ,[ETH_OTHER]

    )

    ) AS UNPVT

    WHERE Ethnicity = 'Yes'

    GROUP BY PersonId

    -- Gianluca Sartori

  • I see where you're going with it - the mixed could be, for example

    if PersonID 1 is mixed Caucasian and Hispanic, then they wouldn't be marked as 'Mixed Ethnicity' they need to be marked as (for example) 'Mixed Hispanic' because I would then also need additional information in another column afterwards because of it (this is a research thing - not just random information for say a sales database - the ethnicity has a direct bearing on results) - so just putting 'mixed' won't really work..

    though.. i could perhaps use the unpivot to get the non-mixed, and then put a case statement around the mixed and join back to the original table and calculate the mixed ethnicity column's only (so i wouldn't run into the case statement issue (i think))

    --------------------------

    I long for a job where my databases dont have any pesky users accessing them 🙂

  • torpkev (12/20/2010)


    I could put a cursor on the table - use variables to hold the ethnicity types and various if statements to get the end value and then do an update - but I'm really hoping to avoid that..

    Where's that banishment or exorcism Emoticon at when you need it??? 😀

    I think that Lutz is on the right track, but to proceed I think that we need what the definitions are of all of the mixed ethnicities that you have to deal with.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • torpkev (12/20/2010)


    I need to translate this into a single column...

    Heh.. your request has people speaking of BIT maps, Cursors, and other basic "Worlds-of-Pain" . With that in mind, would you mind telling us what the business requirements are that require this information to be in a single column? Perhaps we can find a better way if we understood the need more clearly. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Also, if you do go to a "BIT" map on this, you should really consider a binary numbering scheme to support actualy "Bit Level Math". 1 = CAUCASIAN, 2 = AFRAMER, 4 = HISP, 8 = NATIVEAMER, etc, etc.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I don't want to look like you-know-who, but you really have to fix your design.

    Try putting the ethnicity column in a separate 0-N table. This is what the UNPIVOT does (with a dirty trick).

    Moreover, I second Jeff's advice: describe in more details the business requirements, maybe you don't even need to do that.

    -- Gianluca Sartori

  • Gianluca Sartori (12/21/2010)


    I don't want to look like you-know-who, but you really have to fix your design.

    Celko often makes good points, it's just the delivery that is sometimes a bit lacking. I'd go further along the lines of his normal advice and suggest torpkev use one of the many standard ethnicity classification systems out there, and base a good relational design around that.

  • Now we just need somebody to say SQL is nothing like COBOL...:-D

    -- Gianluca Sartori

  • Gianluca Sartori (12/21/2010)


    Now we just need somebody to say SQL is nothing like COBOL...:-D

    And maybe relate this coding style to yee ole punch cards?

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply