Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 123»»»

Logic to determine Gender based on FirstName Expand / Collapse
Author
Message
Posted Wednesday, August 31, 2005 8:11 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Wednesday, October 05, 2005 8:50 AM
Points: 64, Visits: 1

Does anybody know of any logic or algorithm that will determine the gender of a person based on the FirstName column. 

Any tools out there that will let us determine that?

Any lead will be appreciated.

 

Thanks

 

 

Post #215697
Posted Wednesday, August 31, 2005 8:31 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 6:55 AM
Points: 1,559, Visits: 657

Easy, it can't be done.

Boy or girl? Kim, Robin, Stac(e)y, Hilary.. I'm sure there are lots of more gender-generic names out there...

/Kenneth




Post #215707
Posted Wednesday, August 31, 2005 9:28 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, February 20, 2013 8:55 AM
Points: 283, Visits: 9
... and Leslie, and Jamie, and Evelyn, and even my own Lee.


Post #215744
Posted Wednesday, August 31, 2005 11:35 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 9:35 AM
Points: 11,645, Visits: 27,747

aww come on guys; don't let the 5% of exceptions prevent you from doing the other 95% of the work.

it'd be some work gathering the data, but  a simple google of "boys names" and "girls names"  could get you some raw data. first link i found was 11000 boys names, but i don't know how easy it'll be to extract them. It's a pain to spider a bunch of pages and rip out the data you want, but it's certainly doable.

stick them in a table with a genderflag related to the source you found it.

CREATE TABLE GOOGLENAMES(name varchar(20), assumedGender char(1))
insert into GOOGLENAMES(name,assumedGender) values ('Bill','M')
insert into GOOGLENAMES(name,assumedGender) values ('Jamie','M')
insert into GOOGLENAMES(name,assumedGender) values ('Jamie','F')

--eliminate non-deterministic names as an example:
select * from GOOGLENAMES where name in (select name from googlenames  group by name having count(name)=1)

--do an update.
update sometable set gender = assumedGender
from GOOGLENAMES
where sometable.firstname = GOOGLENAMES.name
and name in (select name from googlenames  group by name having count(name)=1

After that, it is up to you to do something with the exceptions, whether it is to assume all exceptions are male or female, or to leave blank, or to be reviewed and edited for probabilities...Kim is probably 95%female for example, and you could make assumptionsed like that.



Lowell

--There is no spoon, and there's no default ORDER BY in sql server either.
Actually, Common Sense is so rare, it should be considered a Superpower. --my son
Post #215820
Posted Wednesday, August 31, 2005 11:56 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, February 20, 2013 8:55 AM
Points: 283, Visits: 9

Well, gee, sure you could do that, Lowell, if you wanted full employment to be your career goal. 

In my personal experience, however, you can disclaim up front all you like that your results are only going to be but x% reliable, and the users will nod their heads in agreement.  But then, once you sign onto to the project, you spend the next 800 years in purgatory making constant adjustments because somebody further down the data stream, but up the food chain, doesn't like the counts and doesn't like your assumptions.

Trust me on this.  Occasionally, I do work for statisticians -- Ph.Ds, mind you -- who get bent out of shape when rolled-up counts deviate by about .5 per 10,000 parts.  And this is *after* we've *already* explained that the numbers will be approximate.

The real problem is even worse that what you've already laid out.  Take age brackets, for example.  The name 'Lindsay' may 80% female and 20% male -- but if the subject is forty years or older, it probably flips.

Some worms are best left in the can.

I should add, however, I like your approach, Lowell! 




Post #215828
Posted Wednesday, August 31, 2005 12:48 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, January 25, 2010 5:28 PM
Points: 1,132, Visits: 42

I agree with Lee.  This will cause you a great deal of difficulty down the line. 

I also agree that I liked Lowell's approach... 






I wasn't born stupid - I had to study.
Post #215855
Posted Wednesday, August 31, 2005 12:58 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 4:09 AM
Points: 21,357, Visits: 9,539
I don't know why I'd do that... almost all the data would have to be manually recheck for validity... why not go down that road instead??
Post #215860
Posted Wednesday, August 31, 2005 1:32 PM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:48 PM
Points: 3,223, Visits: 414

Or what happens if a male named KIM gets an official document that lists him as a Female? Lawsuit maybe? It is possible.

A lot depends on what this data will be used for.

Remember the Johnny Cash song "A Boy Named Sue?"

-SQLBill




Post #215875
Posted Wednesday, August 31, 2005 1:35 PM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, February 20, 2013 8:55 AM
Points: 283, Visits: 9

> Remember the Johnny Cash song "A Boy Named Sue?"

How do you do?




Post #215876
Posted Wednesday, August 31, 2005 1:42 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Thursday, May 05, 2011 1:37 PM
Points: 86, Visits: 75

There are lots of products /services on the market that offer genderization tools. Companies offering NCOA list services will typically genderize your tables for a certain $$$ amount per thousand with a minimum order or ### records. Typically these companies don't like to do anything for less than $500, but they probably will charge around $3/1000 records with that $500 figure as a price floor.

Moral of the story - go google "NCOA list services"

Post #215883
« Prev Topic | Next Topic »

Add to briefcase 123»»»

Permissions Expand / Collapse