|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Wednesday, October 05, 2005 8:50 AM
Points: 64,
Visits: 1
|
|
Does anybody know of any logic or algorithm that will determine the gender of a person based on the FirstName column. Any tools out there that will let us determine that? Any lead will be appreciated. Thanks
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Yesterday @ 6:55 AM
Points: 1,559,
Visits: 657
|
|
Easy, it can't be done. Boy or girl? Kim, Robin, Stac(e)y, Hilary.. I'm sure there are lots of more gender-generic names out there... /Kenneth
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Wednesday, February 20, 2013 8:55 AM
Points: 283,
Visits: 9
|
|
... and Leslie, and Jamie, and Evelyn, and even my own Lee.
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 9:35 AM
Points: 11,645,
Visits: 27,747
|
|
aww come on guys; don't let the 5% of exceptions prevent you from doing the other 95% of the work. it'd be some work gathering the data, but a simple google of "boys names" and "girls names" could get you some raw data. first link i found was 11000 boys names, but i don't know how easy it'll be to extract them. It's a pain to spider a bunch of pages and rip out the data you want, but it's certainly doable. stick them in a table with a genderflag related to the source you found it. CREATE TABLE GOOGLENAMES(name varchar(20), assumedGender char(1)) insert into GOOGLENAMES(name,assumedGender) values ('Bill','M') insert into GOOGLENAMES(name,assumedGender) values ('Jamie','M') insert into GOOGLENAMES(name,assumedGender) values ('Jamie','F') --eliminate non-deterministic names as an example: select * from GOOGLENAMES where name in (select name from googlenames group by name having count(name)=1) --do an update. update sometable set gender = assumedGender from GOOGLENAMES where sometable.firstname = GOOGLENAMES.name and name in (select name from googlenames group by name having count(name)=1 After that, it is up to you to do something with the exceptions, whether it is to assume all exceptions are male or female, or to leave blank, or to be reviewed and edited for probabilities...Kim is probably 95%female for example, and you could make assumptionsed like that.
Lowell
--There is no spoon, and there's no default ORDER BY in sql server either. Actually, Common Sense is so rare, it should be considered a Superpower. --my son
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Wednesday, February 20, 2013 8:55 AM
Points: 283,
Visits: 9
|
|
Well, gee, sure you could do that, Lowell, if you wanted full employment to be your career goal.  In my personal experience, however, you can disclaim up front all you like that your results are only going to be but x% reliable, and the users will nod their heads in agreement. But then, once you sign onto to the project, you spend the next 800 years in purgatory making constant adjustments because somebody further down the data stream, but up the food chain, doesn't like the counts and doesn't like your assumptions. Trust me on this. Occasionally, I do work for statisticians -- Ph.Ds, mind you -- who get bent out of shape when rolled-up counts deviate by about .5 per 10,000 parts. And this is *after* we've *already* explained that the numbers will be approximate. The real problem is even worse that what you've already laid out. Take age brackets, for example. The name 'Lindsay' may 80% female and 20% male -- but if the subject is forty years or older, it probably flips. Some worms are best left in the can. I should add, however, I like your approach, Lowell! 
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Monday, January 25, 2010 5:28 PM
Points: 1,132,
Visits: 42
|
|
I agree with Lee. This will cause you a great deal of difficulty down the line.  I also agree that I liked Lowell's approach... 
I wasn't born stupid - I had to study.
|
|
|
|
|
SSC-Insane
         
Group: General Forum Members
Last Login: Today @ 4:09 AM
Points: 21,357,
Visits: 9,539
|
|
| I don't know why I'd do that... almost all the data would have to be manually recheck for validity... why not go down that road instead??
|
|
|
|
|
Hall of Fame
       
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:48 PM
Points: 3,223,
Visits: 414
|
|
Or what happens if a male named KIM gets an official document that lists him as a Female? Lawsuit maybe? It is possible. A lot depends on what this data will be used for. Remember the Johnny Cash song "A Boy Named Sue?" -SQLBill
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Wednesday, February 20, 2013 8:55 AM
Points: 283,
Visits: 9
|
|
> Remember the Johnny Cash song "A Boy Named Sue?" How do you do?
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Thursday, May 05, 2011 1:37 PM
Points: 86,
Visits: 75
|
|
There are lots of products /services on the market that offer genderization tools. Companies offering NCOA list services will typically genderize your tables for a certain $$$ amount per thousand with a minimum order or ### records. Typically these companies don't like to do anything for less than $500, but they probably will charge around $3/1000 records with that $500 figure as a price floor. Moral of the story - go google "NCOA list services" 
|
|
|
|