Mining for Experts

  • Comments posted to this topic are about the item Mining for Experts

  • I already know of certain jobs where the decision is made on your Uni scores; no interview, no CV, highest scorer gets the job. I wonder if this kind of thing would be used more to filter during a recruitment process.

    Shame to remove actual ability and personality from the mix.

  • As mining for information, as opposed to data, becomes more sophisticated, we will lose the human touch. This will prompt job seekers to put fake "work" on their devices to make it look like they are knowledgeable about a technology. How will that be screened?

    I would not like to see this get to the point where an organization is looking for a DBA, scans SQLServerCentral.com, and picks those with a certain points range or a % correct on the QODs.

    I'm reminded of a saying that an early mentor of mine was fond of. "The more we automate, the less we know."

  • OK, I realized I accidentally posted this entirely in the wrong place, sorry!

  • If someone were to mine SSC for this kind of thing, I'm going to guess that it would determine that a fair number of people are experts on Star Wars. Since the most important thread on the site (judging by volume of posts) has that as a recurring theme, I just can't see it not being given a huge amount of importance by any automated algorithm.

    I'm, as usual, skeptical of any attempt to replicate real judgement of human intelligence and skill by any sort of automatic, standardized system. IQ tests, SAT scores, aptitude tests, and now data mining, all look good on paper and in theory.

    The day I see such a system identify an actual expert that was otherwise overlooked, I'll be impressed. Till then, I bet it'll be, "Hey, the computer says that Jeff Moden is an expert on T-SQL! Who knew?"

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I think this system is more a way for new people, or widely distributed, but not well known (to each other) people, to find experts.

    It doesn't address the load issues, or the funneling of questions to experts, but it has potential. Think about universities, linked perhaps to help students find mentors or seniors that are working on something similar?

  • GSquared (5/4/2009)


    If someone were to mine SSC for this kind of thing, I'm going to guess that it would determine that a fair number of people are experts on Star Wars. Since the most important thread on the site (judging by volume of posts) has that as a recurring theme, I just can't see it not being given a huge amount of importance by any automated algorithm.

    You'd be surprised. Despite all the efforts in that thread, it mostly is noise compared to other posts by those people.

  • Steve Jones - Editor (5/4/2009)


    You'd be surprised. Despite all the efforts in that thread, it mostly is noise compared to other posts by those people.

    I'm just being cynical.

    Data mining can and has found useful information that people didn't already know, but it's been on subjects that were specifically susceptible to mathematical analysis, like sales volumes and marketing sequences.

    Human beings can't even agree on what makes an expert, in many cases. We've had disagreements over what makes an MVP relatively recently, for example. If humans can't translate something into a coherent specification, computers can't do their "magic" on it with any degree of reliability.

    Here's an example that might help illustrate my disagreement with a system like this and the assumptions that it inherits from its designers:

    Person A, let's call him Joe, has aspirations of one day being an expert on T-SQL. He isn't yet, but he's got big plans. In order to one day accomplish this, he saves hundreds of articles on the subject on his local hard drive, with the plan that "one day, I'll read all of those, just as soon as I have time for it". He also spends time on sites like MSDN and SSC asking a LOT of questions, many of which he doesn't ask clearly, so they require follow-up posts to clarify what he's asking for, and he always includes a lot of social-fluff posts, like "thank you for answering", which are appreciated and are nice, but don't have technical content. However, his signature has keywords in it that the algorithm is looking for when it checks for SQL expertise. The algorithm, innocent of the fact that he hasn't read a single one of those hundreds of SQL articles on his hard drive, finds them there, and concludes, "this guy must be an expert", then it finds that he has hundreds, maybe even thousands, of posts on an SQL-related message board, and concludes, "he fits the profile of being easy to communicate with". He's going to get a very high score.

    Person B, Sally, doesn't store articles on her computer. She has several bookshelves of SQL-related material, well studied, with post-it notes on the pages she refers to often. She also has a few books she co-authored on the subject, which are well-written and technically superb. She's regularly consulted by technical authors on a number of arcane subjects, and is always happy to help them out. Since her "knowledge" isn't stored electronically, she's going to score poorly on the "expertise" measure, and even if she scores well on the communication measure, she's not going to get a high overall score.

    Thus, the algorithm is going to tell people to talk to Joe and is going to tell them to avoid Sally, or at best to pick Joe over Sally.

    Until the algorithm can do something on the order of judging the technical merit of the posts, it's going to have false positives and false negatives on a routine basis.
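    To make the Joe and Sally comparison concrete, here is a rough sketch of the kind of naive scoring described above. Everything in it is an assumption made for illustration (the keyword list, the field names, the way the two scores are added together); the actual product's method isn't described anywhere in this thread.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical keyword list; the real product's signals are not known.
SQL_KEYWORDS = ("sql", "t-sql", "dba")


@dataclass
class Profile:
    name: str
    saved_documents: List[str]  # titles of files found on the hard drive
    forum_posts: int            # raw post count, no check of technical merit
    signature: str


def naive_expertise_score(p: Profile) -> int:
    """Count keyword hits in stored file titles and the signature.

    Nothing here checks whether any of the files were ever read.
    """
    doc_hits = sum(
        1 for title in p.saved_documents
        if any(k in title.lower() for k in SQL_KEYWORDS)
    )
    sig_hits = sum(1 for k in SQL_KEYWORDS if k in p.signature.lower())
    return doc_hits + sig_hits


def naive_communication_score(p: Profile) -> int:
    """Treat sheer post volume as 'easy to communicate with'."""
    return p.forum_posts


def naive_overall_score(p: Profile) -> int:
    return naive_expertise_score(p) + naive_communication_score(p)


# Joe hoards unread articles and posts constantly; Sally's knowledge is on paper.
joe = Profile("Joe", [f"T-SQL article {i}" for i in range(300)], 1500,
              "Aspiring SQL DBA")
sally = Profile("Sally", [], 200, "")

for person in (joe, sally):
    print(person.name, naive_overall_score(person))
```

    With these made-up numbers Joe outscores Sally by a wide margin, and nothing in the calculation ever looks at the technical merit of what either of them wrote.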

    Now, let's add in things like someone who stores fiction on their hard drive. Can the computer tell that all those files that reference "organic chemistry", rather than indicating an expertise in carbon-molecule behavior, indicate a fascination with books published by Harlequin? Maybe it can, but can you be sure of that?

    As far as connecting people who don't actually know each other, will it do a better job of that than something like LinkedIn, which allows actual human judgement?

    I'm a cynic when it comes to people claiming they can replace human judgement in things which aren't inherently mathematical in nature. Every attempt at it to date has, to my knowledge, failed miserably.

    I'm all for people continuing to try. Advancement is always in the direction of, "says who this is impossible?" No human innovation would ever occur if someone didn't first say, "I know fire can kill, but maybe it's got a good use too?", and then continue on despite a permanent lack of eyebrows after the first try or two.

    At the same time, it takes a bit of "hey, calm down, I know you think you just discovered a way to make an engine run on water, but let's be a little more thorough in our testing before we invest our life savings into it".

    If their product can prove that it can locate unknown expertise, or uncover false expertise that everyone thought was real, then it's worth it. Till it's done one of those two at least once, it's just vaporware. Till it can do that more effectively than something like LinkedIn, it's not cost effective, even if it's accurate. I need to see that it does both before I get excited about it at all.

    That's all I'm saying. (And, as usual, "all I'm saying" takes FAR more reading than I should expect anyone to bear with.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC

  • On a related note, I saw some interesting articles this weekend about "Wolfram Alpha" which calls itself a "computational knowledge engine" - perhaps not exactly a direct competitor to Google, but an interesting alternative for certain kinds of queries. Apparently it is pretty amazing in what it can do and may turn out to be "the next big thing" when introduced sometime later this month.

    To find out more about it, you can Google it. 😀

  • That reminds me of looking for key words on resumes or web crawlers gathering key words for search engines. They can be fooled, too. The theory sounds great but is not practical, YET.

    Okay, I'll create a program to create a thousand text files with the word "sex" in each, all with different file dates. What does that say I'm an expert in? Or, let's say I answer a thousand forum questions with "I don't know"; that would lead an outsider to believe that a great many questions are being answered on some forum.
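    A program like that takes only a few lines. This is a minimal sketch, assuming Python; the function name, the file names, and the keyword used in the demo are all made up for illustration.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List

SECONDS_PER_DAY = 86400


def plant_keyword_files(directory: Path, keyword: str, count: int) -> List[Path]:
    """Write `count` small text files containing `keyword`.

    Each file is back-dated by a different number of days so the
    collection looks as if it had been built up over time.
    """
    now = time.time()
    paths = []
    for i in range(count):
        path = directory / f"notes_{i:04d}.txt"
        path.write_text(f"Some notes about {keyword}.\n", encoding="utf-8")
        stamp = now - i * SECONDS_PER_DAY
        os.utime(path, (stamp, stamp))  # set access and modification times
        paths.append(path)
    return paths


# Demo in a throwaway directory: a thousand "T-SQL" files, one per day.
demo_dir = Path(tempfile.mkdtemp())
planted = plant_keyword_files(demo_dir, "T-SQL", 1000)
print(len(planted), "files written to", demo_dir)
```

    A scanner that only counts keyword hits and looks at file dates has no way to tell this apart from years of accumulated notes.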

    I've had a post on somewebsite.com that was answered incorrectly in the first reply. So everyone ignored it, thinking it had been answered.

    Automated results can be fooled, and results can be skewed and misleading. I usually trust computers more than people; they will reliably give you the same wrong answers.

    Maybe their webcam version of the program still looks for people wearing suit-n-ties not t-shirts like Steve Jones, for visually declaring expertise.

  • Heh... I learned a great East Indian word for the techniques used to find experts... ready? "Booils**t".

    I'm currently 4 for 4... That is, I've worked with 4 certified DBAs (2 very recently) that scored high on their exams and have all sorts of goodies on their hard drives... and still they can't do the job. Then you have people who absolutely suck when it comes to taking tests but who'll eat those 4 DBAs for lunch and pick their teeth with their bones if you sit them down in front of a real problem.

    There's only 1 way to mine for experts and the same holds true when interviewing them... talk with them. There is no quick way that is also an accurate way. I hired a kid fresh out of the military with no experience and a degree that had nothing to do with programming... she had zero professional experience with programming. But, she did it for a hobby. She didn't have the syntax memorized and almost always had to refer to the book (at first) but, man, could she write some great code.

    I ran into something similar at one of my old companies... we had a tech that just wasn't into it... spent most of his time drawing buildings, cars, and landscapes instead of studying tech stuff... and his drawings were incredible. I took him to my friend, a manager in the drafting department (which also did a lot of freehand work for Marketing), and they were both elated.

    If you want to mine for experts and you want to do it right, stop trying to do it by taking DNA samples... Stand up from the computer, walk around, and talk with people.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


