Data mining model which assigns the job post to a cluster based on job description

  • I am building a project to study MS BI, and would like some input.

    My goal is to create a data mining model which assigns each job post to a cluster based on the type of job.

    What follows is what I have done so far.

    I would appreciate any input you have, from descriptive style to process logic, as I am working without significant data mining experience, relational database management experience, or a mentor of any kind.

    Steps to this point...

    Built SSIS packages which:

  • download job posts (company, city, jobtitle, description snippet, post date, etc.)
  • download the original job posting (the full job description)
  • create a dictionary of terms (from jobtitle, snippet and detail)
  • calculate term vectors
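
A rough Python sketch of the dictionary and term vector steps, for anyone who wants to see the shape of the data outside SSIS. It is not the package logic itself. The function names, the letters-only tokenizer, and the sample posts are assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and return its alphabetic runs as terms.

    Assumption: letters only, so tokens such as "c#" lose their symbols.
    """
    return re.findall(r"[a-z]+", text.lower())

def build_term_vectors(posts):
    """Return (dictionary, vectors).

    dictionary: sorted list of unique terms
    vectors:    set of (jobkey, term) rows, one row per distinct term per post
    """
    vectors = set()
    for post in posts:
        text = " ".join([post["jobtitle"], post["snippet"], post.get("detail", "")])
        for term in tokenize(text):
            vectors.add((post["jobkey"], term))
    dictionary = sorted({term for _jobkey, term in vectors})
    return dictionary, vectors

# Hypothetical sample rows, not real data from the project.
posts = [
    {"jobkey": "a1", "jobtitle": "Data Analyst", "snippet": "SQL reporting",
     "detail": "Analyst for SQL reports"},
    {"jobkey": "b2", "jobtitle": "Registered Nurse", "snippet": "ICU nurse",
     "detail": ""},
]
dictionary, vectors = build_term_vectors(posts)
```

Storing each (jobkey, term) pair once matches the Vectors layout of [jobkey, term], which records presence rather than counts.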
    The data looks like this:

  • Posts .. [jobkey,jobtitle,company,source,date,snippet,..]
  • Vectors .. [jobkey, term]
  • there are 11,761 job postings
  • there are 43,000 unique terms in dictionary & termVectors
  • 40% of the job descriptions have fewer than 40 terms, which suggests the job description download failed for those posts
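
The short-description check can be sketched like this in Python, assuming the Vectors rows hold one row per distinct term per post. The function name and the sample rows are hypothetical. The threshold of 40 comes from the figure above.

```python
from collections import Counter

def flag_short_posts(jobkeys, vectors, min_terms=40):
    """Return the jobkeys whose term count is below min_terms.

    jobkeys is passed separately so that posts with no terms at all
    (and therefore no rows in vectors) are still flagged.
    """
    counts = Counter(jobkey for jobkey, _term in vectors)
    return sorted(key for key in jobkeys if counts[key] < min_terms)

# Tiny hypothetical example with a lower threshold so it fits on screen.
sample_vectors = [("a1", "sql"), ("a1", "analyst"), ("a1", "report"), ("b2", "nurse")]
suspect = flag_short_posts(["a1", "b2", "c3"], sample_vectors, min_terms=3)
```

Posts flagged this way could be downloaded again or left out of the mining structure, so that failed downloads do not form a cluster of their own.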
    Created a mining structure based on:

  • Posts JOIN TermVectors ON Posts.jobkey = TermVectors.jobkey
    Created a mining model based on:

  • Microsoft_clustering
  • set cluster count = 10
  • max_input_attributes = 0
  • defaults for the other parameters
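
To experiment with the idea outside Analysis Services, here is a small stand-in in plain Python. It is a k-means-style loop using cosine similarity on binary term sets with farthest-first seeding. It is NOT the Microsoft_clustering algorithm, and every name and sample post below is an assumption made for illustration.

```python
import math
from collections import defaultdict

def cosine(terms, centroid):
    """Cosine similarity between a binary term set and a weighted centroid."""
    if not terms or not centroid:
        return 0.0
    dot = sum(centroid.get(t, 0.0) for t in terms)
    norm = math.sqrt(len(terms)) * math.sqrt(sum(w * w for w in centroid.values()))
    return dot / norm if norm else 0.0

def mean_centroid(docs):
    """Average of binary term vectors: term -> fraction of docs containing it."""
    weights = defaultdict(float)
    for terms in docs:
        for t in terms:
            weights[t] += 1.0 / len(docs)
    return dict(weights)

def cluster(post_terms, k, iterations=10):
    """post_terms: dict jobkey -> set of terms. Returns dict jobkey -> cluster index."""
    keys = sorted(post_terms)
    # Farthest-first seeding: start from the first post, then repeatedly add
    # the post that is least similar to every centroid chosen so far.
    centroids = [mean_centroid([post_terms[keys[0]]])]
    while len(centroids) < k:
        far = min(keys, key=lambda j: max(cosine(post_terms[j], c) for c in centroids))
        centroids.append(mean_centroid([post_terms[far]]))
    assignment = {}
    for _ in range(iterations):
        # Assign each post to its most similar centroid.
        new_assignment = {
            j: max(range(k), key=lambda i: cosine(post_terms[j], centroids[i]))
            for j in keys
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Recompute each centroid from its current members.
        for i in range(k):
            members = [post_terms[j] for j in keys if assignment[j] == i]
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = mean_centroid(members)
    return assignment

# Hypothetical posts: two analyst jobs and two nursing jobs.
post_terms = {
    "a1": {"sql", "analyst", "report"},
    "a2": {"sql", "analyst", "dashboard"},
    "n1": {"nurse", "icu", "patient"},
    "n2": {"nurse", "patient", "clinic"},
}
assignment = cluster(post_terms, k=2)
```

A toy like this makes it easy to see how changing k or the dictionary changes the groups before rebuilding the real mining model.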

    Found a bunch of terms that were obviously not related to the job type, so:

  • moved junk terms to dictionary_exclude (now 257 bogus terms like experience and Policy)

    Found a bunch of terms with the same meaning, so:

  • SET term = 'analyst' WHERE term = 'analysis' etc...
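
The two cleanup steps can be sketched together in Python. The function name, the lowercase terms, and the sample rows are assumptions. The exclude set and the synonym map only echo the examples mentioned above and are not the real 257-term list.

```python
def clean_vectors(vectors, exclude, synonyms):
    """Apply the synonym map, then drop excluded terms.

    vectors:  iterable of (jobkey, term) rows
    exclude:  set of junk terms to remove
    synonyms: dict mapping a variant to its canonical term
    """
    cleaned = set()
    for jobkey, term in vectors:
        term = synonyms.get(term, term)  # map variants to a canonical term first
        if term not in exclude:
            cleaned.add((jobkey, term))  # the set removes duplicates after merging
    return cleaned

# Hypothetical rows for illustration.
raw = {("a1", "analysis"), ("a1", "experience"), ("a2", "analyst"), ("a2", "policy")}
cleaned = clean_vectors(
    raw,
    exclude={"experience", "policy"},
    synonyms={"analysis": "analyst"},
)
```

Mapping synonyms before excluding means a junk term only has to be listed once, under its canonical spelling.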
    So now the top terms for each cluster are beginning to show similarities, but there are a lot of medical / nursing jobs in the database that do not appear in a cluster.

    I will try increasing the number of clusters, and report back.

    Questions:

    Should my dictionary be much much smaller?
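
One common way to shrink a dictionary is to filter by document frequency: drop terms that appear in very few posts and terms that appear in most of them. A minimal Python sketch follows. The function name and both thresholds are hypothetical and would need tuning against the real data.

```python
from collections import Counter

def prune_dictionary(vectors, n_posts, min_posts=5, max_fraction=0.5):
    """Return the set of terms worth keeping.

    A term is kept if it appears in at least min_posts posts and in no
    more than max_fraction of all posts.
    """
    # set() guards against duplicate (jobkey, term) rows inflating the counts.
    doc_freq = Counter(term for _jobkey, term in set(vectors))
    return {
        term
        for term, n in doc_freq.items()
        if n >= min_posts and n / n_posts <= max_fraction
    }

# Hypothetical rows: "the" is everywhere, "zzz" appears once, "sql" is in between.
sample = [
    ("p1", "the"), ("p2", "the"), ("p3", "the"), ("p4", "the"),
    ("p1", "sql"), ("p2", "sql"),
    ("p3", "zzz"),
]
kept = prune_dictionary(sample, n_posts=4, min_posts=2, max_fraction=0.5)
```

Very rare terms cannot help group posts together, and very common terms cannot tell groups apart, so both ends are candidates for removal.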

    Should I pick examples from each category and search for similar posts? (How would that be done?)
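
One possible way to search for posts similar to a hand-picked example is to rank the other posts by how much their term sets overlap with it. The sketch below uses Jaccard similarity, which is one choice among several. The function names and sample posts are assumptions.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(seed_key, post_terms, top_n=5):
    """Rank every other post by similarity to the seed post.

    post_terms: dict jobkey -> set of terms
    Returns a list of (score, jobkey), best match first.
    """
    seed = post_terms[seed_key]
    scored = [
        (jaccard(seed, terms), key)
        for key, terms in post_terms.items()
        if key != seed_key
    ]
    # Highest score first; ties broken by jobkey so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]

# Hypothetical posts for illustration.
post_terms = {
    "a1": {"sql", "analyst", "report"},
    "a2": {"sql", "analyst", "dashboard"},
    "n1": {"nurse", "icu", "patient"},
    "n2": {"nurse", "patient", "clinic"},
}
matches = most_similar("n1", post_terms, top_n=1)
```

Picking one nursing post as the seed and reviewing its top matches would show whether the nursing jobs share enough terms to form their own group.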

    Here's hoping for a lively discussion!

    Rob
