
Simple Skill-based Job Recommendation Engine

What are the most in-demand skills for data scientists? Python, R, SQL, and the list goes on and on. Many surveys and reports offer good statistics on popular data skills. In this post, I am going to gather first-hand information by scraping data science jobs from indeed.ca, analyze the top skills required by employers, and make job recommendations by matching the skills in a resume to those in posted jobs. It will be fun!

Quick summary of the project workflow:

Workflow

Python code can be found on my GitHub.

This post is part of a series of people analytics experiments I am putting together:

Web Scraping

I like scraping data from the Internet because it is free! But you should not abuse it just because it is free. There are basic rules to follow so that we do not overwhelm our cyber friends and get ourselves blocked from their websites. It can all be summed up in two words: BE NICE. Read the website's terms and conditions first, and space out your requests so that the site does not get hit too hard. Here is a very good web scraping 101.
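The two habits above (check what the site allows, and space out requests) can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the code used in this project. It assumes robots.txt has already been downloaded. The user-agent string `my-job-scraper` and the 2-second default delay are hypothetical. The page download is passed in as a `fetch` function, so a wrapper around Selenium's `driver.get(url)` that returns `driver.page_source` could be plugged in. Checking robots.txt complements reading the terms and conditions; it does not replace that step.

```python
import time
import urllib.robotparser
from typing import Callable, Dict, Iterable, List, Optional


def build_robot_parser(robots_lines: Iterable[str]) -> urllib.robotparser.RobotFileParser:
    """Build a parser from robots.txt lines that were already downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(list(robots_lines))
    return parser


def polite_fetch_all(
    urls: List[str],
    fetch: Callable[[str], str],
    parser: urllib.robotparser.RobotFileParser,
    user_agent: str = "my-job-scraper",  # hypothetical user-agent name
    delay_seconds: float = 2.0,          # assumed minimum gap between requests
) -> Dict[str, str]:
    """Fetch each allowed URL, waiting at least delay_seconds between requests."""
    pages: Dict[str, str] = {}
    last_request: Optional[float] = None
    for url in urls:
        # Skip anything the site's robots.txt disallows for our user agent.
        if not parser.can_fetch(user_agent, url):
            continue
        # Space out requests so the site does not get hit too hard.
        if last_request is not None:
            wait = delay_seconds - (time.monotonic() - last_request)
            if wait > 0:
                time.sleep(wait)
        pages[url] = fetch(url)
        last_request = time.monotonic()
    return pages


if __name__ == "__main__":
    parser = build_robot_parser(["User-agent: *", "Disallow: /private/"])
    urls = ["https://example.com/jobs/1", "https://example.com/private/admin"]
    # Stand-in fetch function; a real scraper would download the page here.
    pages = polite_fetch_all(urls, fetch=lambda u: f"<html>{u}</html>", parser=parser)
    print(sorted(pages))  # ['https://example.com/jobs/1']
```

Passing `fetch` in as a parameter keeps the politeness rules separate from the browser automation. The same loop therefore works whether pages come from Selenium or a plain HTTP client.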

I scraped data science jobs from indeed.ca, the largest global recruiting website. I gathered data scientist/engineer/analyst jobs posted in the last 30 days (09/19/2018 – 10/19/2018) in six major Canadian cities: Toronto, Montreal, Vancouver, Ottawa, Calgary, and Edmonton. I used Selenium WebDriver to automate the scraping and saved the results in a local JSON file.

In total, 367 job postings were retrieved, which took about 20 minutes.
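Saving the scraped postings to a local JSON file takes only the standard `json` module. This is a sketch under assumptions. The record fields (`title`, `company`, `location`, `description`) are hypothetical and not taken from the original project. Writing with `ensure_ascii=False` keeps French text such as "Montréal" readable in the file.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical layout for one scraped posting; these field names are
# illustrative and not taken from the original project.
JobRecord = Dict[str, str]


def save_jobs(jobs: List[JobRecord], path: Path) -> None:
    """Write all scraped postings to a local JSON file."""
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented text such as "Montréal" readable.
        json.dump(jobs, f, ensure_ascii=False, indent=2)


def load_jobs(path: Path) -> List[JobRecord]:
    """Read the postings back for the analysis steps."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    jobs = [{
        "title": "Data Scientist",
        "company": "Example Co.",  # made-up sample values
        "location": "Montréal, QC",
        "description": "Python and SQL required.",
    }]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "jobs.json"
        save_jobs(jobs, out)
        print(load_jobs(out) == jobs)  # True
```

Storing the raw descriptions means keyword extraction can be re-run later without scraping the site again.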

Job Description Keyword Extraction

For each job, I tokenized its job description, cleaned up the token list by removing the words in the NLTK list of stopwords, and finally kept only the tokens that appear in a predefined list of popular data science skills.
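The three steps above (tokenize, drop stopwords, keep known skills) can be sketched as follows. This is a hedged stand-in, not the original code. A regular-expression tokenizer replaces NLTK's tokenizer. A small hard-coded `STOPWORDS` set stands in for NLTK's list, which would be `set(nltk.corpus.stopwords.words("english"))` after downloading the corpus. The `SKILLS` vocabulary is hypothetical.

```python
import re
from typing import Iterable, List, Set

# Small stand-in for NLTK's English stopword list.
STOPWORDS = {"a", "an", "and", "are", "as", "in", "is", "of", "or", "the", "to", "we", "with"}

# Hypothetical skill vocabulary; the project used its own predefined list.
SKILLS = {"python", "r", "sql", "excel", "java", "scala", "spark", "hadoop",
          "tableau", "power", "scikit", "tensorflow", "keras", "aws", "azure", "agile"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens.

    Hyphens and spaces separate tokens, so "scikit-learn" yields "scikit"
    and "Power BI" yields "power", like the single-word skills in this post.
    """
    return re.findall(r"[a-z0-9+#]+", text.lower())


def extract_skills(description: str, skills: Iterable[str] = SKILLS,
                   stopwords: Set[str] = STOPWORDS) -> Set[str]:
    """Tokenize, drop stopwords, and keep only tokens in the skill vocabulary."""
    vocab = set(skills)
    tokens = [t for t in tokenize(description) if t not in stopwords]
    return {t for t in tokens if t in vocab}


if __name__ == "__main__":
    text = "We need Python, SQL and scikit-learn; Power BI is a plus."
    print(sorted(extract_skills(text)))  # ['power', 'python', 'scikit', 'sql']
```

Single-letter skills such as "r" can produce false positives with this tokenizer, because "R&D" splits into "r" and "d".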

The bar chart below shows the 30 data skills most frequently required by employers. More than 60% of jobs require SQL, 48% require Python, and 32% require R. No surprise. It is worth noting that Excel is quite a common requirement, even though I would not normally consider it a serious data skill. Agile is also among the top required skills, and AWS knowledge is in higher demand than Azure or GCP.

Data skills required
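Percentages such as "more than 60% of jobs require SQL" can be computed by counting how many postings mention each skill and dividing by the number of postings. A minimal sketch, assuming each job's skills have already been extracted into a set (for example with a keyword filter like the one described earlier):

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def skill_shares(job_skills: Iterable[Set[str]]) -> List[Tuple[str, float]]:
    """Return (skill, fraction of jobs mentioning it), most common first.

    Each job counts a skill at most once, so the values are shares of
    postings, not raw word counts.
    """
    counts: Counter = Counter()
    n_jobs = 0
    for skills in job_skills:
        counts.update(set(skills))  # set() guards against duplicate mentions
        n_jobs += 1
    if n_jobs == 0:
        return []
    return [(skill, count / n_jobs) for skill, count in counts.most_common()]


if __name__ == "__main__":
    jobs = [{"sql", "python"}, {"sql", "r"}, {"sql", "excel"}, {"python", "aws"}]
    for skill, share in skill_shares(jobs)[:2]:
        print(f"{skill}: {share:.0%}")  # sql: 75%, then python: 50%
```

Slicing the result with `[:30]` gives the top-30 list that a bar chart like the one above would plot.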

Almost 30% of jobs require a bachelor's degree or higher, while 15% ask for a master's and 9% require a PhD. It seems the education requirement for becoming a data scientist is still quite high compared with other technical jobs.

Education required
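The post does not say how degree requirements were detected. One simple possibility, sketched here purely as an assumption, is to search each description for degree keywords and report the share of postings that mention each level. The regular expressions in `DEGREE_PATTERNS` are hypothetical. A posting that mentions both a bachelor's and a master's counts toward both levels.

```python
import re
from typing import Dict, Iterable

# Hypothetical patterns for three degree levels. Real postings phrase degrees
# in many more ways, and "master" also matches "Scrum Master", so these would
# need tuning and spot checks.
DEGREE_PATTERNS = {
    "bachelor": re.compile(r"\bbachelor'?s?\b|\bb\.?sc\b", re.IGNORECASE),
    "master": re.compile(r"\bmaster'?s?\b|\bm\.?sc\b", re.IGNORECASE),
    "phd": re.compile(r"\bph\.?\s?d\b|\bdoctorate\b", re.IGNORECASE),
}


def degree_shares(descriptions: Iterable[str]) -> Dict[str, float]:
    """Fraction of postings that mention each degree level at least once."""
    docs = list(descriptions)
    if not docs:
        return {level: 0.0 for level in DEGREE_PATTERNS}
    return {
        level: sum(1 for d in docs if pattern.search(d)) / len(docs)
        for level, pattern in DEGREE_PATTERNS.items()
    }


if __name__ == "__main__":
    sample = ["Bachelor's degree in CS required", "MSc or PhD preferred"]
    print(degree_shares(sample))  # {'bachelor': 0.5, 'master': 0.5, 'phd': 0.5}
```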

Of the six big Canadian cities, Toronto accounts for more than half of all posted data science jobs, simply because of its large population and tech activity. Vancouver is third with 15%. Not bad.

Job distribution by city

A word cloud of data skills

Data skill word cloud

Resume Keyword Extraction

I extracted keywords from a sample resume of mine using the PyPDF2 Python library. Here are the data skills extracted from my resume:

['python', 'java', 'excel', 'scikit', 'power', 'tensorflow', 'keras',
'sql', 'aws', 'azure']
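Getting text out of a PDF resume with PyPDF2 can look roughly like this. The sketch assumes the PyPDF2 2.x/3.x API (`PdfReader`, `reader.pages`, `page.extract_text()`); older releases used `PdfFileReader` and `extractText()` instead. The function names are hypothetical. The extracted text would then go through the same kind of skill-keyword filter used for job descriptions.

```python
from typing import Any, Iterable


def pages_to_text(pages: Iterable[Any]) -> str:
    """Join the text of every page; pages that yield no text contribute ''."""
    return "\n".join(page.extract_text() or "" for page in pages)


def read_pdf_text(path: str) -> str:
    """Extract plain text from a PDF resume with PyPDF2 (2.x/3.x API assumed)."""
    # Imported here so the helper above works even without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return pages_to_text(reader.pages)
```

For example, `read_pdf_text("resume.pdf")` (hypothetical file name) returns one string. PDF layouts vary, so words can come out merged or split, and the extracted text is worth a quick manual check.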

Job Match Recommendation

To recommend a job, we calculate the similarity between the skill keywords of the sample resume and those of each job description. I used the Jaccard similarity (i.e. intersection over union of two sets) for this task. In short, the more keywords match and the fewer are unmatched, the higher the score, which ranges from 0 to 1.
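For two keyword sets A and B, the Jaccard similarity is J(A, B) = |A ∩ B| / |A ∪ B|. As a worked check, take the resume skills listed above and the top match's key skills listed further down. The two lists share 6 keywords (python, java, power, tensorflow, keras, aws) out of 14 distinct ones, so J = 6/14 ≈ 0.4286, which is the score reported for the top match. A minimal sketch; returning 0 when both sets are empty is a convention chosen here:

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two keyword collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here: two empty sets score 0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    resume = ["python", "java", "excel", "scikit", "power", "tensorflow",
              "keras", "sql", "aws", "azure"]
    top_match = ["agile", "mxnet", "google", "scala", "keras", "tensorflow",
                 "python", "java", "power", "aws"]
    print(round(jaccard(resume, top_match), 4))  # 0.4286 (6 shared / 14 total)
```

Because the union sits in the denominator, a posting that lists many skills absent from the resume is penalized, not just rewarded for the overlap.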

Running the calculation and matching over jobs in all six cities gives my top 5 job matches:

similarity  company                     location       title
0.4286      MasterCard                  Vancouver, BC  Data Scientist – NuData Security
0.3846      Nutrien                     Calgary, AB    Co-op Student, IT Data Services (Data Scientist)
0.3636      Mobify                      Vancouver, BC  Senior Data Engineer
0.3636      Tundra Technical Solutions  Vancouver, BC  BI Analyst
0.3333      G2 PLACEMENTS TI            Montréal, QC   SCIENTIFIQUE DE DONNÉES PRINCIPAL (Principal Data Scientist)
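A ranked list like the one above only requires scoring every posting against the resume and sorting by score. This sketch assumes each scraped job is a dict with hypothetical keys `skills`, `company`, `location`, and `title`. It repeats a small Jaccard helper so that it runs on its own.

```python
from typing import Any, Dict, List, Sequence, Set, Tuple

# Hypothetical job record: extracted skills plus the fields shown in the table.
Job = Dict[str, Any]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection over union of two skill sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_matches(resume_skills: Set[str], jobs: Sequence[Job], n: int = 5) -> List[Tuple[float, Job]]:
    """Score every job against the resume and return the n best, highest first."""
    scored = [(jaccard(resume_skills, set(job["skills"])), job) for job in jobs]
    # Sort on the score only; list.sort is stable even with reverse=True,
    # so jobs with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    resume = {"python", "sql", "aws"}
    jobs = [  # made-up sample postings
        {"company": "A", "location": "Toronto, ON", "title": "Data Analyst", "skills": ["sql", "excel"]},
        {"company": "B", "location": "Vancouver, BC", "title": "Data Scientist", "skills": ["python", "sql", "aws"]},
    ]
    for score, job in top_matches(resume, jobs, n=2):
        print(f"{score:.4f}  {job['company']}  {job['location']}  {job['title']}")
```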

Here are the key skills required by the top match:

['agile', 'mxnet', 'google', 'scala', 'keras', 'tensorflow', 'python', 'java', 'power', 'aws']

To search in only one city, pass the city name as a parameter when running the Python code.
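A command-line city filter could be wired up with `argparse`, as in the sketch below. The `--city` option name, the script name `match_jobs.py`, and the `location` field are all hypothetical. Accents are stripped before comparing, so typing "Montreal" also matches postings located in "Montréal".

```python
import argparse
import unicodedata
from typing import Dict, List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Rank scraped jobs against a resume.")
    # Hypothetical option; omit it to search all cities.
    parser.add_argument("--city", default=None, help="only consider jobs in this city, e.g. Toronto")
    return parser.parse_args(argv)


def _normalize(text: str) -> str:
    """Lower-case and strip accents so "Montreal" also matches "Montréal"."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def filter_by_city(jobs: List[Dict[str, str]], city: Optional[str]) -> List[Dict[str, str]]:
    """Keep jobs whose location starts with the given city; no city keeps all jobs."""
    if not city:
        return jobs
    wanted = _normalize(city.strip())
    return [job for job in jobs if _normalize(job["location"]).startswith(wanted)]
```

A run such as `python match_jobs.py --city Vancouver` would call `filter_by_city(jobs, parse_args().city)` before ranking.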

Future Work

  • Extend the work and build a web service interface.
  • Build the data skill vocabulary by mining job descriptions instead of relying on a predefined list of words.
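For the second item, one possible starting point, offered as an assumption and not as a plan from the original project, is to rank words by how many postings contain them and review the frequent ones by hand. Frequent terms also include generic words such as "experience", so the output is a candidate list, not a finished vocabulary. The function name and the `min_share` threshold are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Set


def candidate_skill_terms(descriptions: Iterable[str], stopwords: Set[str],
                          min_share: float = 0.2) -> List[str]:
    """Return tokens found in at least min_share of descriptions, most frequent first."""
    doc_freq: Counter = Counter()
    n_docs = 0
    for text in descriptions:
        # Count each token once per posting (document frequency).
        tokens = set(re.findall(r"[a-z0-9+#]+", text.lower())) - stopwords
        doc_freq.update(tokens)
        n_docs += 1
    if n_docs == 0:
        return []
    return [term for term, count in doc_freq.most_common() if count / n_docs >= min_share]


if __name__ == "__main__":
    docs = ["Python and SQL", "SQL and Tableau", "Python, SQL, AWS", "Excel"]
    print(candidate_skill_terms(docs, stopwords={"and"}, min_share=0.5))  # ['sql', 'python']
```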

That’s it! It is certainly not a sophisticated job recommendation engine, and there is a lot of room for improvement. But I hope it shows how web scraping can be done and how similarity can be computed by keyword matching.

Again, Python code can be found on my GitHub.

Happy Machine Learning!