What are the most in-demand skills for data scientists? Python, R, SQL, and the list goes on and on. Many surveys and reports publish good statistics on popular data skills. In this post, I gather first-hand information by scraping data science jobs from indeed.ca, analyze the top skills required by employers, and make job recommendations by matching skills from a resume to posted jobs. It will be fun!
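Quick summary of the project workflow: scrape job postings, extract skill keywords from each job description, extract skill keywords from a resume, and recommend jobs by keyword similarity.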
Python code can be found on my GitHub.
This post is part of a series of people analytics experiments I am putting together:
- Job skill match (Recruitment)
- Employee attrition prediction (Employee Management)
- Pay gap by gender, ethnicity, profession (Employee Compensation) FUTURE WORK
- Organizational network analysis (ONA) FUTURE WORK
Web Scraping
I like scraping data from the Internet because it is free! But you should not abuse it just because it is free. There are basic rules to follow so that we do not antagonize our cyber friends and get ourselves blocked from visiting their websites. It can all be summed up in two words: BE NICE. Read the website's terms and conditions first, and space out your requests so that the site does not get hit too hard. Here is a very good web scraping 101.
I scraped data science jobs from indeed.ca, the largest global recruiting website. I gathered data scientist/engineer/analyst jobs posted in the last 30 days (09/19/2018 – 10/19/2018) in 6 major Canadian cities: Toronto, Montreal, Vancouver, Ottawa, Calgary, and Edmonton. I used Selenium WebDriver to automate the scraping and saved the results in a local JSON file.
In total 367 job postings were retrieved, and it took about 20 minutes.
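To make the "space out your requests" rule concrete, here is a minimal sketch of a throttled scraping loop. It is not the code from my GitHub repo: `fetch_politely`, `save_jobs`, and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever loads and parses one page (for example, a function wrapping a Selenium WebDriver page load).

```python
import json
import time
from typing import Callable, Iterable


def fetch_politely(urls: Iterable[str],
                   fetch: Callable[[str], dict],
                   delay_seconds: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep) -> list:
    """Call fetch(url) for each URL, pausing between consecutive requests.

    `fetch` is a hypothetical callable that loads one page and returns
    the parsed job posting as a dict.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # be nice: do not hit the site back-to-back
        results.append(fetch(url))
    return results


def save_jobs(jobs: list, path: str) -> None:
    """Write the scraped postings to a local JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(jobs, f, ensure_ascii=False, indent=2)
```

Passing `sleep` as a parameter is just a convenience for trying the loop without actually waiting; the 2-second default is an arbitrary illustrative value, not a recommendation from the site.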
Job Description Keyword Extraction
For each job, I tokenized its job description, cleaned up the token list by removing the words in the NLTK list of stopwords, and finally kept only the tokens that appear in a list of popular data science skill words.
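The three steps (tokenize, drop stopwords, filter on skill words) can be sketched roughly as follows. To keep the example self-contained, it uses a regular expression as the tokenizer and a tiny hand-written stopword set as a stand-in for the NLTK list; `STOPWORDS`, `SKILLS`, and `extract_skills` are illustrative names, and the skill list is only a small hypothetical subset.

```python
import re

# Tiny stand-in for the NLTK English stopword list (assumption: illustrative only).
STOPWORDS = {"a", "an", "and", "the", "in", "of", "with", "to",
             "is", "are", "or", "for", "on"}

# Hypothetical subset of a pre-defined data skill vocabulary.
SKILLS = {"python", "r", "sql", "excel", "aws", "azure", "agile",
          "tensorflow", "keras", "java"}


def extract_skills(description: str, skills=SKILLS, stopwords=STOPWORDS) -> set:
    """Return the set of skill keywords mentioned in a job description."""
    # 1. Tokenize: lowercase, then keep runs of letters, digits, '+' and '#'.
    tokens = re.findall(r"[a-z0-9+#]+", description.lower())
    # 2. Remove stopwords.
    tokens = [t for t in tokens if t not in stopwords]
    # 3. Keep only tokens that are in the skill vocabulary.
    return {t for t in tokens if t in skills}


found = extract_skills("Experience with Python, SQL and AWS in an Agile team is required.")
print(sorted(found))  # ['agile', 'aws', 'python', 'sql']
```

Returning a set rather than a list means each skill counts once per posting, which is what the later frequency and similarity calculations need.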
The bar chart below shows the top 30 data skills required by employers. More than 60% of jobs require SQL, 48% require Python, and 32% require R. No surprise. It is worth noting that Excel is actually quite a common requirement, although I would not normally consider it a serious data skill. Agile is also among the top required skills. AWS knowledge is in higher demand than Azure or GCP.
Almost 30% of jobs require a bachelor's degree or above, while 15% ask for a Master's and 9% require a PhD. The education requirement for becoming a data scientist still seems quite high compared with other technical jobs.
Of the 6 big Canadian cities, Toronto accounts for more than half of all posted data science jobs, simply because of its large population and tech activity. Vancouver is third and makes up 15%. Not bad.
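Percentages like these are simply the share of postings whose keyword set contains a given term. A minimal sketch, assuming each job has already been reduced to a set of keywords (`skill_shares` is a hypothetical helper name, and the four toy postings are made up for illustration):

```python
from collections import Counter


def skill_shares(job_skill_sets):
    """Return (skill, fraction of jobs mentioning it) pairs, most common first."""
    total = len(job_skill_sets)
    if total == 0:
        return []
    # set(skills) guards against counting a skill twice within one posting.
    counts = Counter(skill for skills in job_skill_sets for skill in set(skills))
    return [(skill, n / total) for skill, n in counts.most_common()]


# Toy data, not the scraped results.
toy_jobs = [{"sql", "python"}, {"sql"}, {"sql", "r"}, {"python"}]
for skill, share in skill_shares(toy_jobs):
    print(f"{skill}: {share:.0%}")
```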
A word cloud of data skills
Resume Keyword Extraction
Keywords were extracted from a sample resume of mine using the PyPDF2 Python library. Here are the data skills extracted from my resume:
['python', 'java', 'excel', 'scikit', 'power', 'tensorflow', 'keras',
'sql', 'aws', 'azure']
Job Match Recommendation
To recommend a job, we calculate the similarity of skill keywords between the sample resume and each job description. I used the Jaccard similarity (i.e. the size of the intersection divided by the size of the union of the two keyword sets) for this task. Basically, the more matched keywords and the fewer unmatched keywords, the higher the score (between 0 and 1).
Running the calculation and matching across jobs in all 6 cities gives my top 5 job matches:
similarity | company | location | title |
0.4286 | MasterCard | Vancouver,BC | Data Scientist – NuData Security |
0.3846 | Nutrien | Calgary,AB | Co-op Student, IT Data Services (Data Scientist) |
0.3636 | Mobify | Vancouver,BC | Senior Data Engineer |
0.3636 | Tundra Technical Solutions | Vancouver,BC | BI Analyst |
0.3333 | G2 PLACEMENTS TI | Montréal,QC | SCIENTIFIQUE DE DONNÉES PRINCIPAL |
Here are the key skills required by the top match:
['agile', 'mxnet', 'google', 'scala', 'keras', 'tensorflow', 'python', 'java', 'power', 'aws']
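The score of the top match can be reproduced from the two keyword lists shown in this post: 6 keywords are shared and the union has 14, so the similarity is 6/14. Here is a minimal sketch of the calculation and the ranking step; `jaccard` and `rank_jobs` are illustrative names, and the `"skills"` key on each job dict is an assumed data layout, not necessarily the one used in my repo.

```python
def jaccard(a, b) -> float:
    """Jaccard similarity: |A intersect B| / |A union B|, defined as 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_jobs(resume_skills, jobs, top_n=5):
    """Return the top_n (score, job) pairs, highest similarity first.

    Each job is assumed to be a dict with a "skills" list (hypothetical layout).
    """
    scored = [(jaccard(resume_skills, job["skills"]), job) for job in jobs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


resume = ["python", "java", "excel", "scikit", "power", "tensorflow",
          "keras", "sql", "aws", "azure"]
top_match = ["agile", "mxnet", "google", "scala", "keras", "tensorflow",
             "python", "java", "power", "aws"]

print(round(jaccard(resume, top_match), 4))  # 0.4286
```

Note that unmatched keywords on either side grow the union, so a job asking for many skills I lack scores lower even if the overlap is the same.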
To search for jobs in only one city, pass the city name as a parameter when running the Python code.
Future Work
- Extend the work and build a web service interface
- Develop a data skill vocabulary by exploring job descriptions instead of relying on a pre-defined list of words.
That’s it! It is certainly not a sophisticated job recommendation engine and has a lot of room for improvement. But I hope it shows how web scraping can be done and how similarity can be computed by keyword matching.
Again, Python code can be found on my GitHub.
Happy Machine Learning!