Simple Skill-based Job Recommendation Engine

What are the most in-demand skills for data scientists? Python, R, SQL, and the list goes on. Many surveys and reports offer good statistics on popular data skills. In this post, I gather first-hand information by scraping data science jobs from indeed.ca, analyze the top skills required by employers, and make job recommendations by matching skills from a resume to posted jobs. It will be fun!
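To make the matching idea concrete, here is a minimal sketch of skill-based ranking. It is not the code from the full post: the skill vocabulary, the function names `extract_skills` and `rank_jobs`, and the choice of Jaccard similarity as the score are all assumptions made for illustration.

```python
import re

# Hypothetical skill vocabulary; a real one would come from the scraped postings.
SKILLS = {"python", "r", "sql", "spark", "tableau", "machine learning"}


def extract_skills(text, vocabulary=SKILLS):
    """Return the vocabulary skills that appear as whole terms in the text."""
    lowered = text.lower()
    found = set()
    for skill in vocabulary:
        # Guard both sides so that "r" does not match inside "resume".
        pattern = r"(?<![a-z0-9+#])" + re.escape(skill) + r"(?![a-z0-9+#])"
        if re.search(pattern, lowered):
            found.add(skill)
    return found


def rank_jobs(resume_text, jobs):
    """Score each job by Jaccard similarity between resume and job skills."""
    resume_skills = extract_skills(resume_text)
    scored = []
    for title, description in jobs.items():
        job_skills = extract_skills(description)
        union = resume_skills | job_skills
        score = len(resume_skills & job_skills) / len(union) if union else 0.0
        scored.append((title, score))
    # Highest score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up inputs, for illustration only.
resume = "Experienced in Python, SQL and machine learning."
jobs = {
    "Data Scientist": "Python, R, SQL, machine learning",
    "BI Analyst": "SQL and Tableau dashboards",
}
ranking = rank_jobs(resume, jobs)
```

Jaccard similarity penalizes jobs that ask for many skills the resume lacks. A simple overlap count would be a reasonable alternative if only the shared skills matter.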

Quick summary of the project workflow:

Workflow

Continue reading “Simple Skill-based Job Recommendation Engine”

How many bikes to be shared in Vancouver NEXT WEEK – Part 2

This is Part 2 of building predictive models for Vancouver bike share. Part 1 is here. The Python code can be found on my GitHub.

Model Training

The training dataset contains hourly bike rentals for each day from 01/01/2017 to 07/24/2018.

Two tree-based ensemble models were trained: Random Forest (RF) and Gradient Boosted Trees (GBM). Both are well known for delivering strong performance and efficiency on noisy datasets. However, tuning their hyperparameters so that the models do not overfit can be challenging.
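One common way to check for overfitting on time-ordered data is to hold out the most recent period for validation. The sketch below builds the hourly index for the date range stated above and splits it chronologically. The cutoff date, the calendar features, and all function names are assumptions for illustration, not the setup used in the full post.

```python
from datetime import datetime, timedelta

# Date range taken from the post; everything else here is illustrative.
START = datetime(2017, 1, 1, 0)
END = datetime(2018, 7, 24, 23)


def hourly_index(start, end):
    """Return one timestamp per hour from start to end, inclusive."""
    hours = int((end - start).total_seconds() // 3600) + 1
    return [start + timedelta(hours=h) for h in range(hours)]


def calendar_features(ts):
    """Hypothetical feature set; the features in the real models may differ."""
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday = 0
        "month": ts.month,
        "is_weekend": ts.weekday() >= 5,
    }


def chronological_split(timestamps, cutoff):
    """Train on rows before the cutoff, validate on rows from the cutoff on."""
    train = [t for t in timestamps if t < cutoff]
    valid = [t for t in timestamps if t >= cutoff]
    return train, valid


index = hourly_index(START, END)
# Assumed cutoff: keep the final weeks as a validation window.
train, valid = chronological_split(index, datetime(2018, 6, 1))
```

A model whose error is much lower on `train` than on `valid` is likely overfitting. Hyperparameters such as tree depth or the number of trees can then be tuned against the validation window rather than the training rows.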

Continue reading “How many bikes to be shared in Vancouver NEXT WEEK – Part 2”

How many bikes to be shared in Vancouver NEXT WEEK – Part 1

Despite worldwide debates on the benefits and challenges of bike sharing, Vancouver launched its own bike share program, Mobi, sponsored by Shaw Go, in summer 2016. The first bike share appeared in Amsterdam in the 1960s and was later introduced to other big European cities. It has been popularized in China over the last decade: 13 of the world's 15 biggest bike share programs are in China.

I like bike share programs because they are convenient and help protect the environment. So I decided to look into Vancouver's historical bike share data, hoping to find some trends and patterns. Thanks to Mobi, which makes its bike usage data available, predictive models can be built to forecast future rides.

Quick summary of the project workflow:

Project workflow

Continue reading “How many bikes to be shared in Vancouver NEXT WEEK – Part 1”

How much time will it take to cross the border at Peace Arch?

Living in Vancouver, it is so convenient to drive across the border and have some fun on the other side. However, if you dread waiting in long border-crossing lines and getting stuck for almost an hour, you are not alone. We all know the basic strategies about the best and worst days and hours to cross: for example, avoid long weekends and Christmas week, arrive at the border early, and so on. A crystal ball that could tell us our wait time at the border ahead of time would be just fantastic! Well, I decided to give it a shot and make a crystal ball by building a machine learning model.

Below is a quick summary of the workflow for this mini project.

Project workflow

Continue reading “How much time will it take to cross the border at Peace Arch?”

Why so many wildfires in BC lately

Downtown Vancouver before and after the hazy smoke caused by massive wildfires

British Columbia wildfires are burning out of control! A number of air quality warnings were issued across BC in summer 2018. Smoky air drifted all the way to Alberta and even across the border into the US. It made me curious about what is going on with BC wildfires and what the situation used to be.

Wildfire datasets for previous years and the current year (updated in May 2018) were gathered from DataBC, which hosts data published by the BC Wildfire Service. For relevance and clarity, data prior to 1980 is ignored.

Continue reading “Why so many wildfires in BC lately”

Build CNN for facial expression recognition with TensorFlow Eager on Google Colab

Key learning elements:

» Run experiments in Google Colab and access files on Google Drive

» Build and evaluate a model using TensorFlow Eager mode

» Build a Convolutional Neural Network (CNN) to recognize 7 facial expressions

For this exercise we are going to build a CNN for facial expression recognition on the fer2013 dataset, available on Kaggle. fer2013 is publicly accessible and contains 35,887 grayscale 48 x 48 face images labeled with 7 emotional expressions: angry, disgust, fear, happy, sad, surprise, and neutral. It was originally published at the International Conference on Machine Learning (ICML) 2013: Challenges in Representation Learning: A report on three machine learning contests, Ian Goodfellow et al., 2013.
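Before any model is built, each record has to be turned into a 48 x 48 image and a label. The sketch below shows one way to do that in plain Python. It assumes each record provides an integer emotion index and a string of 2,304 space-separated pixel values, and that the label indices follow the order listed above. Both are assumptions to verify against the downloaded file, and `parse_row` is a hypothetical helper name.

```python
# Assumed label order, following the listing in the text.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
SIZE = 48  # images are 48 x 48 grayscale


def parse_row(emotion, pixels):
    """Convert one record into a SIZE x SIZE grid of floats in [0, 1] and a label.

    Assumes `pixels` is a string of SIZE * SIZE space-separated integers (0-255).
    """
    values = [int(v) for v in pixels.split()]
    if len(values) != SIZE * SIZE:
        raise ValueError(f"expected {SIZE * SIZE} pixels, got {len(values)}")
    # Slice the flat list into rows and scale each pixel to [0, 1].
    image = [
        [v / 255.0 for v in values[row * SIZE:(row + 1) * SIZE]]
        for row in range(SIZE)
    ]
    return image, EMOTIONS[int(emotion)]


# Synthetic record for illustration: an all-white image labeled with index 3.
fake_pixels = " ".join(["255"] * (SIZE * SIZE))
image, label = parse_row("3", fake_pixels)
```

Scaling pixels to [0, 1] is a common preprocessing choice for CNN inputs. The resulting nested list can be converted to a tensor of shape (48, 48, 1) before it is fed to a convolutional layer.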

Continue reading “Build CNN for facial expression recognition with TensorFlow Eager on Google Colab”