Outlier detection is an interesting application of machine learning. The goal is to identify the data records that accurately profile abnormal behavior of a system. In real-life examples, however, special cases such as fraud and spam make up a very small percentage of the overall data population, which poses challenges for developing machine learning models.
In this experiment, we will examine Kaggle’s Credit Card Fraud Detection dataset and develop predictive models to detect fraudulent transactions, which account for only 0.172% of all transactions. To deal with the imbalanced dataset, we will first balance the classes of our training data with a resampling technique (SMOTE), and then build a Logistic Regression model, optimizing for the average precision score.
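To make the resampling idea concrete, here is a minimal sketch of the interpolation step at the heart of SMOTE, written with only the Python standard library. It is not the code used in this experiment: the function name `smote_sketch`, its parameters, and the toy data are all assumptions made up for illustration. In practice a library implementation (for example, the SMOTE class in imbalanced-learn) would be the usual choice.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up data: 100 "normal" points and 4 "fraud" points.
majority = [(float(i % 10), float(i // 10)) for i in range(100)]
minority = [(20.0, 20.0), (21.0, 20.5), (20.5, 21.5), (22.0, 21.0)]

# Oversample the minority class until both classes have the same size.
synthetic = smote_sketch(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + synthetic
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay inside the region the minority class already occupies. Resampling like this should be applied to the training data only, so that the evaluation data keeps the original class ratio.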
We will build and train our model on Google Colab, a free Jupyter notebook environment that runs on Google's cloud and offers free GPU access! For more information on Colab, check the official Colab page.