October 2019 – AI Journey

Feature Extraction

Tweets, like any other text, must be converted to vectors of numbers before a model can use them. Each vector records how often each word in a dictionary (the vocabulary of the corpus) occurs in the text. The technique we use is called Bag of Words, a simple method of extracting text features. Here are the steps.
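The steps themselves are not shown in this excerpt, so here is a minimal sketch of how Bag of Words generally works, using only the Python standard library. The toy documents and the helper names (`word2id`, `to_bow`, `to_dense`) are made up for illustration and are not the post's actual code. Gensim's `corpora.Dictionary` and its `doc2bow` method produce the same kind of (word id, count) pairs.

```python
from collections import Counter

# Toy, already-tokenized "tweets" (made-up data for illustration only).
docs = [
    ["hr", "tech", "conference", "vegas"],
    ["ai", "hr", "tech", "ai"],
]

# Step 1: build the dictionary, mapping each unique word to an integer id.
vocab = sorted({word for doc in docs for word in doc})
word2id = {word: idx for idx, word in enumerate(vocab)}

# Step 2: turn each document into a sparse bag of words,
# i.e. a list of (word_id, count) pairs for the words that occur in it.
def to_bow(doc):
    counts = Counter(word2id[word] for word in doc)
    return sorted(counts.items())

corpus = [to_bow(doc) for doc in docs]

# Step 3 (optional): expand a sparse bag of words into a dense count vector.
def to_dense(bow, size):
    vec = [0] * size
    for word_id, count in bow:
        vec[word_id] = count
    return vec

dense = [to_dense(bow, len(vocab)) for bow in corpus]

print(word2id)
print(corpus)
print(dense)
```

Word order is discarded on purpose: only the counts survive, which is all a topic model such as LDA needs as input.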

The HR Technology Conference and Expo, the world’s leading and largest conference for HR and IT professionals, just took place in Las Vegas from Oct 1 to 4, 2019. An incredible number of HR technology topics were covered at the conference. Unfortunately, not everyone could be there, including me. Is it possible to tell what the buzzwords and topics are without being there? The answer is YES! I dig into Twitter for some quick insights.

I scrape tweets tagged #HRTechConf and build a Latent Dirichlet Allocation (LDA) model to automatically detect and interpret topics in the tweets. Here is my pipeline:

Data gathering (Twitter scrape)
Data pre-processing
Generating a word cloud
Training the LDA model
Visualizing topics

I use Python 3.6 and the following packages and techniques:

TwitterScraper, a Python script for scraping tweets
NLTK (Natural Language Toolkit), an NLP package for text processing, e.g. stop words, punctuation, tokenization, lemmatization, etc.
Gensim, “generate similar”, a popular NLP package for topic modeling
Latent Dirichlet Allocation (LDA), a generative, probabilistic model for topic clustering/modeling (a technique rather than a package; Gensim provides an implementation)
pyLDAvis, an interactive LDA visualization package, designed to help interpret the topics in a topic model trained on a corpus of text data
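As a rough illustration of the pre-processing step in the pipeline, the sketch below cleans a single tweet using only the standard library. It is a stand-in, not the post's code: the stop-word list is a tiny hand-written assumption (NLTK ships a much fuller one), the `preprocess` helper and the sample tweet are hypothetical, and lemmatization is left out entirely.

```python
import re
import string

# Tiny hand-written stop-word list (an assumption for this sketch;
# NLTK provides a far more complete English list).
STOP_WORDS = {"a", "an", "the", "is", "at", "in", "of", "and", "to", "for", "rt"}

def preprocess(tweet):
    """Lowercase a tweet, strip URLs, mentions and punctuation, drop stop words."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = text.replace("#", " ")              # keep hashtag words, drop '#'
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]

# Made-up example tweet.
tokens = preprocess(
    "RT @someone: The future of AI in HR is here! #HRTechConf https://example.com/x"
)
print(tokens)
```

Token lists like this one are what the later stages (word cloud, dictionary building, LDA training) would consume.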

Auto Generated Insights of 2019 HR Tech Conference Twitter – Part 2 (Topic Modeling)

Auto Generated Insights of 2019 HR Tech Conference Twitter – Part 1 (Word Cloud)