Auto Generated Insights of 2019 HR Tech Conference Twitter – Part 2 (Topic Modeling)

Peng Wang, October 21, 2019

In our last post, we extracted #HRTechConf tweets, cleaned up the text, and generated a word cloud that highlights some of the buzzwords from the conference. But what are the tweets actually talking about? Without reading all 7,000 tweets, how can we find the popular topics? Let's see whether tweet topics can be detected automatically with a Latent Dirichlet Allocation (LDA) model.

Feature Extraction

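Before a model can use them, tweets (like any text) must be converted into vectors of numbers. We first build a dictionary that maps every word in the corpus to an integer id, then describe each tweet by how many times each dictionary word occurs in it. This technique is called Bag of Words (BoW), a simple way of extracting text features that ignores word order. Here are the steps.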
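As a minimal, standard-library sketch of the Bag-of-Words idea (not the post's own code), the snippet below builds a token-to-id dictionary and turns each tokenized tweet into `(token_id, count)` pairs. This is the same output shape that Gensim's `corpora.Dictionary` and its `doc2bow` method produce, although the ids here are assigned in first-seen order and may not match Gensim's numbering. The helper names `build_dictionary` and `doc_to_bow` and the toy tweets are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_dictionary(docs: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token an integer id, in order of first appearance."""
    token2id: Dict[str, int] = {}
    for doc in docs:
        for token in doc:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


def doc_to_bow(doc: List[str], token2id: Dict[str, int]) -> List[Tuple[int, int]]:
    """Convert a tokenized document into sorted (token_id, count) pairs.

    Tokens missing from the dictionary are ignored, like Gensim's doc2bow
    with its default allow_update=False.
    """
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())


# Hypothetical, already-cleaned tweets.
tweets = [
    ["hr", "tech", "ai", "ai"],
    ["hr", "recruiting", "ai"],
]
token2id = build_dictionary(tweets)
corpus = [doc_to_bow(doc, token2id) for doc in tweets]
print(token2id)  # {'hr': 0, 'tech': 1, 'ai': 2, 'recruiting': 3}
print(corpus)    # [[(0, 1), (1, 1), (2, 2)], [(0, 1), (2, 1), (3, 1)]]
```

Each tweet becomes a sparse vector: only words that actually occur are stored, which keeps the corpus small even with a large vocabulary.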

Auto Generated Insights of 2019 HR Tech Conference Twitter – Part 1 (Word Cloud)

Peng Wang, October 14, 2019 (updated October 21, 2019)

The HR Technology Conference and Expo, the world's leading and largest conference for HR and IT professionals, just took place in Las Vegas from October 1 to 4, 2019. An incredible number of HR technology topics were covered at the conference. Unfortunately, not everyone could be there, myself included. Is it possible to tell what the buzzwords and topics were without being there? The answer is YES! I dig into Twitter for some quick insights.

I scrape tweets tagged #HRTechConf and build a Latent Dirichlet Allocation (LDA) model to automatically detect and interpret topics in the tweets. Here is my pipeline:

Data gathering: Twitter scrape
Data pre-processing
Word cloud generation
LDA model training
Topic visualization
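The pre-processing and word-cloud steps can be sketched roughly as follows. This is a minimal standard-library approximation, not the post's code: it assumes a tiny hypothetical stop-word list in place of NLTK's, uses simple regular expressions for links and @mentions, and skips lemmatization (which the post does with NLTK, e.g. its `WordNetLemmatizer`). The resulting word frequencies are what a word-cloud library would render; for example, the `wordcloud` package offers `WordCloud().generate_from_frequencies(...)`, though the post does not say which library it used.

```python
import re
import string
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stop-word list; NLTK's English list is far larger.
STOP_WORDS = {"rt", "the", "a", "an", "and", "of", "to", "in", "is", "at", "for", "on"}
# Translation table that maps every ASCII punctuation character (including '#') to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(tweet: str) -> List[str]:
    """Lowercase a tweet, strip links, mentions and punctuation, and drop stop words."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = text.translate(PUNCT_TO_SPACE)      # '#hrtechconf' becomes ' hrtechconf'
    # Keep tokens that are not stop words and are longer than one character.
    return [tok for tok in text.split() if tok not in STOP_WORDS and len(tok) > 1]


# Hypothetical example tweets.
tweets = [
    "RT @someone: AI is changing HR at #HRTechConf https://t.co/xyz",
    "Great session on AI and recruiting at #HRTechConf!",
]
tokens = [preprocess(t) for t in tweets]
# Word frequencies across all tweets: the raw material for a word cloud.
freq = Counter(tok for doc in tokens for tok in doc)
print(tokens)  # [['ai', 'changing', 'hr', 'hrtechconf'], ['great', 'session', 'ai', 'recruiting', 'hrtechconf']]
print(freq.most_common(3))
```

Note that the conference hashtag itself survives as a token; in practice you may want to add it to the stop words, since it appears in every tweet and carries no topical signal.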

I use Python 3.6 and the following packages:

TwitterScraper, a Python script for scraping tweets
NLTK (Natural Language Toolkit), an NLP package for text processing, e.g. stop words, punctuation, tokenization, lemmatization, etc.
Gensim, "generate similar", a popular NLP package for topic modeling
Latent Dirichlet Allocation (LDA), a generative, probabilistic model for topic clustering/modeling
pyLDAvis, an interactive LDA visualization package, designed to help interpret topics in a topic model that is trained on a corpus of text data
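To make concrete what an LDA model estimates, here is a small collapsed Gibbs sampler in plain Python. It is an illustrative substitute, not what the post runs: Gensim's `LdaModel` fits LDA with online variational Bayes instead, but both target the same quantities, namely a topic mixture per document (`theta`) and a word distribution per topic (`phi`). The function name `lda_gibbs`, the prior values `alpha=0.1` and `beta=0.01`, and the toy vocabulary are assumptions chosen for the sketch. Documents are lists of word ids; a Bag-of-Words vector such as `[(0, 2), (3, 1)]` can be expanded with `[w for w, c in bow for _ in range(c)]`.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[int]], vocab_size: int, num_topics: int,
              alpha: float = 0.1, beta: float = 0.01, iterations: int = 200,
              seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Fit LDA by collapsed Gibbs sampling; return (theta, phi)."""
    rng = random.Random(seed)
    ndk = [[0] * num_topics for _ in docs]               # topic counts per document
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    nk = [0] * num_topics                                # total tokens per topic
    z = []                                               # topic assignment of each token
    for d, doc in enumerate(docs):                       # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the current token out of all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in range(num_topics)]
    return theta, phi


# Hypothetical toy corpus: word ids 0-2 and 3-5 never appear in the same document.
vocab = ["recruiting", "talent", "hiring", "ai", "automation", "analytics"]
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [4, 5, 3, 5]]
theta, phi = lda_gibbs(docs, vocab_size=len(vocab), num_topics=2)
for k, dist in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda w: dist[w], reverse=True)[:3]
    print(f"topic {k}:", [vocab[w] for w in top])
```

Listing each topic's highest-probability words, as in the last loop, is essentially what pyLDAvis presents interactively: topics are interpreted by their top terms, and `theta` tells you which topics each tweet mixes.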

AI Journey
