HR Technology Conference and Expo, world's leading and largest conference for HR and IT professionals, just took place in Las Vegas, from Oct 1 - 4, 2019. An incredibly amount of HR technology topics were covered at the conference. Unfortunately not everyone could be there, including myself. Is it possible to tell what the buzzwords and topics are without being there? The answer is YES! I dig into Twitter for some quick insights.
I scrape tweets with #HRTechConf, and build Latent Dirichlet Allocation (LDA) model for auto detecting and interpreting topics in the tweets. Here is my pipeline:
- Data gathering - twitter scrape
- Data pre-processing
- Generating word cloud
- Train LDA model
- Visualizing topics
I use Python 3.6 and the following packages:
- TwitterScraper, a Python script to scrape for tweets
- NLTK (Natural Language Toolkit), a NLP package for text processing, e.g. stop words, punctuation, tokenization, lemmatization, etc.
- Gensim, "generate similar", a popular NLP package for topic modeling
- Latent Dirichlet Allocation (LDA), a generative, probabilistic model for topic clustering/modeling
- pyLDAvis, an interactive LDA visualization package, designed to help interpret topics in a topic model that is trained on a corpus of text data