Trump And Trudeau Twitter Analysis During COVID-19 Crisis Part 2

In our last post, we extract @realDonalTrump and @JustinTrudeau tweets, clean up the texts, and generate word clouds. In this article, we will build a Latent Dirichlet Allocation (LDA) model to study the topics of the hundreds of tweets posted by the two world leaders.

Topic Modeling

Topic modeling is an unsupervised machine learning technique which is widely used for discovering abstract topics of a collection of documents. It considers each document to be represented by several topics and each topic to be represented by a set of words that frequently appear together. For example, with a cluster of cloud, rain, wind, we can tell that the associated topic likely related to weather.

Trump And Trudeau Twitter Analysis During COVID-19 Crisis Part 1

During the COVID-19 pandemic, people take their worries, concerns, frustration, and loves to social media to share with the rest of the world about their feelings and thoughts. Twitter has become one of official channels where world leaders communicate with their supporters and followers. To understand what keep them busy, we extract tweets of two world leaders, Donald Trump (the President of United States) and Justin Trudeau (the Prime Minister of Canada). By applying natural language processing techniques and Latent Dirichlet Allocation (LDA) algorithm, topics of their tweets can be learned. So we can see what is on their mind during the crisis.

We use Python 3.6 and the following packages:

  • TwitterScraper, a Python script to scrape for tweets
  • NLTK (Natural Language Toolkit), a NLP package for text processing, e.g. stop words, punctuation, tokenization, lemmatization, etc.
  • Gensim, “generate similar”, a popular NLP package for topic modeling
  • Latent Dirichlet Allocation (LDA), a generative, probabilistic model for topic clustering/modeling
  • pyLDAvis, an interactive LDA visualization package, designed to help interpret topics in a topic model that is trained on a corpus of text data