Trump And Trudeau Twitter Analysis During COVID-19 Crisis Part 1

During the COVID-19 pandemic, people take their worries, concerns, frustrations, and affections to social media to share their feelings and thoughts with the rest of the world. Twitter has become one of the official channels through which world leaders communicate with their supporters and followers. To understand what keeps them busy, we extract the tweets of two world leaders, Donald Trump (the President of the United States) and Justin Trudeau (the Prime Minister of Canada). By applying natural language processing techniques and the Latent Dirichlet Allocation (LDA) algorithm, we can learn the topics of their tweets and see what is on their minds during the crisis.

We use Python 3.6 and the following packages:

  • TwitterScraper, a Python script to scrape for tweets
  • NLTK (Natural Language Toolkit), an NLP package for text processing, e.g. stop words, punctuation, tokenization, lemmatization, etc.
  • Gensim, “generate similar”, a popular NLP package for topic modeling
  • Latent Dirichlet Allocation (LDA), a generative, probabilistic model for topic clustering/modeling
  • pyLDAvis, an interactive LDA visualization package, designed to help interpret topics in a topic model that is trained on a corpus of text data

Data Gathering

We use TwitterScraper to scrape tweets from the Twitter handles @realDonaldTrump and @JustinTrudeau. Only original tweets posted from March 1 to April 27, 2020 are collected; retweets of other accounts are excluded. Only English tweets are kept.
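The selection rule above (one handle, original tweets only, March 1 to April 27, 2020) can be sketched as a simple filter. This is a minimal illustration and not the scraper's own API: the record fields `screen_name`, `timestamp`, `is_retweet` and `text`, the helper `is_original_in_window`, and the sample records are all assumptions made for the example. The English-only filter is left out.

```python
from datetime import datetime

WINDOW_START = datetime(2020, 3, 1)
WINDOW_END = datetime(2020, 4, 28)  # exclusive bound, so all of April 27 is kept


def is_original_in_window(tweet, handle):
    """True for a non-retweet posted by `handle` inside the collection window."""
    return (
        tweet["screen_name"].lower() == handle.lower()
        and not tweet["is_retweet"]
        and WINDOW_START <= tweet["timestamp"] < WINDOW_END
    )


# Made-up records standing in for scraper output (hypothetical field names).
scraped = [
    {"screen_name": "realDonaldTrump", "timestamp": datetime(2020, 3, 2, 14, 5),
     "is_retweet": False, "text": "example text inside the window"},
    {"screen_name": "realDonaldTrump", "timestamp": datetime(2020, 2, 28, 9, 0),
     "is_retweet": False, "text": "example text before the window"},
    {"screen_name": "realDonaldTrump", "timestamp": datetime(2020, 4, 27, 23, 59),
     "is_retweet": True, "text": "example retweet"},
    {"screen_name": "JustinTrudeau", "timestamp": datetime(2020, 4, 27, 15, 0),
     "is_retweet": False, "text": "example text on the last day"},
]

trump = [t for t in scraped if is_original_in_window(t, "realDonaldTrump")]
trudeau = [t for t in scraped if is_original_in_window(t, "JustinTrudeau")]
print(len(trump), len(trudeau))
```

Using an exclusive upper bound of April 28 avoids dropping tweets posted late in the day on April 27.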

Number of Tweets by Week Day and Hour

Number of Tweets by Hour Trump
Number of Tweets by Hour Trudeau

It seems Trump likes to tweet from 1 to 4 pm, while Trudeau likes to tweet around 3 pm.

Number of Tweets by Week Day Trump
Number of Tweets by Week Day Trudeau

Both Trump and Trudeau tweet regularly during the week. It seems Trump likes to tweet even more on Sundays!
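Counts like the ones behind these charts can be produced by grouping tweet timestamps by hour and by week day. The sketch below uses only the standard library. The timestamps are made up for illustration, and it assumes they are already in the time zone we want to report in.

```python
from collections import Counter
from datetime import datetime

WEEK_DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def tweets_by_hour(timestamps):
    """Count tweets per hour of day (0 to 23)."""
    return Counter(ts.hour for ts in timestamps)


def tweets_by_week_day(timestamps):
    """Count tweets per week day name; datetime.weekday() returns 0 for Monday."""
    return Counter(WEEK_DAYS[ts.weekday()] for ts in timestamps)


# Made-up timestamps, for illustration only.
timestamps = [
    datetime(2020, 3, 1, 14, 5),
    datetime(2020, 3, 1, 15, 30),
    datetime(2020, 3, 2, 15, 10),
    datetime(2020, 4, 27, 9, 0),
]

by_hour = tweets_by_hour(timestamps)
by_week_day = tweets_by_week_day(timestamps)
print(sorted(by_hour.items()))
print(by_week_day.most_common())
```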

Tweet Length

From March 1 to April 27, 2020, Trump posted 673 tweets, averaging 27 words per tweet, and Trudeau posted 386 tweets, averaging 41 words per tweet. Trump had many short tweets (fewer than 10 words) and some lengthy tweets (over 40 words). Most of Trudeau’s tweets had 40 to 50 words.
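Tweet length statistics of this kind can be computed as below. This is a rough sketch that assumes a "word" is any whitespace-separated chunk, which may differ from how the numbers above were counted. The helper names and the bin width of 10 are choices made for the example.

```python
from collections import Counter
from statistics import mean


def word_count(text):
    """Number of whitespace-separated words in a tweet."""
    return len(text.split())


def length_histogram(lengths, bin_width=10):
    """Map each bin's lower edge (0, 10, 20, ...) to the number of tweets in it."""
    return Counter((n // bin_width) * bin_width for n in lengths)


texts = [
    "They are staging a coup against Bernie!",
    "Departing for the Great State of North Carolina!",
    "WOW! Thank you, just landed, see everyone soon!",
]

lengths = [word_count(t) for t in texts]
average_length = mean(lengths)
histogram = length_histogram(lengths)
print(lengths, round(average_length, 2), dict(histogram))
```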

Number of Tweets Histogram Trump
Number of Tweets Histogram Trudeau

Data Pre-processing

Text pre-processing transforms text from human language into a machine-readable format for further processing. We apply the following pre-processing steps to our Twitter texts.

  1. Convert all words to lowercase
  2. Remove non-alphabet characters
  3. Remove short words (fewer than 3 characters)
  4. Tokenization: breaking sentences into words
  5. Part-of-speech (POS) tagging: classifying words into their grammatical categories, e.g. verbs, nouns, adjectives, etc., in order to understand their roles in a sentence. POS tagging provides grammatical context for lemmatization.
  6. Lemmatization: converting a word to its base form, e.g. car, cars, car’s to car
  7. Remove common English words e.g. a, the, of, etc., and remove common words that add very little value to our analysis, e.g. com, twitter, pic, etc.
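The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the stop-word set is a tiny hand-picked sample, and the POS tagging and lemmatization steps are represented by a pluggable `lemmatize` function backed by a toy lookup table. In a real run that role would be filled by NLTK, for example `nltk.pos_tag` together with `WordNetLemmatizer`. Short words are dropped right after splitting, which gives the same result as dropping them before tokenization.

```python
import re

# Tiny illustrative stop-word set; a real run would use a full English list
# plus domain words such as com, twitter, pic.
STOP_WORDS = {"a", "the", "of", "for", "are", "they", "you", "just",
              "against", "com", "twitter", "pic"}

# Toy stand-in for POS tagging + lemmatization (steps 5 and 6).
TOY_LEMMAS = {"landed": "land", "departing": "depart", "cars": "car"}


def toy_lemmatize(word):
    return TOY_LEMMAS.get(word, word)


def preprocess(text, lemmatize=toy_lemmatize, stop_words=STOP_WORDS, min_len=3):
    text = text.lower()                                  # step 1: lowercase
    text = re.sub(r"[^a-z]+", " ", text)                 # step 2: non-letters become spaces
    tokens = text.split()                                # step 4: tokenize on whitespace
    tokens = [t for t in tokens if len(t) >= min_len]    # step 3: drop short words
    tokens = [lemmatize(t) for t in tokens]              # steps 5-6: base forms
    return [t for t in tokens if t not in stop_words]    # step 7: drop stop words


tweet = "WOW! Thank you, just landed, see everyone soon! #KAG2020pic.twitter.com/QGdfIsOp4u"
tokens = preprocess(tweet)
print(tokens)
```

Replacing non-letters with spaces, instead of deleting them, keeps fused strings such as "KAG2020pic" from collapsing into a single token.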

We extract both unigrams and bigrams (pairs of consecutive words) from the texts. After pre-processing, our tweets look like this:

text: WOW! Thank you, just landed, see everyone soon! #KAG2020pic.twitter.com/QGdfIsOp4u
token: [wow, thank, land, see, everyone, soon, kag, qgdfisop]
bigram_token: [wow thank, thank land, land see, see everyone, everyone soon, soon kag, kag qgdfisop]

text: Departing for the Great State of North Carolina!pic.twitter.com/BjnyTnnHUt
token: [depart, great, state, north, carolina, bjnytnnhut]
bigram_token: [depart great, great state, state north, north carolina, carolina bjnytnnhut]

text: They are staging a coup against Bernie!
token: [stag, coup, bernie]
bigram_token: [stag coup, coup bernie]
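Given a list of tokens, the bigrams are simply each token paired with the one that follows it. A minimal sketch (the function name `make_bigrams` is ours):

```python
def make_bigrams(tokens):
    """Join each token with its successor, e.g. [a, b, c] -> ["a b", "b c"]."""
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


bigram_tokens = make_bigrams(["stag", "coup", "bernie"])
print(bigram_tokens)
```

A tweet with fewer than two tokens produces an empty list.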

Word Count and Word Cloud

We use bigrams for our word count and word cloud, as bigrams provide more meaningful insights than single words.

Word Count Trump
Word Count Trudeau

The top 5 most common bigrams in Trump’s tweets are: fake news, white house, united state, news conference, mini mike.

The top 5 most common bigrams in Trudeau’s tweets are: make sure, across country, keep safe, canada emergency, health care.
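A ranking like this can be obtained by counting bigrams across all tweets. The sketch below uses `collections.Counter`. The input lists are made up for illustration and are not our actual data.

```python
from collections import Counter


def count_bigrams(bigram_lists):
    """Total occurrences of each bigram over all tweets."""
    counts = Counter()
    for bigram_list in bigram_lists:
        counts.update(bigram_list)
    return counts


# Made-up per-tweet bigram lists, for illustration only.
bigram_lists = [
    ["fake news", "news conference"],
    ["fake news"],
    ["white house", "fake news"],
]

counts = count_bigrams(bigram_lists)
top = counts.most_common(1)
print(top)
```

The same frequency mapping can also feed a word cloud. For instance, the `wordcloud` package's `WordCloud.generate_from_frequencies` accepts a dictionary of this shape, although we do not show that step here.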

Here is the word cloud of Trump’s tweets:

Word Cloud Trump

Here is the word cloud of Trudeau’s tweets:

Word Cloud Trudeau

In the next post, we will show how to generate meaningful topics from the tweets by applying the LDA algorithm.

Happy Machine Learning!