In our last post, we extracted @realDonaldTrump and @JustinTrudeau tweets, cleaned up the texts, and generated word clouds. In this article, we will build a Latent Dirichlet Allocation (LDA) model to study the topics of the hundreds of tweets posted by the two world leaders.
Topic Modeling
Topic modeling is an unsupervised machine learning technique that is widely used to discover the abstract topics in a collection of documents. It treats each document as a mixture of several topics, and each topic as a set of words that frequently appear together. For example, given a cluster of the words cloud, rain, and wind, we can tell that the associated topic is likely related to weather.
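In LDA's own terms, this mixture view is the standard decomposition below (general to LDA, not specific to our tweets): the probability of seeing word w in document d is a weighted sum over the K topics.

$$
p(w \mid d) = \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d)
$$

Here p(z = k | d) is the document's proportion of topic k (often written θ_d), and p(w | z = k) is topic k's word distribution (often written φ_k). The weather topic above would place most of its word probability on cloud, rain, and wind.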
For topic modeling, we use the LDA algorithm on TF-IDF vectors, created for each tweet from the unigrams obtained during pre-processing.
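The post does not specify which TF-IDF variant is used. As a minimal sketch, assuming one common variant (raw term count times log2(N / df), then L2-normalized per tweet), the vectors could be built as follows. `tfidf_vectors` is a hypothetical helper and the token lists are illustrative; in practice a library such as gensim or scikit-learn would handle this step.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Return one {term: weight} dict per tweet.

    Weight = raw count in the tweet * log2(N / df), then L2-normalized,
    where N is the number of tweets and df the number of tweets with the term.
    """
    n_docs = len(token_lists)
    # Document frequency: in how many tweets each unigram appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)  # raw term counts within this tweet
        weights = {t: c * math.log2(n_docs / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:  # a tweet made only of ubiquitous terms stays all-zero
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors


# Illustrative token lists in the style of the pre-processed tweets.
tweets = [
    ["thank", "great", "deal"],
    ["great", "day", "history"],
    ["thank", "kag", "ventilator"],
]
for vector in tfidf_vectors(tweets):
    print(vector)
```

A term that appears in every tweet gets log2(N / N) = 0, so very common words carry no weight under this scheme.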
How do we know whether the topics we learn represent the original texts well? We calculate the coherence score.
Topic coherence — meaning semantic coherence — is a human judged quality that depends on the semantics of the words. [Improving Topic Coherence with Regularized Topic Models]
The coherence score measures how interpretable a topic is, based on the degree of semantic similarity between the high-scoring words within the topic. To find the optimal number of topics for our tweets, we calculate the coherence scores of LDA models with different numbers of topics. The higher the score, the better the number of topics fits the texts.
Choosing a number of topics that yields a high coherence score gives meaningful and interpretable topics. More topics usually produce higher coherence scores, but with more fragmented meanings.
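The post does not say which coherence measure it uses (gensim's `CoherenceModel`, for example, supports both `c_v` and `u_mass`). As a stdlib-only sketch, the code below implements the simpler UMass measure (Mimno et al., 2011), which scores each pair of top words by how often they share a tweet, and then picks the number of topics with the highest mean score. `train_topics` is a hypothetical callable standing in for "fit an LDA model with k topics and return its top words"; the toy token lists are made up.

```python
import math
from itertools import combinations


def umass_coherence(top_words, docs):
    """UMass coherence of one topic (Mimno et al., 2011).

    top_words: the topic's top words, ordered from highest to lowest weight.
    docs: tokenized tweets; only whether words share a tweet matters.
    Assumes every top word occurs in at least one tweet.
    """
    doc_sets = [set(tokens) for tokens in docs]

    def doc_freq(*words):
        # Number of tweets that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    # combinations() keeps list order, so `higher` always outranks `lower`.
    for higher, lower in combinations(top_words, 2):
        score += math.log((doc_freq(lower, higher) + 1) / doc_freq(higher))
    return score


def best_num_topics(candidate_ks, train_topics, docs):
    """Return (best k, {k: mean UMass score}) over the candidate topic counts.

    train_topics is a hypothetical callable: k -> list of top-word lists,
    e.g. a wrapper that fits an LDA model with k topics.
    """
    scores = {}
    for k in candidate_ks:
        topics = train_topics(k)
        scores[k] = sum(umass_coherence(t, docs) for t in topics) / len(topics)
    return max(scores, key=scores.get), scores


toy = [["cloud", "rain"], ["cloud", "rain", "wind"], ["news", "house"]]
print(umass_coherence(["cloud", "rain"], toy))  # log(3/2), about 0.405
print(umass_coherence(["cloud", "news"], toy))  # log(1/2), about -0.693
```

UMass scores are usually negative, and values closer to zero indicate more coherent topics, so "highest" still means "best" when comparing models.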
For Trump’s tweets, an LDA model with 8 topics produces the highest coherence value.
For Trudeau’s tweets, an LDA model with 6 topics produces the highest coherence value.
Topic Generation
We generate the following 8 topics for Trump’s tweets. The topic_keywords column shows each topic's top keywords along with their importance (weight) values. We concatenate the top 4 words to label each topic, in the hope that the label represents the topic's meaning. For example, conference-white-news-house can be interpreted as a topic related to White House news conferences, and thank-deal-great-call is probably related to thanking someone, having a great call, or making a great deal.
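The labels can be derived mechanically from the keyword strings. A minimal sketch, assuming the strings use the `weight*"word"` format with straight double quotes (the format gensim's `print_topics` produces, and the one in the tables here); `parse_topic` and `label_topic` are hypothetical helpers.

```python
import re

# Matches one weight*"word" term, e.g. 0.089*"conference".
TERM = re.compile(r'([\d.]+)\*"([^"]+)"')


def parse_topic(topic_string):
    """Return [(word, weight), ...] in the order the terms appear."""
    return [(word, float(weight)) for weight, word in TERM.findall(topic_string)]


def label_topic(topic_string, n_words=4):
    """Join the first n_words keywords with hyphens to form a topic label."""
    return "-".join(word for word, _ in parse_topic(topic_string)[:n_words])


keywords = ('0.089*"conference" + 0.085*"white" + 0.082*"news" + '
            '0.081*"house" + 0.072*"eastern"')
print(label_topic(keywords))  # conference-white-news-house
```

Because the keywords are already sorted by weight, taking the first four is the same as taking the top four.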
topic_num | topic_desc | topic_keywords |
---|---|---|
0 | federal-government-full-test | 0.066*"federal" + 0.059*"government" + 0.037*"full" + 0.033*"test" + 0.030*"general" + 0.026*"bill" + 0.023*"hospital" + 0.022*"low" + 0.022*"know" + 0.021*"continue" |
1 | conference-white-news-house | 0.089*"conference" + 0.085*"white" + 0.082*"news" + 0.081*"house" + 0.072*"eastern" + 0.056*"today" + 0.039*"press" + 0.036*"million" + 0.032*"thank" + 0.031*"world" |
2 | thank-deal-great-call | 0.285*"thank" + 0.053*"deal" + 0.052*"great" + 0.038*"call" + 0.035*"company" + 0.026*"act" + 0.025*"leader" + 0.024*"together" + 0.021*"go" + 0.021*"work" |
3 | joe-bernie-mike-sleepy | 0.060*"joe" + 0.057*"bernie" + 0.056*"mike" + 0.053*"sleepy" + 0.049*"mini" + 0.036*"biden" + 0.033*"long" + 0.032*"democrat" + 0.030*"foxnews" + 0.027*"also" |
4 | fake-news-state-people | 0.022*"fake" + 0.020*"news" + 0.017*"state" + 0.017*"people" + 0.017*"get" + 0.017*"country" + 0.017*"united" + 0.017*"say" + 0.016*"time" + 0.015*"medium" |
5 | keep-total-complete-safe | 0.035*"keep" + 0.034*"total" + 0.034*"complete" + 0.031*"safe" + 0.029*"endorsement" + 0.029*"small" + 0.026*"strong" + 0.026*"business" + 0.025*"great" + 0.024*"amendment" |
6 | great-day-book-history | 0.107*"great" + 0.043*"day" + 0.039*"book" + 0.038*"history" + 0.032*"wonderful" + 0.031*"hard" + 0.031*"end" + 0.029*"national" + 0.028*"american" + 0.027*"situation" |
7 | kag-thank-ventilator-need | 0.162*"kag" + 0.093*"thank" + 0.067*"ventilator" + 0.042*"need" + 0.038*"spoke" + 0.032*"good" + 0.032*"help" + 0.028*"every" + 0.022*"deliver" + 0.018*"work" |
The following are the 6 topics learned from Trudeau’s tweets. For example, business-small-help-owner is probably related to offering help to small business owners.
topic_num | topic_desc | topic_keywords |
---|---|---|
0 | business-small-help-owner | 0.054*"business" + 0.034*"small" + 0.026*"help" + 0.025*"owner" + 0.024*"support" + 0.023*"announce" + 0.020*"non" + 0.020*"detail" + 0.019*"announcement" + 0.019*"emergency" |
1 | spoke-international-talk-spread | 0.040*"spoke" + 0.034*"international" + 0.031*"talk" + 0.030*"spread" + 0.029*"call" + 0.029*"impact" + 0.029*"today" + 0.028*"covid" + 0.026*"leader" + 0.025*"economy" |
2 | benefit-test-lose-apply | 0.051*"benefit" + 0.036*"test" + 0.034*"lose" + 0.033*"apply" + 0.033*"emergency" + 0.029*"month" + 0.028*"receive" + 0.025*"response" + 0.025*"year" + 0.023*"invest" |
3 | make-work-need-sure | 0.024*"make" + 0.022*"work" + 0.020*"need" + 0.018*"sure" + 0.018*"continue" + 0.018*"country" + 0.018*"keep" + 0.017*"health" + 0.016*"home" + 0.016*"safe" |
4 | celebrate-hope-life-around | 0.053*"celebrate" + 0.042*"hope" + 0.038*"life" + 0.037*"around" + 0.036*"clock" + 0.031*"please" + 0.030*"full" + 0.026*"year" + 0.022*"late" + 0.022*"world" |
5 | one-update-watch-family | 0.026*"one" + 0.024*"update" + 0.023*"watch" + 0.023*"family" + 0.022*"time" + 0.021*"friend" + 0.021*"kid" + 0.019*"give" + 0.019*"great" + 0.018*"live" |
We apply these models to the tweets and assign each tweet the topic with the highest probability. Here are some of Trump’s tweets with their most probable topics.
probability | topic_desc | token | text |
---|---|---|---|
0.544840 | joe-bernie-mike-sleepy | [stag, coup, bernie] | They are staging a coup against Bernie! |
0.808480 | keep-total-complete-safe | [michelle, fischbachmn, run, congress, minnesota, michelle, protect, unborn, strong, crime, border, cut, tax, love, military, vet, stand, great, farmer, michelle, complete, total, endorsement] | Michelle @FischbachMN7 is running for Congress in Minnesota. Michelle will protect the unborn, is Strong on Crime & Borders, Cutting Taxes, your #2A, Loves our Military, Vets, & will stand w/ our Great Farmers. Michelle has my Complete & Total Endorsement!https://secure.winred.com/MichelleFischbach/website-donations … |
0.441271 | joe-bernie-mike-sleepy | [people, favor, mini, mike, continue, hapless, campaign, political, consultant, get, richer, richer, day] | The only people in favor of Mini Mike continuing with his hapless campaign are me and his political consultants, who are getting richer and richer by the day! |
0.802032 | fake-news-state-people | [foxnews, work, hard, push, radical, left, nothing, democrat, want, unlike, competitor, cnn, msdnc, comcast, fair, balance, ever, learn, radical, left, never, even, give, foxnews, permission, partake, low, rat, debate] | .@FoxNews is working hard pushing the Radical Left, Do Nothing Democrats. They want to be, unlike their competitors, @CNN & MSDNC (Comcast), Fair & Balanced. When will they ever learn. The Radical Left never even gave @FoxNews permission to partake in their low rated debates! |
Here are some of Trudeau’s tweets with their most probable topics.
probability | topic_desc | token | text |
---|---|---|---|
0.876077 | make-work-need-sure | [mining, long, building, block, canadian, economy, ever, important, role, play, transition, cleaner, future, today, pdac, spoke, opportunity, canada, world, cleanest, supplier, metal, mineral, pxmmmd] | Mining has long been a building block of the Canadian economy. And now more than ever, it has an important role to play in our transition to a cleaner future. Today at #PDAC2020, we spoke about the opportunity Canada has to be the world’s cleanest supplier of metals & minerals.pic.twitter.com/cs27PXMMmD |
0.791011 | celebrate-hope-life-around | [javier, rez, llar, dedicate, life, promote, universal, human, right, building, peaceful, world, legacy, live, generation, deepest, condolence, family, whose, life, touch, remarkable, work] | Javier Pérez de Cuéllar dedicated his life to promoting universal human rights & building a more peaceful world, and his legacy will live on for generations. My deepest condolences to his family & all those whose lives were touched by his remarkable work.https://pm.gc.ca/en/news/statements/2020/03/05/statement-prime-minister-death-javier-perez-de-cuellar … |
0.841723 | benefit-test-lose-apply | [since, million, canadian, lift, poverty, mean, family, money, pocket, senior, enjoy, retirement, young, people, like, met, esbgc, today, able, reach, full, potential, yesrzedyd] | Since 2015, over 1 million Canadians have been lifted out of poverty. And that means more families have more money in their pockets, more seniors can enjoy their retirement, and more young people – like those we met at @esbgc today – are able to reach their full potential.pic.twitter.com/5YesRZedYd |
0.833590 | business-small-help-owner | [make, affordable, canadian, business, switch, zero, emission, technology, mining, transportation, agriculture, incentive, propose, today, would, help, company, save, money, reduce, emission, get, detail] | We’re making it more affordable for Canadian businesses to switch to zero-emission technologies. From mining to transportation to agriculture, the incentive we proposed today would help companies save money & reduce emissions. Get the details:https://pm.gc.ca/en/news/news-releases/2020/03/02/making-zero-emissions-vehicles-more-affordable … |
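Assigning the most probable topic to a tweet is an argmax over that tweet's topic distribution. A minimal sketch, assuming the distribution arrives as (topic_id, probability) pairs, the shape gensim's `LdaModel.get_document_topics` returns; `dominant_topic` is a hypothetical helper, and the probabilities in the usage line are illustrative, not output from the models above.

```python
def dominant_topic(doc_topics, topic_labels):
    """Return (label, probability) of the most probable topic for one tweet.

    doc_topics: list of (topic_id, probability) pairs for the tweet.
    topic_labels: dict mapping topic_id -> label like "joe-bernie-mike-sleepy".
    """
    topic_id, prob = max(doc_topics, key=lambda pair: pair[1])
    return topic_labels[topic_id], prob


labels = {3: "joe-bernie-mike-sleepy", 4: "fake-news-state-people"}
# Illustrative probabilities only; not taken from the trained model.
print(dominant_topic([(3, 0.6), (4, 0.3)], labels))  # ('joe-bernie-mike-sleepy', 0.6)
```

The returned probability is what the probability column in the tables reports: how much of the tweet's topic mass the winning topic holds.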
The table below shows the proportion of Trump’s tweets in each topic from March 1 to April 27, 2020. Tweets on the topic fake-news-state-people account for 46%.
topic_desc | proportion |
---|---|
fake-news-state-people | 0.463596 |
keep-total-complete-safe | 0.104012 |
great-day-book-history | 0.087667 |
joe-bernie-mike-sleepy | 0.083210 |
conference-white-news-house | 0.072808 |
thank-deal-great-call | 0.072808 |
federal-government-full-test | 0.057949 |
kag-thank-ventilator-need | 0.057949 |
The table below shows the proportion of Trudeau’s tweets in each topic. Tweets on the topic make-work-need-sure account for 38%.
topic_desc | proportion |
---|---|
make-work-need-sure | 0.383420 |
one-update-watch-family | 0.222798 |
business-small-help-owner | 0.134715 |
benefit-test-lose-apply | 0.103627 |
spoke-international-talk-spread | 0.101036 |
celebrate-hope-life-around | 0.054404 |
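Each proportion is the number of tweets assigned to a topic divided by the total number of tweets. A minimal sketch, assuming a list holding one dominant-topic label per tweet; `topic_proportions` is a hypothetical helper (if the labels live in a pandas Series, `value_counts(normalize=True)` gives the same shares).

```python
from collections import Counter


def topic_proportions(assigned_labels):
    """Share of tweets per dominant topic, from most to least common."""
    counts = Counter(assigned_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


# Illustrative labels, one per tweet.
assigned = ["fake-news-state-people", "fake-news-state-people",
            "joe-bernie-mike-sleepy", "great-day-book-history"]
print(topic_proportions(assigned))
```

Ties keep the order in which labels were first seen, which is why two topics with identical shares can appear in either order in tables like the ones above.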
Topic Visualization
We use pyLDAvis, an interactive LDA visualization package, to plot all generated topics and their keywords. Here is the link to the topics of Trump’s tweets, and the link to the topics of Trudeau’s tweets.
Each bubble on the left represents a topic, and its size represents the topic's prevalence. The distance between bubbles reflects the similarity between topics: the closer two bubbles are, the more similar the topics.
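To our understanding, pyLDAvis by default measures the distance between two topics with the Jensen-Shannon divergence of their word distributions, then projects those distances onto the 2D map. A minimal stdlib sketch of that divergence follows; `js_divergence` is a hypothetical helper and the toy distributions are made up for illustration.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions.

    p, q: dicts mapping word -> probability; missing words count as 0.
    Returns a value in [0, 1]; 0 means identical distributions.
    """
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_m(a):
        # KL(a || m); words with zero probability contribute nothing.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


weather = {"cloud": 0.5, "rain": 0.3, "wind": 0.2}
storm = {"rain": 0.5, "wind": 0.3, "cloud": 0.2}
news = {"news": 0.6, "house": 0.4}
print(js_divergence(weather, storm))  # small: same words, different weights
print(js_divergence(weather, news))   # about 1, the maximum: no shared words
```

Topics that share their heavy words end up with a small divergence and are drawn close together; topics with disjoint vocabularies sit far apart.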
Closing Notes
To understand how our world leaders are handling the COVID-19 crisis and communicating with their followers, we turned to Twitter and scraped Donald Trump’s and Justin Trudeau’s tweets from the past two months. We generated word clouds to show the most frequent words, and learned the topics of their tweets by building LDA models with different numbers of topics.
Who do you think is making better use of Twitter to communicate with his people during the crisis?
All code can be found on GitHub.
Happy Machine Learning!