What We Found in 3 Million Russian Troll Tweets
On July 31st, FiveThirtyEight released 3 million Russian troll tweets to the public. With a dataset this large and this relevant to understanding Russian interference in American democracy, crowdsourced research could unearth breakthrough insights.
As concerned citizens and patriots, we at Luminoso wanted to help analyze this data. We saw that the tweets were collectively all over the map (literally and figuratively). Some used hashtags, some used emojis, and others opted for slang, nonsense words, and timely memes (often riddled with spelling and grammar errors).
In other words, this is the kind of data Luminoso's “common sense” AI eats for breakfast.
(In this analysis, we looked only at English-language data. In future updates, we'll begin exploring the remaining 800,000 tweets, which contain Arabic, Russian, and Spanish, among other languages.)
Here are some broad things we’ve found so far:
The Internet Research Agency targeted influencers. By our count, 27,323 tweets, or 1.2% of the 2.2 million tweets our AI looked at, “@” a politician, media personality, or influencer.
They mentioned influential politicians:
They also targeted media personalities:
And sent content to grassroots influencers:
Often, these tweets were sent to several personalities, thereby increasing the odds that one of them would spread the misinformation to their respective followers.
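For readers who want to try a similar count on the public data, here is a minimal Python sketch. It is not the pipeline behind the numbers above: the watchlist of handles is hypothetical (the list used in this analysis isn't published here), and the regular expression is a simple approximation of how Twitter handles appear in text. It counts tweets that @-mention at least one watched handle, and tweets that mention more than one.

```python
import re

# Hypothetical watchlist of lowercase handles. The actual list of politicians,
# media personalities, and influencers used in the analysis is not given.
WATCHLIST = {"example_senator", "example_anchor", "example_activist"}

# Rough approximation of an @-mention: "@" followed by word characters.
MENTION_RE = re.compile(r"@(\w+)")


def watched_mentions(text, watchlist=WATCHLIST):
    """Return the set of watchlisted handles that a tweet @-mentions."""
    return {handle.lower() for handle in MENTION_RE.findall(text)} & watchlist


def mention_stats(tweets, watchlist=WATCHLIST):
    """Return (tweets with >=1 watched mention, tweets with >1, share with >=1)."""
    any_hit = 0
    multi_hit = 0
    for text in tweets:
        hits = watched_mentions(text, watchlist)
        if hits:
            any_hit += 1
        if len(hits) > 1:
            multi_hit += 1
    share = any_hit / len(tweets) if tweets else 0.0
    return any_hit, multi_hit, share
```

As a sanity check on the figures reported above, 27,323 out of 2.2 million works out to roughly 1.24%, which rounds to the 1.2% quoted.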
The strategy has persisted, with Russian trolls recently sending messages to the White House Twitter handle, to the POTUS handle, and to President Trump’s communications staff. The fact that this strategy has continued suggests that it has had some efficacy in spreading misinformation.
The ‘Newsfeed’ Twitter accounts focus on reposting news on politics, incidents that increase community anxiety, and, surprisingly, sports.
Our AI connects concepts to topics. So say you post two tweets about LeBron James. In the first, you describe one of his NBA Finals performances. Our AI recognizes that tweet as being about sports. But if your second post discusses his now-famous words to President Trump, that tweet is recognized as being about politics.
We're also able to output the most frequently mentioned topics. And while many tweets are conceptually about politics, many more focus on spreading community news that instills fear.
While there is some overlap among the topics, their persistent appearance points to a strategy focused on reposting news on politics, sports, and anxiety-inducing incidents.
The Russian trolls were hyperaware of emerging conspiracy theories and helped propagate them. QAnon, for example, has 2,797 conceptual appearances in the data, frequently appearing alongside classifying hashtags like #FollowTheWhiteRabbit (0.85 correlation), #TheStorm (0.84), and #PedoWood (0.80).
Looking at all the tweets holistically, we found other interesting correlations.
Tweets that mention CNN also mention lies (0.35), being exposed (0.47), Clinton (0.46) and her emails (0.29), Donald Trump (0.42), and liberals (0.48).
Cops are almost never talked about in the context of Donald Trump (negative correlation), but often in the context of the FBI (0.36) and terrorists (0.25).
Muslims are almost always talked about in the context of terrorists and ISIS (0.66 and 0.63 correlation, respectively).
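The correlation scores above come from Luminoso's concept model, and the method behind them isn't described in this post. As a rough stand-in for readers exploring the data on their own, the sketch below computes a different, much simpler measure: the phi coefficient between the literal presence of two terms across tweets. It ranges from -1 (the terms never co-occur) to 1 (they always appear together). The function name and tokenizer are assumptions for illustration, and the results should not be expected to match the numbers above.

```python
import math
import re

# Simple tokenizer (an assumption): lowercase words, keeping a leading # or @.
TOKEN_RE = re.compile(r"[#@]?\w+")


def phi(tweets, term_a, term_b):
    """Phi coefficient between the presence of two terms across tweets."""
    n11 = n10 = n01 = n00 = 0
    for text in tweets:
        tokens = set(TOKEN_RE.findall(text.lower()))
        has_a = term_a in tokens
        has_b = term_b in tokens
        if has_a and has_b:
            n11 += 1  # both terms present
        elif has_a:
            n10 += 1  # only term_a present
        elif has_b:
            n01 += 1  # only term_b present
        else:
            n00 += 1  # neither term present
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One of the terms appears in every tweet or in none; phi is undefined.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom
```

Because this only matches exact tokens, it would treat "cop" and "cops" as different terms, which is one reason a concept-level model can give quite different scores.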
The different Twitter account categories also take drastically different approaches to what they talk about. Collectively, the top five concepts discussed are: Trump, Clinton, Obama, cops, and workouts. Below are two charts showing how the top two concepts appear among the Twitter troll categories:
We hope this first look is helpful for folks out there who want to learn more about what's really in this massive dataset. And there’s a lot more that we can do.
Next, we'd like to look at other languages, as a lot of the content is in Arabic and Russian, and at trends over time, which could reveal how the Russian troll strategy has evolved. We can also dig deeper into their influencer strategy, or find patterns that reveal how they convinced other users that the information they were sending was truthful.
Want to learn more about how Luminoso can discover the elusive "unknown unknowns" in your unstructured data? See a demo today.