Data mining has revealed previously unknown Russian Twitter troll campaigns

Trolls left forensic fingerprints that cybersecurity experts used to find other disinformation campaigns both in the US and elsewhere.

Emerging Technology from the arXiv

October 11, 2018


Human activity leaves all kinds of traces, some more obvious than others. For example, messages posted to services such as Twitter are obviously visible. But the pattern of tweets from a user over time is not as self-evident.

Various researchers have begun to study these patterns and found that they can identify certain types of accounts, particularly those that post in high volume. For example, accounts that post continuously, 24 hours a day, are unlikely to be operated by humans. Instead, this is a clear signal that a bot of some kind is at work.
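As a rough illustration of the round-the-clock signal described above, the sketch below builds an hour-of-day profile for one account and flags it when it posts heavily with almost no quiet hours. The function names and the thresholds (`min_posts`, `max_quiet_hours`) are assumptions made for this example, not values taken from the research.

```python
from collections import Counter
from datetime import datetime


def hourly_profile(timestamps):
    """Return 24 values: the share of an account's posts made in each hour."""
    counts = Counter(ts.hour for ts in timestamps)
    total = len(timestamps)
    return [counts.get(hour, 0) / total if total else 0.0 for hour in range(24)]


def looks_round_the_clock(timestamps, min_posts=100, max_quiet_hours=2):
    """Hypothetical heuristic: a high-volume account with almost no quiet hours.

    Both thresholds are illustrative assumptions, not published values.
    """
    if len(timestamps) < min_posts:
        return False
    quiet_hours = sum(1 for share in hourly_profile(timestamps) if share == 0.0)
    return quiet_hours <= max_quiet_hours


# Synthetic accounts: one posts every 10 minutes all day, one only 08:00-22:59.
bot_like = [datetime(2016, 10, 1, h, m) for h in range(24) for m in range(0, 60, 10)]
human_like = [datetime(2016, 10, 1, h, m) for h in range(8, 23) for m in range(0, 60, 6)]
print(looks_round_the_clock(bot_like), looks_round_the_clock(human_like))
```

A real analysis would need to account for time zones and for several people sharing one account, which this sketch ignores.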

Humans also generate specific patterns, albeit less obviously than bots. In particular, accounts that post high volumes of tweets often do so in a pattern with a unique signature that forensic analysis can identify.

One corpus of interesting tweets encompasses the messages posted by Russian trolls attempting to influence the 2016 US presidential election. Now researchers have analyzed these to search for any unique fingerprints they might contain. The idea is to use these fingerprints to identify other disinformation campaigns by the same trolls that have gone unnoticed. But is this possible?

Today we get an answer thanks to the work of Christopher Griffin and Brady Bickel at Pennsylvania State University. These guys’ forensic analysis has identified a unique signature in these tweets and used it to find evidence of other disinformation campaigns. “We identify an operation that includes not only the 2016 US election, but also the French national and both local and national German elections,” say Griffin and Bickel.

Unique behavioral fingerprints are hard to identify because of the sheer volume of data on Twitter. A vast number of human users share similar behavioral characteristics and so cannot be easily distinguished. However, the behavioral signature becomes more distinctive as the volume of messages increases.

That’s why the Russian trolls are identifiable in this way. Griffin and Bickel downloaded a database of 200,000 Russian troll tweets gathered by Twitter and obtained by NBC News. They then analyzed the tweets by the most prolific users—those who posted more than 500 times during the election period.
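The filtering step could look something like the sketch below, which keeps only accounts with more than 500 tweets. It assumes each tweet is a dictionary with a `user` key; that layout is a guess for illustration and may not match the actual database.

```python
from collections import Counter


def prolific_users(tweets, threshold=500):
    """Return the set of users who posted more than `threshold` tweets.

    Assumes each tweet is a dict with a "user" key (a hypothetical schema).
    """
    counts = Counter(tweet["user"] for tweet in tweets)
    return {user for user, n in counts.items() if n > threshold}


# Synthetic data: account "a" has 501 tweets, account "b" exactly 500.
sample = [{"user": "a"}] * 501 + [{"user": "b"}] * 500
print(prolific_users(sample))
```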

The researchers examined the way these users tweeted over time and how they differed from other Twitter users. They also looked for communities within the database and then created word clouds of their tweets showing the most commonly used words.
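A word cloud is essentially a picture of word frequencies, so the underlying counts can be sketched in a few lines. The example below assumes the tweets have already been grouped into a community by some other method (the community detection itself is not shown), and the stopword list and tokenizing rule are illustrative choices rather than the researchers' own.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; a real analysis would use a fuller one
# for each language involved.
STOPWORDS = {"the", "and", "for", "that", "this", "with", "der", "die", "das", "und"}


def top_words(texts, n=10):
    """Return the n most frequent words across a community's tweets."""
    counts = Counter()
    for text in texts:
        # [^\W\d_]+ matches runs of letters in any script, including Cyrillic.
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            if len(word) > 2 and word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)


community = ["Merkel Wahl Berlin", "Berlin wahl heute", "berlin"]
print(top_words(community, n=2))
```

Comparing the top words of each community side by side is one simple way a community written mostly in another language would stand out.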

This threw up a surprise. The analysis revealed seven communities, each with a distinct word cloud. Four of these communities were clearly focused on topics such as the US Tea Party movement and African-Americans.

But two of these word clouds consisted entirely of words in Russian and German. Griffin and Bickel analyzed these further to show that the timing of the tweets spiked in the run-up to the German national election in 2017 and the local Berlin election in 2016. “The Berlin state election was significant because Chancellor Merkel’s party was beaten by right-wing populists,” say the researchers.
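One simple way to look for this kind of spike is to count tweets per day and flag days that sit far above the average. The sketch below does that with a mean-plus-k-standard-deviations rule; the rule and the default `k=3.0` are assumptions for illustration, and the paper may use a different approach.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, pstdev


def spike_days(dates, k=3.0):
    """Return days whose tweet count exceeds mean + k * standard deviation.

    `dates` holds one datetime.date per tweet. Days with no tweets at all
    are not represented, a simplification of this sketch.
    """
    counts = Counter(dates)
    values = list(counts.values())
    if not values:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return sorted(day for day, n in counts.items() if n > mu + k * sigma)


# Synthetic data: 30 quiet days of 10 tweets each, then one day with 200.
start = date(2017, 8, 1)
quiet = [start + timedelta(days=i) for i in range(30) for _ in range(10)]
burst = [date(2017, 9, 20)] * 200
print(spike_days(quiet + burst))
```

Flagged days can then be compared against a calendar of election dates to see whether the bursts line up with them.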

The team also found a similar spike in activity in the build-up to the French national election in 2017, although this involved only 588 messages. That’s too small for detailed analysis, but Griffin and Bickel speculate that it points to the existence of another group of trolls, as yet unidentified, who targeted France.

That’s interesting work suggesting that Russian troll activity was significantly more ambitious on an international scale than previously thought. It also suggests a way of spotting this kind of meddling as it is happening by looking for the kind of forensic fingerprint the team identified.

Of course, finding trolls is a cat-and-mouse game. For the organizations responsible for Russian troll activity, it ought to be a straightforward matter to change the pattern of activity in a way that does not create the same signature.

And yet, if this malicious activity is to be significant and effective, it will inevitably take place on a relatively large scale and so generate a different signature. The question is how to spot it in time to take action. And so the game continues.

Ref: arxiv.org/abs/1810.01466 : Unsupervised Machine Learning of Open Source Russian Twitter Data Reveals Global Scope and Operational Characteristics
