
Automated Processing of Wikileaks Cables Reveals U.S. Friends, Foes

Natural Language Processing of nearly 4,000 U.S. diplomatic cables reveals fraying relations with traditional allies, and a few other surprises

Software capable of determining the positive or negative sentiment of sentences written by humans has been unleashed on 3,891 U.S. diplomatic cables released by WikiLeaks, and the result is a systematic, if preliminary, analysis of which countries are our besties and which are in the doghouse.

The analysis was part of a class project (pdf) by a pair of computer science undergraduates at Stanford, Xuwen Cao and Beyang Li. By looking at how often a country was mentioned, as well as whether it was cast in a positive or negative light, Cao and Li identified four clusters to which countries could belong. Three of them cover countries that aren’t mentioned very often: countries we don’t like (red), countries we sort-of don’t like (teal), and countries spoken of positively (blue).

Since these cables were supposed to be classified, we can assume they are candid. The fourth cluster covers the countries that come up often: none of them was mentioned in an especially negative or especially positive light. They were just groused about fairly frequently (green).
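To make the two measurements concrete, here is a minimal sketch of how each location could be reduced to a mention count and an average sentiment score. The article doesn't say which sentiment model the students used, so the tiny word lexicon, the fictional place names, the made-up sentences, and the function names below are all assumptions for illustration, not their method.

```python
from collections import defaultdict

# Hypothetical mini-lexicon; a real sentiment model would be far richer.
POSITIVE = {"welcomed", "cooperation", "progress", "support"}
NEGATIVE = {"concern", "corruption", "crisis", "refused"}


def sentence_sentiment(sentence):
    """Score a sentence as (positive word count) - (negative word count)."""
    words = [w.strip(".,;:!?\"'") for w in sentence.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def location_features(sentences, locations):
    """Map each mentioned location to (mention count, mean sentence sentiment)."""
    scores = defaultdict(list)
    for sentence in sentences:
        lowered = sentence.lower()
        score = sentence_sentiment(sentence)
        for loc in locations:
            # Naive substring match; real text would need proper tokenization.
            if loc in lowered:
                scores[loc].append(score)
    return {loc: (len(s), sum(s) / len(s)) for loc, s in scores.items()}


# Made-up sentences about fictional places, purely for illustration.
sentences = [
    "Freedonia welcomed the cooperation.",
    "Sylvania refused the inspection.",
    "Sylvania signaled progress in talks.",
    "Corruption remains a concern in Ruritania.",
]
features = location_features(sentences, ["freedonia", "sylvania", "ruritania"])
```

In this toy data the location mentioned twice averages out to a neutral score, which is one way a frequently discussed place could end up in the middle of the sentiment range.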

Here’s a further breakdown of what each cluster represents:

Green locations (cast in a somewhat negative light, and frequently):

[‘london’, ‘paris’, ‘cuba’, ‘africa’, ‘brasilia’, ‘cairo’, ‘eu’, ‘brazil’,
‘afghanistan’, ‘egypt’, ‘europe’, ‘iran’, ‘china’, ‘iraq’, ‘libya’, ‘syria’,
‘pakistan’, ‘washington’, ‘turkey’, ‘israel’, ‘moscow’, ‘spain’, ‘uk’, ‘russia’,
‘madrid’, ‘india’, ‘tripoli’, ‘kabul’, ‘iceland’, ‘france’]

Red locations (countries talked about infrequently, and in the most negative context):

[‘djibouti’, ‘taiwan’, ‘tajikistan’, ‘islam’, ‘mumbai’, ‘zimbabwe’, ‘dubai’, ‘goa’,
‘tibet’, ‘armenia’, ‘yar’, ‘ecuador’, ‘benghazi’, ‘algiers’, ‘yemen’, ‘paraguay’,
‘caracas’, ‘south africa’, ‘ouagadougou’, ‘xxxxxxxxxxxx’, ‘guinea’]

(It’s worth noting that due to the nature of natural language processing, a country like Taiwan could be mentioned alongside negative sentiment that is about its circumstances, not the country itself, e.g. the cross-strait tensions with mainland China.)
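The caveat can be shown in a few lines. This sketch assumes sentiment is scored per sentence and then credited to every location the sentence names; whether the students' system worked that way isn't stated in the article, and the sentence, score, and function name are invented for illustration.

```python
def attribute_sentiment(sentence, score, locations):
    """Naively credit a sentence's score to every location it mentions."""
    lowered = sentence.lower()
    return {loc: score for loc in locations if loc in lowered}


# Made-up sentence: the negative score is about the tension, not either place.
sentence = "Cross-strait tensions between Taiwan and China worsened."
credited = attribute_sentiment(sentence, -1, ["taiwan", "china", "japan"])
```

Both places named in the sentence receive the same negative score, even though the sentence criticizes neither of them directly.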

Teal locations (mentioned in a somewhat negative context, but relatively infrequently):

[‘kosovo’, ‘north korea’, ‘damascus’, ‘argentina’, ‘latin america’, ‘netherlands’,
‘uruzgan’, ‘switzerland’, ‘reykjavik’, ‘lebanon’, ‘qatar’, ‘sudan’, ‘somalia’,
‘venezuela’, ‘guantanamo’, ‘colombia’, ‘sao paulo’, ‘saudi arabia’, ‘america’,
‘peru’, ‘gaza’, ‘bolivia’, ‘ukraine’, ‘geneva’, ‘jordan’, ‘tehran’, ‘georgia’,
‘sweden’, ‘portugal’, ‘mexico’, ‘lula’, ‘kenya’, ‘italy’, ‘ethiopia’, ‘canada’,
‘germany’, ‘havana’, ‘algeria’]

Blue locations (mentioned in the most positive context, but not very often):

[‘azerbaijan’, ‘japan’, ‘chechnya’, ‘norway’, ‘australia’, ‘ankara’, ‘baghdad’,
‘poland’, ‘haiti’, ‘kazakhstan’, ‘honduras’, ‘belgrade’, ‘copenhagen’, ‘kuwait’,
‘karzai’, ‘amazon’, ‘burma’, ‘tunisia’, ‘west bank’, ‘doha’, ‘west’, ‘new york’,
‘nigeria’, ‘serbia’, ‘darfur’, ‘chile’, ‘morocco’, ‘vatican’, ‘uae’, ‘new delhi’,
‘middle east’, ‘brussels’]
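For readers wondering how four groups like these might be produced, below is a rough sketch using plain k-means on (frequency, sentiment) pairs. The article doesn't name the clustering algorithm the students used, so k-means is an assumption here, as are the placeholder location names, the invented and pre-scaled numbers, and the hand-picked starting centroids.

```python
def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm on 2-D points with caller-supplied start centroids."""
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels, centroids


# Invented (scaled frequency, sentiment) pairs for placeholder locations.
names = ["loc_a", "loc_b", "loc_c", "loc_d", "loc_e", "loc_f", "loc_g", "loc_h"]
points = [
    (0.9, -0.3), (0.8, -0.2),  # frequent, somewhat negative
    (0.1, -0.9), (0.2, -0.8),  # infrequent, most negative
    (0.1, -0.4), (0.2, -0.3),  # infrequent, somewhat negative
    (0.1, 0.5), (0.2, 0.6),    # infrequent, most positive
]
start = [(0.9, -0.3), (0.1, -0.9), (0.1, -0.4), (0.1, 0.5)]
labels, centroids = kmeans(points, start)
clusters = dict(zip(names, labels))
```

Because frequency and sentiment live on different scales, the inputs are assumed to be rescaled to comparable ranges first; otherwise the raw mention counts would dominate the distance calculation.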

Here’s what the authors say about the seeming outliers in the blue group:

The blue cluster has the highest sentiment score, which means that US is relatively happy with this group. As one may notice, there are a few notable anomalies such as ‘burma’ and ‘sudan’. In the case of ‘burma’, the positive sentiment is mainly caused by Aung San Suu Kyi’s release from house arrest from multiple cables. In the case of ‘sudan’, it’s also a special case because the darfur cables discuss mostly the international help darfur received, instead of its dire situation.

And here are the findings the authors found most interesting:

Given our model, we made a few interesting discoveries:
1. In general, the US diplomats are critical of other countries, as we observe the majority of the data points is in the negative
2. Surprisingly, US’s most important ally is spain (seen lower right quadrant)
3. US is most friendly with Norway (right-most point), although it’s relatively unimportant
4. Iran appeared most frequently, with a small negative sentiment (which means the attitude is not always hostile)
5. US is least happy with Zimbabwe and Paraguay, although it doesn’t care too much about them either
6. US doesn’t actually have good relations with its traditional allies such as France, UK and Germany. Canada, Italy and Germany even scored lower than China.

Number six is a zinger. It’s a stretch to say that cables that talk about our traditional allies in a negative light indicate that we have poor relations with them – maybe we have good relations, and that means we’re more willing to be critical, the way siblings are wont to fight. It’s also important to note that these results, which aren’t peer reviewed, are just a first approximation of what a full-fledged Natural Language Processing analysis of these cables would look like.

As much as this study says something about the nature of diplomacy, it’s possible it says something more about the nature of gossip: good news is never as important as news of what’s going wrong.

Follow Mims on Twitter or contact him via email.
