Tonight, President Obama gives his State of the Union address. It's that time of year when we reflect on where the country is going, and then gossip about who in the audience made a faux pas on national TV, invent drinking games, and start up text analytics software.
The SotU address is a pretty good case for text analytics. It's important. You want to know fundamentally what it's about. But it's also rather long and frequently takes a while to get to the point. Can't we have a computer listen to it for us, and then just glance at a computer-generated visualization of what the President said?
This is, of course, the kind of thing we do at Luminoso. And while tonight's speech hasn't happened yet, we can apply Luminoso's software to examine the content of State of the Union addresses over the history of our country. Political rhetoric changes with the times, but we can use Luminoso to open new avenues of investigation into a general model of how politicians talk.
We built this model from transcripts of all 227 SotU speeches. The model captures the semantics of how words and phrases are used, based on its general background knowledge and the way words are used in these speeches in particular. For example, the model knows that the President is talking about approximately the same thing when he mentions "negotiation" and "arbitration", or for that matter "poverty" and the "needy". It knows that "combat", "conquer", "troops", and "invasion" are all ways to talk about "war", which is important because there is a lot of talk about war.
There's more going on in a Concept Cloud than in a typical word cloud visualization. The words are grouped together in space so that words that are used in similar ways appear near each other, or in similar colors. The size of a word indicates how much information we get from it. This is why the largest words aren't "the", "I", and "America"; they're words relating to particular meaningful topics. Some of the informative words were predictable ("taxpayers", "troops", "negotiated") and some of them less so ("civilized", "gratifying", "enlightened").
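To give a feel for how a word's size could reflect how informative it is, here is a minimal sketch. It assumes a plain TF-IDF-style weighting, which is a stand-in for illustration and not necessarily how Luminoso scores words. The function name `information_weights` and the toy sentences are invented for this example.

```python
import math
from collections import Counter

def information_weights(documents):
    """Score each word as (total count) * log(n_docs / docs containing it).

    Assumption: this TF-IDF-style score is only an illustration of
    "informativeness", not the weighting used in a Concept Cloud.
    """
    n_docs = len(documents)
    term_freq = Counter()   # how often each word occurs overall
    doc_freq = Counter()    # how many documents contain each word
    for tokens in documents:
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    return {
        word: term_freq[word] * math.log(n_docs / doc_freq[word])
        for word in term_freq
    }

# Invented toy "speeches", not real transcripts.
speeches = [
    "the troops defend the nation".split(),
    "the taxpayers fund the nation".split(),
    "the troops and the taxpayers".split(),
]
weights = information_weights(speeches)
```

In this toy corpus "the" appears in every document, so its weight drops to zero, while "troops" and "taxpayers" keep a positive weight.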
We can track the usage of named entities (“Congress”, “Britain”, “Communists”), actions (“enacted”, “negotiated”), and descriptions (“civilized”, “belligerent”, “patriotic”). Some of these topics were legislative, some involved foreign policy, and quite a lot of our analytical topics were military, but then again, so were many of the speeches.
One thing we can do is break down the word usage by political party:
The Concept Clouds for speeches from different political parties turned up some interesting contrasts. Democratic-Republican presidents demonstrated presidential priorities from a different century, including the “Barbary” coast pirates, “fortifications”, and management of “militias”. Republican presidents tended to discuss foreign relations more bluntly (“confront”, “cooperate”, “allies”), while Democrats tended to be multilateral (“United Nations”, “aggression”, “neighbors”). On economic policies, Republicans discussed “inflation”, “incentives”, and “taxpayers”, while Democrats talked of “unemployment” and the “minimum wage”.
Now we can break it down by President, looking at the correlation between the content of their speeches and various topics. This is something we measure as a percentage, with 0% being the typical correlation between two arbitrary words. These numbers measure whether each President talks about various topics more or less than usual, based on the appearance of the key word and many related words in their speeches.
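As a rough illustration of a percentage where 0% means "typical for two arbitrary words", the sketch below rescales cosine similarity against a baseline. Everything here is an assumption for the sake of the example: the three-dimensional vectors are made up, the baseline is taken as the average similarity over all word pairs, and the names `cosine`, `baseline`, and `adjusted_percent` are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def baseline(vectors):
    """Average similarity over all pairs: the 'two arbitrary words' level."""
    pairs = list(combinations(vectors.values(), 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def adjusted_percent(u, v, base):
    """Rescale so the baseline maps to 0% and identical vectors to 100%.

    Assumes base < 1, which holds unless every vector points the same way.
    """
    return 100.0 * (cosine(u, v) - base) / (1.0 - base)

# Made-up vectors standing in for learned word representations.
word_vectors = {
    "war": [0.9, 0.1, 0.0],
    "troops": [0.8, 0.2, 0.1],
    "taxpayers": [0.1, 0.9, 0.2],
    "inflation": [0.0, 0.8, 0.3],
}
speech_vector = [0.7, 0.3, 0.1]  # hypothetical summary of one President's speeches

base = baseline(word_vectors)
scores = {
    topic: adjusted_percent(speech_vector, vec, base)
    for topic, vec in word_vectors.items()
}
```

A positive score means the speeches lean toward that topic more than an arbitrary pairing would, and a negative score means less.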
When the SotU speeches were analyzed by president, some interesting variations turned up, some of which were caused by events (“war”, “peace”) and some of which were the result of period vocabulary (“taxpayers”, “Constitution”).
A Luminoso analysis can output a lot of numbers, but usually we make the most out of topic-topic correlations, which tell you to what extent any two concepts are related. Many strong correlations were predictable (Senate/ratified 62%, repeal/law 74%), but some were quite interesting. The correlation between “Supreme Court” and “impartial” (77%) suggests that the Supreme Court has always been viewed as the fair arbiter of American legislation. The strong link between “Communists” and “Fascists” (41%) indicates that US presidents have talked about these different political movements in quite similar ways. “England” and “Britain” were 91% related, while “Russia” and “Soviet Union” were only 35% correlated, accurately reflecting the differences that these names represent.
The most interesting results came from the topic-timeline data, which measures how much any given concept (and its related ideas) was discussed over time:
Republicans and Democrats alike will unite in bipartisan indignation to find that, as far as presidential SotU addresses are concerned, they are practically identical entities. However, they may be heartened to learn that their share of the conversation is increasing. Meanwhile, the “Constitution” has been virtually ignored since the start of the 20th century, and both “Congress” and the “Senate” have held steady in terms of relevance in SotU speeches.
Interestingly, terms relating to legislation have been in decline since the early 20th century. This could be because presidents switched from discussing specific legislation to broad policy goals, but it’s more likely that this is the result of emphasizing foreign over domestic policy.
After World War II, the SotU became a message not just for American “citizens”, but for the international community. The role of America in the world changed, and the role of the State of the Union speech did as well. Presidents abruptly began using the SotU address to tell everyone what they were going to do about the communists, the fascists, and the imperialists.
World War II profoundly changed the way US presidents discuss all military actions. It used to be about “maritime” rights and dealing with “belligerents” and “hostilities”. After WWII, the new watchwords were “security” and countering “aggression”. Most notably, it was after WWII that US presidents decided that they were the leaders of the “free world” and said so.
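For the curious, a topic timeline like the ones discussed above can be roughly approximated by counting how large a share of each decade's words belongs to a concept and its related terms. This is a simplified sketch, not Luminoso's method: it assumes you supply the related terms by hand, and the function name `topic_timeline` and the sample sentences are invented rather than taken from real transcripts.

```python
from collections import defaultdict

def topic_timeline(speeches, related_terms):
    """Return {decade: fraction of tokens that fall in related_terms}.

    speeches: iterable of (year, text) pairs.
    related_terms: a hand-picked set of lowercase words for one concept
    (an assumption; a real system would learn these relations).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in speeches:
        decade = (year // 10) * 10
        for raw in text.lower().split():
            token = raw.strip('.,;:!?"')
            if not token:
                continue
            totals[decade] += 1
            if token in related_terms:
                hits[decade] += 1
    return {decade: hits[decade] / totals[decade] for decade in sorted(totals)}

# Invented sentences for illustration only.
sample = [
    (1805, "Maritime rights and hostilities with belligerents."),
    (1955, "Security against aggression, security for all."),
]
security_terms = {"security", "aggression"}
timeline = topic_timeline(sample, security_terms)
```

Plotting the resulting fractions by decade gives a simple line chart of how prominent a concept was over time.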
Will Obama's address tonight represent the natural continuation of historical trends, or will it cover unprecedented new topics? We'll find out tonight, and so will our software.