Wednesday, April 28, 2010

Libdem manifesto key concepts

And finally the key concepts for the Libdem manifesto.

Labour key concept cloud

And here's the key concept cloud from the Labour manifesto ... there seems to be much more variety of concepts here and hence the cloud is much bigger. I had to shrink it further to get it all in one screenshot.

Conservative key concept cloud

In addition to key words, Wmatrix can produce key concepts. It compares a frequency list of the semantic fields automatically tagged in the data with the same list from a reference corpus, here again the BNC written sampler. This shows the statistically key concepts in the Conservative manifesto.

Tuesday, April 20, 2010

TEI versions of UK election manifestos

Meanwhile, somewhere deep in France with a laptop, Lou Burnard has created TEI-encoded versions of the UK election manifestos, and tagged and lemmatised them with TreeTagger. Download from

Thanks Lou!

Updated Libdem manifesto and cloud

Lou Burnard spotted some conversion errors in the Libdem manifesto (extra spaces after ligatures e.g. 'fi') and Martin has now converted the smart quotes to straight ones. The new text version of the Libdem manifesto is at and here is the updated key word cloud. You'll notice the main difference is that "Britains" is no longer key: it was actually "Britain's", and now that the apostrophe is fixed its frequency is combined with that of "Britain".
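For anyone wanting to do this sort of clean-up themselves, here is a rough sketch in Python. It is not the procedure Martin or Lou actually used. It assumes the ligature characters themselves (such as U+FB01) survived in the converted text, and the name clean_text is made up for illustration.

```python
import re

# Map curly ("smart") quotes to their straight equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single quotes and apostrophes
    "\u201c": '"', "\u201d": '"',   # double quotes
})

# Unicode ligature characters and the plain letters they stand for.
LIGATURES = {
    "\ufb00": "ff", "\ufb01": "fi", "\ufb02": "fl",
    "\ufb03": "ffi", "\ufb04": "ffl",
}

def clean_text(text):
    """Straighten quotes and expand ligatures (hypothetical helper)."""
    text = text.translate(QUOTE_MAP)
    for ligature, plain in LIGATURES.items():
        # Assumption: one stray space after a ligature, followed by a
        # lower-case letter, is a conversion error. This would wrongly join
        # a word-final ligature to the next word, so check the result by hand.
        text = re.sub(re.escape(ligature) + r" (?=[a-z])", plain, text)
        # Expand any ligatures that had no stray space after them.
        text = text.replace(ligature, plain)
    return text

print(clean_text("Britain\u2019s \ufb01 nances"))
```

If the converter has already replaced the ligatures with ordinary letters, the stray spaces cannot be spotted this way and would need a dictionary check instead.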

Thursday, April 15, 2010

Libdem manifesto key words

And finally the Liberal Democrat manifesto key word cloud.

Labour key word cloud

And here's the Labour key word cloud ...

Conservative key word cloud

Unlike tag clouds or those produced by Wordle, where the size of a word depends on its frequency, the Wmatrix key word clouds size each word by its statistical keyness, i.e. how far its frequency differs from what would be expected on the basis of a large reference corpus. Here's the Conservative key word cloud.
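To make keyness a bit more concrete, here is a minimal sketch of one way to score it, assuming the log-likelihood statistic that is commonly used for key word analysis. The function name and the counts in the example are invented for illustration and are not taken from the manifestos or from the reference corpus.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word across two corpora.

    freq_* are the word's counts, size_* the total word counts of the
    study corpus and the reference corpus.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected counts if the word were equally common in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    # A zero count contributes nothing (and log(0) is undefined).
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score

# Invented counts: a word seen 150 times in a 30,000-word text and
# 500 times in a 1,000,000-word reference corpus.
print(log_likelihood(150, 30_000, 500, 1_000_000))
```

A score near zero means the word is about as frequent as the reference corpus predicts; the larger the score, the bigger the word would appear in the cloud. The score alone does not say whether the word is overused or underused, so the relative frequencies need comparing as well.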

UK election manifestos 2010

This week I've been reading the UK election manifestos, or rather I've set Wmatrix to read them for me. First, you have to convert the online versions in PDF or HTML into plain text. Saving automatically from Acrobat as plain text leaves unwanted headers and footers and loses some capitalisation. Thanks to Martin Wynne for editing the Libdem and Conservative files. I've edited the Labour manifesto by taking the HTML version from their website and marking the chapter boundaries with a pseudo-XML tag. The edited full plain text versions are available to download at:

Labour's manifesto is 29,508 words long. The Conservative manifesto is 27,562 words and the LibDem one is shorter at 18,433 words.

Next, I loaded the files into Wmatrix and compared them to a general reference corpus for written British English. Key word clouds coming up ...

Friday, April 9, 2010

Windows to mac tips part 6: control-alt-delete for OSX

Two years later and I'm still finding out new shortcuts!

If you want the equivalent of control-alt-delete on OSX then use command-option-escape and you can force an application to quit.

