And here's the key concept cloud from the Labour manifesto ... there seems to be much more variety of concepts here and hence the cloud is much bigger. I had to shrink it further to get it all in one screenshot.
In addition to key words, Wmatrix can produce key concepts. It does this by comparing a frequency list of the semantic fields automatically tagged in the data against the corresponding list from a reference corpus (again, here, the BNC written sampler). This shows the statistically key concepts in the Conservative manifesto.
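The comparison of semantic-field frequency lists can be sketched in a few lines of Python. This is a minimal illustration, not Wmatrix's own code. It assumes the log-likelihood (G2) keyness statistic, and it assumes the text has already been semantically tagged, so that we start from a list of tags. The tag codes and all counts below are invented for illustration. In practice the reference frequencies would come from a corpus such as the BNC written sampler. Only overused tags are kept, on the assumption that the cloud shows positively key items.

```python
import math
from collections import Counter

def log_likelihood(o1, n1, o2, n2):
    """Two-cell log-likelihood (G2) for one item.

    o1, o2: frequency of the item in the study and reference corpus;
    n1, n2: total size (tokens or tags) of each corpus.
    """
    total = o1 + o2
    e1 = n1 * total / (n1 + n2)  # expected frequency in the study corpus
    e2 = n2 * total / (n1 + n2)  # expected frequency in the reference corpus
    ll = 0.0
    for o, e in ((o1, e1), (o2, e2)):
        if o > 0:  # the 0 * ln(0) term is taken as 0
            ll += o * math.log(o / e)
    return 2 * ll

def key_concepts(study_tags, ref_freqs, ref_size):
    """Rank semantic tags that are overused in the study text by LL score."""
    study_freqs = Counter(study_tags)  # frequency list of tags in the text
    n1 = len(study_tags)
    scores = {}
    for tag, o1 in study_freqs.items():
        o2 = ref_freqs.get(tag, 0)
        if o1 / n1 > o2 / ref_size:  # keep only overused (positively key) tags
            scores[tag] = log_likelihood(o1, n1, o2, ref_size)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data with illustrative tag codes: NOT real manifesto or BNC counts.
study = ["G1.1"] * 30 + ["I1"] * 10 + ["Z5"] * 60
reference = {"G1.1": 10, "I1": 20, "Z5": 900}
for tag, score in key_concepts(study, reference, ref_size=1000):
    print(tag, round(score, 2))
```

Ranking by the LL score, rather than by raw frequency, is what lets a moderately frequent but unusual field outrank a very frequent but ordinary one.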
Lou Burnard spotted some conversion errors in the Libdem manifesto (extra spaces after ligatures, e.g. 'fi'), and Martin has now changed smart quotes to straight ones. The new text version of the Libdem manifesto is at http://ucrel.lancs.ac.uk/wmatrix/ukmanifestos2010/ and here is the updated key word cloud. The main difference you'll notice is that "Britains" is no longer key: it was actually "Britain's", and now that the apostrophe has been fixed, its frequency is combined with that of "Britain".
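The smart-quote fix can be illustrated with a short Python sketch. The tokeniser here is a hypothetical toy, not the one Wmatrix uses. It shows how, once the typographic apostrophe is replaced by a straight one, a possessive "'s" can be split off so that "Britain's" counts towards "Britain". The ligature-spacing errors are not handled here, because deciding whether a space is spurious generally needs a word list or a manual check.

```python
import re
from collections import Counter

# Map typographic ("smart") quotes to their straight ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / typographic apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})

def normalise_quotes(text):
    return text.translate(SMART_QUOTES)

def tokenise(text):
    """Toy tokeniser: words, with a possessive 's split off as its own token."""
    return re.findall(r"[A-Za-z]+|'s\b", text)

sample = "Britain\u2019s future is Britain"
print(Counter(tokenise(normalise_quotes(sample))))
```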
Unlike tag clouds, or the clouds produced by Wordle, where the size of a word depends on its frequency, Wmatrix key word clouds size each word by its statistical keyness, i.e. how much its frequency differs from what would be expected on the basis of a large reference corpus. Here's the Conservative key word cloud.
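To make "expected" concrete, here is a hedged formulation. It assumes that the keyness measure is the log-likelihood (G2) statistic commonly used with Wmatrix. Take a word that occurs $O_1$ times in a manifesto of $N_1$ words and $O_2$ times in a reference corpus of $N_2$ words. Its expected frequencies $E_1, E_2$ and its keyness score are then given below, with any term where $O_i = 0$ taken as 0:

```latex
E_i = \frac{N_i \,(O_1 + O_2)}{N_1 + N_2} \quad (i = 1, 2), \qquad
\mathrm{LL} = 2 \sum_{i=1}^{2} O_i \ln\frac{O_i}{E_i}
```

The larger LL is, the further the observed frequencies are from the expected ones, and the bigger the word appears in the cloud.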
This week I've been reading the UK election manifestos, or rather I've set Wmatrix to read them for me. First, you have to convert the online PDF or HTML versions into plain text. Saving automatically from Acrobat as plain text leaves unwanted headers and footers and loses some capitalisation. Thanks to Martin Wynne for editing the Libdem and Conservative files. I edited the Labour manifesto myself, taking the HTML version from their website and marking the chapter boundaries with a pseudo-XML tag. The edited full plain-text versions are available to download at: http://ucrel.lancs.ac.uk/wmatrix/ukmanifestos2010/
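Pseudo-XML chapter markers make it easy to split a plain-text file back into chapters, for example to compare chapters with each other. The sketch below assumes a hypothetical marker of the form `<chapter title="...">`; the tag actually used in the edited Labour file may well differ, so the pattern would need adjusting.

```python
import re

# Hypothetical chapter marker; the real tag name/attributes may differ.
CHAPTER_TAG = re.compile(r'<chapter\s+title="([^"]*)">')

def split_chapters(text):
    """Return a list of (title, body) pairs, one per marked chapter."""
    parts = CHAPTER_TAG.split(text)
    # With one capturing group, re.split yields
    # [preamble, title1, body1, title2, body2, ...]; the preamble is dropped.
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]

doc = '<chapter title="Economy">Jobs and growth.\n<chapter title="Health">The NHS.'
for title, body in split_chapters(doc):
    print(title, "->", body)
```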
Labour's manifesto is 29,508 words long. The Conservative manifesto is 27,562 words and the LibDem one is shorter at 18,433 words.
Next, I loaded the files into Wmatrix and compared them to a general reference corpus of written British English. Key word clouds coming up ...