Thursday, April 15, 2010

UK election manifestos 2010

This week I've been reading the UK election manifestos, or rather I've set Wmatrix to read them for me. First, you have to convert the online versions from PDF or HTML into plain text. Saving as plain text directly from Acrobat leaves unwanted headers and footers and loses some capitalisation. Thanks to Martin Wynne for editing the LibDem and Conservative files. I edited the Labour manifesto by taking the HTML version from their website and marking the chapter boundaries with a pseudo-XML tag. The edited full plain text versions are available to download at:

Labour's manifesto is 29,508 words long. The Conservative manifesto is 27,562 words and the LibDem one is shorter at 18,433 words.
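For anyone wanting to check word counts like these on the plain text files, here is a rough sketch in Python. It is not how Wmatrix counts words: the function name `count_words` and the tokenisation rule (runs of letters or digits, with internal apostrophes and hyphens kept together) are assumptions made for illustration, so the totals may differ slightly from the figures above.

```python
import re


def count_words(text: str) -> int:
    """Count word tokens in a plain text manifesto (illustrative only)."""
    # Drop pseudo-XML tags (e.g. chapter markers) so markup is not counted.
    without_tags = re.sub(r"<[^>]+>", " ", text)
    # Assumed definition of a word: letters/digits, optionally joined by
    # an internal apostrophe or hyphen ("don't", "well-being").
    tokens = re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", without_tags)
    return len(tokens)
```

Calling `count_words(open(path, encoding="utf-8").read())` on each file would give comparable totals, where `path` is wherever you saved the download.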

Next, I loaded the files into Wmatrix and compared them to a general reference corpus for written British English. Key word clouds coming up ...
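To give a feel for what a key word comparison involves, here is a minimal sketch of the log-likelihood keyness statistic, which scores how much more often a word occurs in one text than a reference corpus would lead you to expect. The function names `log_likelihood` and `key_words` are hypothetical, the token lists are assumed to be non-empty and already tokenised, and this is a simplified illustration rather than Wmatrix's actual implementation.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood for a word seen a times in a study corpus of c words
    and b times in a reference corpus of d words."""
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    total = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if a > 0:
        total += a * math.log(a / expected_study)
    if b > 0:
        total += b * math.log(b / expected_ref)
    return 2 * total


def key_words(study_tokens, reference_tokens, top_n=10):
    """Return the top_n overused words in the study text, highest score first."""
    study = Counter(study_tokens)
    reference = Counter(reference_tokens)
    c = sum(study.values())
    d = sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference[word]  # Counter returns 0 for unseen words
        # Keep only words relatively more frequent in the study text.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

A word cloud can then size each word by its score, so the most distinctive vocabulary of a manifesto stands out.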
