And just in case you haven't had time to read all 1,078,416 words in the Leveson Inquiry Report, here are the Key Clouds produced by Wmatrix for all four volumes conflated together. I took the reports available from the Inquiry website (http://www.levesoninquiry.org.uk/), downloaded the PDFs and converted them to text using the PDFBox ExtractText tool (http://pdfbox.apache.org/commandlineutilities/ExtractText.html). I then loaded the text into Wmatrix (http://ucrel.lancs.ac.uk/wmatrix/) and compared it to the BNC written sampler.
The statistically most significant key words (i.e. not just the most frequent words) are:
And the statistically most significant key semantic categories are:
All these items are significant, but the larger the font, the more significant they are. If you want to have a look at these texts in Wmatrix, just let me know and I can share the folders with you. You'll then be able to click through the clouds to see each example of the word and tags in context.
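For anyone curious what "statistically significant" means here, the idea can be sketched in a few lines of Python. Wmatrix ranks key items with the log-likelihood statistic, comparing an item's frequency in the text under study against its frequency in a reference corpus. The sketch below is only an illustration under simplifying assumptions: the function names and the toy token lists are my own, it works on plain word tokens rather than tagged text, and it is not the Wmatrix code.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one item.

    a = frequency in the study corpus, b = frequency in the reference corpus,
    c = total size of the study corpus, d = total size of the reference corpus.
    """
    # Expected frequencies if the item were spread evenly across both corpora
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed frequency contributes nothing (avoids log(0))
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_words(study_tokens, reference_tokens, top_n=10):
    """Return the top_n overused words in study_tokens, ranked by log-likelihood."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c = sum(study.values())
    d = sum(ref.values())
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only words that are relatively more frequent in the study corpus
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy data, invented purely for illustration
    study = ["press"] * 5 + ["the"] * 5
    reference = ["the"] * 9 + ["cat"]
    for word, score in key_words(study, reference):
        print(word, round(score, 2))
```

A score above 3.84 corresponds to p < 0.05 and above 6.63 to p < 0.01 (chi-squared distribution, one degree of freedom). In a cloud, the higher the score, the larger the font.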
Thursday, November 29, 2012
If you don't have time to read even the Executive Summary of the Leveson report (26,345 words) published today, then here's what Wmatrix shows as the key words compared to the BNC written sampler reference corpus:
And now for the key semantic tags:
Sunday, August 5, 2012
I decided to blog this one rather than tweet it. I needed to shift my iTunes library from my old MacBook to the new MacBook Pro so that I can sync with my shiny new iPad. Follow these instructions if you need to do the same. Note that they shift everything in your library, not just your music: http://support.apple.com/kb/HT4527
Monday, February 14, 2011
Friday, November 5, 2010
Sunday, October 31, 2010
Wednesday, September 29, 2010
It has been reported elsewhere (Google research blog, New York Times, Language Log, IBM Research) that Fred Jelinek passed away on September 14th, 2010. I heard Fred (at a talk in Prague, I think) repeat his famous 'quote' about the accuracy of his MT system going up when the linguists left the room!