Thursday, November 29, 2012

Key Clouds for the four-volume Leveson Inquiry Report

And just in case you haven't had time to read all 1,078,416 words in the Leveson Inquiry Report, here are the Key Clouds produced by Wmatrix for all four volumes conflated together. I downloaded the report PDFs from the Inquiry website (http://www.levesoninquiry.org.uk/), converted them to text using the PDFBox ExtractText tool (http://pdfbox.apache.org/commandlineutilities/ExtractText.html), loaded the text into Wmatrix (http://ucrel.lancs.ac.uk/wmatrix/) and compared it to the BNC written sampler.

The statistically most significant key words (i.e. not just the most frequent words) are:


And the statistically most significant key semantic categories are:


All these items are significant, but the larger the font, the more significant they are. If you want to have a look at these texts in Wmatrix, just let me know and I can share the folders with you. You'll then be able to click through the clouds to see each occurrence of a word or tag in context.
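
For anyone wondering what "key" means here as opposed to "frequent": Wmatrix's comparison is based on the log-likelihood statistic, which asks whether a word is used more often in the study corpus than the reference corpus would lead you to expect. Below is a minimal sketch of that calculation in Python. It is not Wmatrix's own code; the function names, the whitespace-level token lists and the default cut-off of 6.63 (p < 0.01 at one degree of freedom) are my assumptions for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood for one word.

    a = frequency in the study corpus, b = frequency in the reference corpus,
    c = total words in the study corpus, d = total words in the reference corpus.
    """
    # Expected frequencies if the word were equally common in both corpora
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_words(study_tokens, reference_tokens, threshold=6.63):
    """Return (word, score) pairs overused in the study corpus, highest score first."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c = sum(study.values())
    d = sum(ref.values())
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only words that are relatively more frequent in the study corpus
        if a / c > b / d:
            score = log_likelihood(a, b, c, d)
            if score >= threshold:
                scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In a cloud, the score is what drives the font size: the higher the log-likelihood value, the bigger the word. A very common word such as "the" can have a huge raw frequency and still not appear, because it is just as common in the reference corpus.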

Clouds for Leveson Inquiry Executive Summary

If you don't have time to read even the Executive Summary of the Leveson report (26,345 words) published today, then here's what Wmatrix shows as the key words compared to the BNC written sampler reference corpus:



And now for the key semantic tags:


Sunday, August 5, 2012

Mac tips part 7: moving your iTunes library to a new computer

I decided to blog this one rather than tweet it. I needed to shift my iTunes library from my old MacBook to the new MacBook Pro so that I can sync with my shiny new iPad. If you need to do the same, follow these instructions. Note that they move everything in your library, not just your music: http://support.apple.com/kb/HT4527

Monday, February 14, 2011

IBM computer Watson plays Jeopardy

IBM are in the news again in relation to AI. Their computer (Watson) is playing in the American TV programme Jeopardy.

Friday, November 5, 2010

Wednesday, September 29, 2010

Fred Jelinek RIP

It has been reported elsewhere (Google research blog, New York Times, Language Log, IBM Research) that Fred Jelinek passed away on September 14th, 2010. I heard Fred (at a talk in Prague I think) repeat his famous 'quote' about the accuracy of his MT system going up when the linguists left the room!
