And just in case you haven't had time to read all 1,078,416 words in the Leveson Inquiry Report, here are the Key Clouds produced by Wmatrix for all four volumes combined. I took the reports available from the Inquiry website (http://www.levesoninquiry.org.uk/), downloaded the PDFs, converted them to text using the PDFBox ExtractText tool (http://pdfbox.apache.org/commandlineutilities/ExtractText.html), then loaded them into Wmatrix (http://ucrel.lancs.ac.uk/wmatrix/) and compared them to the BNC written sampler.
The statistically most significant key words (i.e. not just the most frequent words) are:
And the statistically most significant key semantic categories are:
All these items are significant, but the larger the font, the more significant they are. If you want to have a look at these texts in Wmatrix, just let me know and I can share the folders with you. You'll then be able to click through the clouds to see each example of the word and tags in context.
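For anyone wondering how "statistically significant" differs from "most frequent", here is a minimal Python sketch of a log-likelihood keyness calculation of the kind Wmatrix uses to compare a text against a reference corpus. It is not Wmatrix's actual code: the function names, the toy counts and the 3.84 cut-off (the chi-squared critical value for p < 0.05 with 1 d.f.) are my own assumptions for illustration.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood score for one word across two corpora."""
    total = freq_study + freq_ref
    # Expected frequencies if the word were spread evenly across both corpora
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def key_words(study, reference, threshold=3.84):
    """Words overused in `study` relative to `reference`, most key first."""
    size_study = sum(study.values())
    size_ref = sum(reference.values())
    scored = []
    for word, freq in study.items():
        ref_freq = reference.get(word, 0)
        # Skip words that are not relatively more frequent in the study text
        if freq / size_study <= ref_freq / size_ref:
            continue
        ll = log_likelihood(freq, size_study, ref_freq, size_ref)
        if ll >= threshold:
            scored.append((word, ll))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy counts, invented for illustration only (not taken from the report)
study = Counter({"press": 50, "the": 100, "inquiry": 30})
reference = Counter({"the": 100, "press": 2, "cat": 78})
for word, score in key_words(study, reference):
    print(word, round(score, 1))
```

Note that "the" is the most frequent word in the toy study text but does not come out as key, because it is just as frequent in the reference counts.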
Thursday, November 29, 2012
Clouds for Leveson Inquiry Executive Summary
If you don't have time to read even the Executive Summary of the Leveson report (26,345 words) published today, then here's what Wmatrix shows as the key words compared to the BNC written sampler reference corpus:
And now for the key semantic tags:
Sunday, August 5, 2012
Mac tips part 7: moving your iTunes library to a new computer
I decided to blog this one rather than tweet it. I needed to shift my iTunes library from my old MacBook to the new MacBook Pro so that I could sync with my shiny new iPad. Follow these instructions if you need to do the same. Note that it shifts everything in your library, not just your music: http://support.apple.com/kb/HT4527
Monday, February 14, 2011
IBM computer Watson plays Jeopardy
IBM are in the news again in relation to AI. Their computer (Watson) is playing in the American TV programme Jeopardy.
Friday, November 5, 2010
Review of UK copyright laws
A change in fair use in UK law might help corpus-based language researchers?
Sunday, October 31, 2010
BBC News: How do you pronounce 'H'?
The British Library is carrying out a survey of how spoken English is changing: http://www.bbc.co.uk/news/magazine-11642588
Wednesday, September 29, 2010
Fred Jelinek RIP
It has been reported elsewhere (Google research blog, New York Times, Language Log, IBM Research) that Fred Jelinek passed away on September 14th, 2010. I heard Fred (at a talk in Prague, I think) repeat his famous 'quote' about the accuracy of his MT system going up when the linguists left the room!