Thursday, November 29, 2012

Key Clouds for the four-volume Leveson Inquiry Report

And just in case you haven't had time to read all 1,078,416 words in the Leveson Inquiry Report, here are the Key Clouds produced by Wmatrix for all four volumes conflated together. I downloaded the report PDFs from the Inquiry website, converted them to text using the PDFBox ExtractText tool, loaded them into Wmatrix, and compared them to the BNC written sampler.

The statistically most significant key words (i.e. not just the most frequent words) are:

And the statistically most significant key semantic categories are:

All these items are significant, but the larger the font, the more significant they are. If you want to have a look at these texts in Wmatrix, just let me know and I can share the folders with you. You'll then be able to click through the clouds to see each example of the word and tags in context.
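
For anyone wondering how "key" differs from "most frequent", here is a rough Python sketch of a log-likelihood keyness calculation, the kind of statistic Wmatrix uses to compare a text against a reference corpus. It is an illustration only, not Wmatrix's actual code: the function names `log_likelihood` and `key_words`, the whitespace-free token lists, and the choice to keep only overused words are my assumptions. The default threshold of 6.63 is the chi-squared critical value for p < 0.01 with one degree of freedom.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one word.

    a: frequency in the study corpus, b: frequency in the reference corpus,
    c: total words in the study corpus, d: total words in the reference corpus.
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_words(study_tokens, reference_tokens, threshold=6.63):
    """Return (word, score) pairs overused in the study corpus, highest first.

    Both token lists are assumed to be non-empty (hypothetical helper).
    """
    study = Counter(study_tokens)
    reference = Counter(reference_tokens)
    c = sum(study.values())
    d = sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        # Keep only words that are relatively more frequent in the study corpus.
        if a / c > b / d:
            score = log_likelihood(a, b, c, d)
            if score >= threshold:
                scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A word like "the" can be very frequent in the report and still score close to zero, because it is just as frequent in the reference corpus. A word like "press" scores highly because it turns up far more often than the reference corpus would lead you to expect.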

Clouds for Leveson Inquiry Executive Summary

If you don't have time to read even the Executive Summary of the Leveson report (26,345 words), published today, then here's what Wmatrix shows as the key words compared to the BNC written sampler reference corpus:

And now for the key semantic tags:

