<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet title="XSL_formatting" type="text/xsl" href="/blogs/shared/nolsol.xsl"?>

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>

<title>Backstage.bbc.co.uk blog - Dr Ian McDonald</title>
<link>http://www.bbc.co.uk/blogs/bbcbackstage/</link>
<description>backstage.bbc.co.uk is the BBC&apos;s early adopter network to encourage participation and support creativity through open innovation.</description>
<language>en</language>
<copyright>Copyright 2011</copyright>
<lastBuildDate>Thu, 22 Apr 2010 12:31:45 +0000</lastBuildDate>
<generator>http://www.sixapart.com/movabletype/?v=4.33-en</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs> 


<item>
	<title>When is a dataset not a dataset? The hackday project that crowdsourced data.gov.uk</title>
	<description><![CDATA[<p><img class="mt-image-none" alt="Tom Morris and other participants at the end of the hackday" src="http://www.bbc.co.uk/blogs/bbcbackstage/2010/02/12/hhhday_iv_tommorris.jpg" width="595" height="293" /></p>
<p class="麻豆社Text" style="margin: 0cm 0cm 0pt;">When is a dataset not a dataset? How many of the 3,241 datasets now listed as part of data.gov.uk are easy to open up and play with? How many are tables for computers to analyse, rather than PDF reports for people to read?</p>
<p class="麻豆社Text" style="margin: 0cm 0cm 0pt;">The hackday filled a Channel 4 office with journalists and developers on the final Friday in January. Our aim was to tell new stories with open data. Attendees already had&nbsp;form: the BBC's Martin Rosenbaum, and data journalism teams from the Times, the Guardian, and the FT.  judged our attempts in his role as head of hosts , alongside&nbsp;mySociety boss Tom Steinberg. They  to my team's analysis of Tory candidates. But another project promised to shed light on public data in the UK.</p>
<p class="麻豆社Text" style="margin: 0cm 0cm 0pt;">Tom Morris was part of a team that looked into the quality of data.gov.uk. Although data.gov.uk advertises itself as a database of open datasets, many of the entries are . He built a prototype format checker that invites people to go through datasets and record the file format. You can listen to him explaining the checker to me and to the hackday, or reuse  under the .</p>


<div id="tomMorrisDataAudio" class="player" style="margin-left:40px"> <p>In order to see this content you need to have both JavaScript enabled and Flash installed. Visit BBC Webwise for full instructions. If you're reading via RSS, you'll need to visit the blog to access this content. </p> </div> <script type="text/javascript">
<!--
var emp = new bbc.Emp();
emp.setWidth("466");
emp.setHeight("106");
emp.setDomId("tomMorrisDataAudio");
emp.setPlaylist("http://www.bbc.co.uk/learningdevelopment/assets/blogs/backstage/TomMorrisData-feature-playlist.xml");
emp.write();
//-->
</script> 




<p class="麻豆社Text" style="margin: 0cm 0cm 0pt;">On Wednesday February 3<sup>rd</sup>, he put a completed quality checker online. By that Thursday, the crowd had gone through data.gov.uk and marked up all of the datasets.</p>


<p class="麻豆社Text" style="margin: 0cm 0cm 0pt;">Tom posted his initial breakdown to the data.gov.uk community on March 20th:</p>

<blockquote>

<table>
<tr><td>HTML</td><td align="right">252</td></tr>
<tr><td>XML</td><td align="right">5</td></tr>
<tr><td>Word</td><td align="right">4</td></tr>
<tr><td>RTF</td><td align="right">1</td></tr>
<tr><td>OpenOffice</td><td align="right">1</td></tr>
<tr><td>Something odd</td><td align="right">85</td></tr>
<tr><td>JSON</td><td align="right">9</td></tr>
<tr><td>Nothing there!</td><td align="right">190</td></tr>
<tr><td>CSV</td><td align="right">12</td></tr>
<tr><td>Multiple formats</td><td align="right">1211</td></tr>
<tr><td>PDF</td><td align="right">468</td></tr>
<tr><td>RDF</td><td align="right">10</td></tr>
<tr><td>Excel</td><td align="right">408</td></tr>
<tr><td>TOTAL</td><td align="right">2656</td></tr>
</table>

<p>Sadly, this is over-optimistic. I've manually checked some of the data
that has been categorised as JSON and RDF. Most of it is not actually
correctly categorised - either people clicked, say, 'RDF' when they
meant to click 'PDF', or they have seen an RSS or Atom feed and
categorised it as RDF.</p>

<p>What this admittedly imperfect dataset is basically saying is that the
vast majority of the 'data' on data.gov.uk is not actually
machine-readable data but human-readable documents.</p>
</blockquote>


<p class="麻豆社Text" style="margin: 0cm 0cm 0pt;">He will be at the  this weekend, where he will speak about  and might do the analysis, which he told me was the most important part. Once it is done, it should make very interesting reading.</p>]]></description>
	<dc:creator>Dr Ian McDonald</dc:creator>
	<link>http://www.bbc.co.uk/blogs/bbcbackstage/2010/04/datagovuk-format-checker.shtml</link>
	<guid>http://www.bbc.co.uk/blogs/bbcbackstage/2010/04/datagovuk-format-checker.shtml</guid>
	<category>hackers</category>
	<pubDate>Thu, 22 Apr 2010 12:31:45 +0000</pubDate>
</item>


</channel>
</rss>

