Wednesday, December 16: Showcase of data-orientated projects at the University of Sheffield, with guest speakers to place things in a wider context.

09:00 - 09:15 Introduction to Open Data Science Initiative Neil D. Lawrence, Open Data Science Initiative

09:15 - 09:40 Differential Privacy and Virtual Patients Michael T. Smith, Open Data Science Initiative

09:40 - 10:10 Security in Smart Grids Iñaki Esnaola, Department of Automatic Control and Systems Engineering, University of Sheffield

10:10 - 10:30 Break

10:30 - 11:15 OpenML: Open, networked Machine Learning Joaquin Vanschoren, Eindhoven University of Technology

11:15 - 12:00 The Problem with Data in the Humanities Michael J. Pidd, Humanities Research Institute, Sheffield

12:00 - 13:00 Lunch

13:00 - 13:30 Automated Detection of Malaria-Bearing Mosquitoes Davide Zilli, Engineering Department, University of Oxford

13:30 - 14:15 The Iconic Image on Social Media: A Rapid Response to the Death of Aylan Kurdi [tbc] Farida Vis

14:15 - 14:30 Machine Learning: Science and Policy Aleks Berditchevskaia, The Royal Society

14:30 - 15:30 Data and Ethics [tbc] Jonathan Price, Doughty Street Chambers

15:30 - 15:50 Break

15:50 - 16:40 Is Your Research Software Correct? Michael Croucher, Open Data Science Initiative

16:40 - 17:00 Discussion and Close


OpenML: Open, networked machine learning

Joaquin Vanschoren

Today, the ubiquity of the internet is allowing new, more scalable forms of scientific collaboration. Networked science tools allow scientists to share and organize data on a global scale, build directly on each other's data and techniques, reuse them in unforeseen ways, and mine all data to search for patterns. OpenML is a place for researchers to analyse data together, building on shared data sets, machine learning code and prior experiments. Integrated in many machine learning environments, it helps researchers save time by automating reproducible sharing, reuse and experimentation as much as possible. It also helps scientists and students across scientific fields to explore the latest and most relevant open data sets and machine learning techniques, find out which are most useful in their work, collaborate with others online, and gain more credit for their work by making it more visible and easily reusable.

View Joaquin's presentation.

Automated Detection of Malaria-Bearing Mosquitoes

Davide Zilli

Abstract: Mosquitoes are responsible for over half a million deaths every year due to their capacity to vector lethal parasites and viruses, which cause diseases such as malaria, lymphatic filariasis, dengue, yellow fever and more. While they have been known to transmit malaria for over 100 years, there is still a great deal to discover about the ecology, bionomics and sometimes even the identity of many of these vector species, particularly in Asia. The HumBug project aims to automate the detection of mosquitoes by listening to the sound of their wingbeat with cheap and ubiquitous sensors, and to use this data to model their interaction with the surrounding environment. In this talk I will discuss our current challenges in gathering training data for our algorithms and the principles of the acoustic detection of their wingbeat.

View Davide's presentation.

The Problem with Data in the Humanities

Michael J. Pidd

This talk will explore some of the problems of using digital data in humanities research, whether it be mining large datasets, nominal record linkage, ontology development, data visualisation, computational linguistics or simply the desire to re-use data. The talk will question the wisdom of constantly using 'bad data' and bust open the myth of 'open data'. I will draw on examples from a range of HRI Digital projects, such as Digital Panopticon, which is seeking to reconstruct the lives of 90,000 criminals sentenced to transportation to Australia in the 18th and 19th centuries, and Linguistic DNA, which is seeking to model the evolution of key ideas (concepts and paradigms) across 30 million pages of historical printed books.

Is Your Research Software Correct?

Mike Croucher

A 2014 survey of researchers from 15 Russell Group universities (1) found that 92% of academics use research software and 69% said that their research would not be practical without it. The survey also found that 56% of researchers develop their own software and that 21% of these had never received any training in software development. Given this state of affairs, how can we be sure that the software we are using to conduct research gives correct results? This talk will explore some of the issues surrounding the use and development of research software.

View Mike Croucher's presentation.

Differential Privacy

Mike Smith

Many of the datasets we are interested in processing contain private or sensitive information. Releasing this data for wider analysis poses serious privacy concerns. Differential Privacy is a system which allows a data holder to release data in a way which balances the needs of the individual with the wider social benefits the analysis might produce. I'll walk through a couple of simple examples of where 'anonymisation' might have failed, and then cover the basic ideas of differential privacy. If I've time I'll look at how this can be extended to more complicated datasets.
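As a rough illustration of the basic idea (not taken from the talk itself), one widely used construction is the Laplace mechanism: a numeric query is answered with random noise added, scaled to how much any one person could change the answer. The sketch below assumes a simple counting query; the function names, the toy ages and the value of epsilon are all hypothetical choices made for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponential variables with
    # mean `scale` follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    A counting query has sensitivity 1: adding or removing one person
    changes the true count by at most 1. Laplace noise with scale
    1 / epsilon therefore gives epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: ages of patients in a small study.
ages = [34, 51, 29, 62, 47, 55, 38, 71]

# Smaller epsilon means more noise and stronger privacy.
noisy = private_count(ages, lambda a: a >= 50, epsilon=0.5,
                      rng=random.Random(0))
print(round(noisy, 2))
```

The released value is close to the true count on average, but no single answer reveals with certainty whether a particular individual is in the data. Each additional query spends more of the privacy budget, which is one reason more complicated datasets need more care.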

View Mike Smith's presentation.

Security in smart grids

Iñaki Esnaola

The smart grid paradigm is founded on the integration of existing power grids with sophisticated sensing and communication infrastructures. While the benefits provided by this setting are crucial for the future development of power grids, it also increases the dependency on data acquisition and system monitoring procedures. Central to the control and optimization of power systems is the state estimation problem in electricity grids. Two of the main contingencies faced by state estimation procedures are missing data due to telemetry problems, and intentionally corrupted data by a malicious attacker. In this talk, it is shown that by exploiting the statistical structure of the data generated in the grid, robust state estimation procedures can be formulated within the framework imposed by the current sensing infrastructure. In particular, tools from matrix completion and game theory are used to provide robust estimation procedures.

The Iconic Image on Social Media: A Rapid Response to the Death of Aylan Kurdi

Farida Vis

Farida Vis (Visual Social Media Lab, Faculty Research Fellow, Information School, University of Sheffield) will be presenting work from a new report from the Visual Social Media Lab on the images framing one of the biggest issues of 2015, the flight of Syrian refugees to Europe, made highly visible by the photographs of three-year-old Aylan Kurdi lying face-down on a beach in Turkey and then picked up by a Turkish police officer, following an unsuccessful attempt by his family to reach Greece in early September. It tracks how the images spread on Twitter: from one tweet to 20 million screens in the space of 12 hours. It shows how these images had a huge impact on language use, with users shifting from the term 'migrants' to 'refugees' overnight. Moreover, Twitter recently announced that #RefugeesWelcome was one of the most influential moments of 2015 on the platform. The report shows how Twitter was instrumental in the distribution of these images, making the story go global and mainstream before the official international press published the first news article. Findings from Google News Lab highlight a similar pattern around the increased use of the term 'refugees'. It also shows how people were already searching for Aylan Kurdi before news started to spread on Twitter. The report draws on the expertise of 15 contributors from the Social Sciences, Arts and Humanities, industry and nonprofit organisations to address this wider question from different angles, across four sections in the report:

  • Social Media Responds - studies how the image spread on social media and what people searched for on Google. It looks at specific stories and image use.
  • What Did the Image Do? - addresses media coverage of personal, political and artistic responses as well as different political responses in the UK and Norway.
  • The Iconic Image on Social Media - uses longstanding visual traditions (the iconography of suffering, war and press photography) to understand the images.
  • Showing/Not Showing the Image - includes a timely discussion on ethics, publishing decisions for graphic images as well as the changing role of platforms.

By adopting a rapid research response mode, the authors of the report seek to contribute actively to conversations about how the refugee crisis is framed in the media.

Machine Learning: Science and Policy

Aleks Berditchevskaia (Royal Society)

Quoting the Royal Society's project page:

"The Royal Society, the UK’s national academy of science, is to start a policy project on how machine learning, the powerful technology that allows machines to learn from data and self-improve, might impact on UK society. Machine learning underpins many of the services people rely on every day, including internet search engines, email filters to sort out spam, websites that make personalised recommendations and many of the applications we use on our phones.
With many more potential applications in the pipeline, future developments in machine learning could power the UK economy and help solve big societal issues. The Royal Society aims to increase awareness and demonstrate the potential of machine learning among policymakers, academia, industry and the wider public and highlight the opportunities and challenges it presents."

View Aleks' presentation.