Thanks to a recommendation from Tracey Gyateng of DataKind, I went along to the launch of ‘Data science: a guide for society’ from Sense about Science, an independent campaigning charity that challenges the misrepresentation of science and evidence in public life.
The purpose of the guide is to equip people with the tools they need to question the quality of data science evidence, whether they are the public, journalists, politicians, or decision makers. It is intended to help start these conversations, regardless of how much someone knows about data science. The key messages are:
• You don’t need to be a data scientist to interrogate evidence derived from data science – anyone can use this guide to understand whether results should be applied to real-world decision making
• People can take this guide to their workplaces and communities to build the “three questions” into their decision making when using data
• We can’t let data science become a “black box” where questions about evidence drop away
The guide is useful for anyone involved with data – whether you are commissioning data analysis, using in-house analysis to inform your decision making, or being presented with recommendations based on often invisible data. The guide also raises important issues for anti-discriminatory practice.
The guide is short, clear and written for non-expert audiences – if you are a data scientist and you want your parents or grandparents to understand what you do, this guide would be a good place to start!
The guide starts with the meanings of common terms and the stages of data science. These simple explanations are intended to help people start a conversation about the quality of data science, not to engage in academic-level debate on the precision of definitions.
At the centre of the guide and campaign are the “three questions” for us all to start asking of data:
• Where does it come from? (For example, what was the original question asked? How representative is the data?)
• What assumptions are being made? (For example, are we sure the missing variables are all irrelevant? Can the results be generalised to other times, places, or groups? In what ways might the algorithms be prejudiced?)
• Can it bear the weight being put on it? (For example, has the model been tested correctly? How precise are the predictions? Is it worth using in the real world?)
Tracey Brown, Director of Sense about Science, emphasised the power of people simply asking these questions: those doing and using data science will start to respond to the likelihood that these questions will be asked of them, which in turn will influence their day-to-day work.
An excellent panel of data scientists from industry and the public sector covered points such as: checking how far removed your data is from the real world; starting small and being open; and bringing in outside perspectives to check for algorithmic bias – for example, music editors taking a look at what new music Spotify is recommending to you. They also raised the underlying issue of how to improve basic numeracy and data literacy, and of being comfortable saying, in an increasingly data-driven world, “I don’t understand”.
As one speaker emphasised, data is only useful in the context of decision making – which makes it both powerful and risky. This guide provides support for the ‘intelligent buying’ and use of data science that is necessary if we are going to maximise the power and reduce the risks.
More broadly, the guide provides a clear articulation of some important terms in evaluation – generalisability, causation and correlation, and observational versus experimental approaches. Sense about Science also have a guide on statistics that is a useful companion to this one, as is Daniel Kahneman’s book ‘Thinking, Fast and Slow’.