I paused efforts on an introductory course on machine learning and statistical inference after finding Professor Matthew J. Salganik’s draft book, Bit by Bit: Social Research in the Digital Age, which has been available for open peer review since summer 2016 (click on the book title to access it, and here to learn about the open review process).
His work bridges computer science and social science, and finding a book “for social scientists that want to do more data science, and it is for data scientists that want to do more social science” was something of a relief. My sense of relief continued for these reasons:
The draft is very easy to read, with a minimum of technical computer or social science language. It is not focused on the detail of various methodological debates, aiming instead to develop a high-level understanding between these two fields.
It gets you into the academic history, which can be harder to access if you work outside academia. There are loads of links to research where digital methods and big data have been used. Some are recent (how big data has been used in matching studies to explore the effects of shootings on police violence and stop-and-frisk records) and some are decades old (how the Vietnam draft was used by researchers as a natural experiment). Together they show how fundamental skills in thinking through design, identifying your questions and understanding bias stand the test of time.
The book is future orientated: it doesn’t focus on the specific, ever-evolving tools and software that I have been encountering elsewhere.
The author understands the nitty-gritty practicalities involved in ‘doing’ social research. For example: how to access data; how to weigh up the cost implications of a digital approach, with its higher up-front fixed costs but likely much lower variable costs; and tangible ways to think about the new and complicated ethical considerations raised by digital social research.
Digital tools and big data are not presented as some kind of silver bullet for all that is difficult about social research. For example, chapter two covers the characteristics of big data that are generally good for research and those that are generally bad, with twice as many listed as bad as good (in the draft version I read).
This lecture by Salganik offers a shorter introduction, and does what the book does well: illustrating methodological, practical and ethical issues with lots of examples, and in plain English. For example, it shows how the realism of field experiments and the control of lab experiments can be combined in digital field experiments, at a greater scale. It also shows how the limitations of the average treatment effect in understanding how an intervention will affect heterogeneous populations may be more easily addressed through digital methods, which often provide more pre-treatment information and make it easier to run more than one experiment.
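To make the point about average treatment effects concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the records, the group labels (`light_user`, `heavy_user`) and the function names are my own assumptions, not taken from the book or the lecture. It simply compares the overall difference in means with the difference in means inside each pre-treatment group.

```python
from statistics import mean

# Hypothetical experiment records: (pre_treatment_group, treated, outcome).
# The groups and numbers are invented purely to illustrate the idea.
records = [
    ("light_user", True, 12.0), ("light_user", True, 14.0),
    ("light_user", False, 10.0), ("light_user", False, 12.0),
    ("heavy_user", True, 20.0), ("heavy_user", True, 22.0),
    ("heavy_user", False, 22.0), ("heavy_user", False, 24.0),
]


def difference_in_means(rows):
    """Estimate a treatment effect as mean(treated) - mean(control)."""
    treated = [outcome for _, is_treated, outcome in rows if is_treated]
    control = [outcome for _, is_treated, outcome in rows if not is_treated]
    return mean(treated) - mean(control)


def effects_by_group(rows):
    """Estimate the treatment effect separately within each pre-treatment group."""
    groups = sorted({group for group, _, _ in rows})
    return {
        group: difference_in_means([row for row in rows if row[0] == group])
        for group in groups
    }


overall_effect = difference_in_means(records)
group_effects = effects_by_group(records)

print("Average treatment effect:", overall_effect)
for group, effect in group_effects.items():
    print(f"Effect for {group}:", effect)
```

In this toy data the overall effect is zero, yet the intervention helps one group and harms the other. That is the kind of pattern an average alone hides, and which pre-treatment information makes visible.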
While it was a welcome relief to spend some time with a guide that spoke my language, I still think the time I am spending, often lost, in the big data / computer science space is necessary. The reality is that “computational social science”, a term Salganik uses, will be straddling two languages and cultures for some time yet.
Despite appearances, this post shouldn’t be read as a review, as the book is still a draft. But I will be buying a copy as soon as it comes out (you can sign up here to be notified about publication), not least for the links to further reading and the range of practical exercises to work with.