My first, uninformed answer to the question ‘what is big data?’ was ‘stuff off social media’. In particular, the social media I was familiar with and could access: Twitter, Facebook, Instagram.
Much of the content on Facebook or Instagram is shared ‘privately’ between users who are connected to each other in a closed network. Twitter, being a platform where people share most of their content openly, is the most accessible. As a result, many courses and tools for social media analytics tend to focus on Twitter.
Twitter is a relatively new source of data (it started in 2006), and much analytical effort has gone into using it for commercial marketing. More sophisticated analysis, of the type that would be useful in social research, faces a number of hurdles: data availability, bias, ethics, interpretation and understanding of context.
I completed this course in July 2016 to try to get a better, practical understanding of the potential and limitations of working with Twitter data for social research. Here is a quick overview of my thoughts based on this experience:
Analysing Twitter data is useful for understanding what is happening on Twitter. It is less clear how useful it is for understanding the world at large. Some topics and events get greater coverage on Twitter: think Black Lives Matter, Occupy, major news events or natural disasters. Even when a topic generates lots of Twitter activity, the data will still be biased, providing information only from, and about, people who use Twitter. Within that population there will be further bias, as some people will be more easily found than others in relation to a topic or event, for example people who use popular hashtags in their tweets.
Before you get to the ‘how’ of including Twitter data in research, it is necessary, as with any data source, to really understand what you are working with. This publication by Axel Bruns and Hallvard Moe, from week 1 of the course, goes into detail on the fluid and limited nature of what can be retrieved. Twitter communication has different layers and flows. Focusing on hashtag activity alone gives a limited and biased picture, while delving more deeply into communication flows can quickly use up the amount of Twitter data you can access for free. The level of detail needed to really understand Twitter came as a surprise to me, despite my having been a user since 2010. Using Twitter as @kclarity is a different thing from thinking about how to use it as a data source.
Developing expertise didn’t feel like a good use of time. There were few circumstances where I could imagine Twitter being a main component of data collection, and none in the immediate future. I couldn’t see the point of investing time in developing a deep, but only theoretical, understanding of how it could work. I would prefer to spend the time working with a social media analyst, who already has the expertise, and only on a live project.
Developing a basic understanding did feel like a good use of time. The basic analytics I performed through the course would help me have a more useful conversation with an expert, to plan how we might work together. It also gave me some ideas for how I might incorporate some very simple Twitter analysis into a research project myself, for example to support hypothesis generation, purposive sampling or identifying experts for a Delphi study.
The next post will cover this in more detail, along with the tools and software I used for my rudimentary analysis.