In this blog I will share what I have learned from sitting at my desk trying to install and use new software and code. The aim is to help you get started and, more importantly, keep going.
First up, an overview.
• Installing software and loading data rarely works perfectly and takes time. Think about where you want to focus your energy before you start: what functions will be most useful to your work?
• Getting used to new tools and coding is about trial and error. You can’t be afraid of messing things up. Try to be playful, mess up, learn, mess up a bit more and keep going.
• You will not get very far by reading without practical application; you have to jump in. After that, reading forums to work out where you are going wrong is invaluable. For example, I became totally stuck early on when an error in my R code left the console showing a ‘+’, which wouldn’t let me continue coding. By searching on a help forum, I quickly found the answer: pressing Escape took me back to the usual ‘>’, which indicates you can start your next line of code.
• Find someone with experience to sit down with and talk through where you are stuck. This doesn’t need to be someone who works in your field, but they need to have been around computers and coding long enough to have made many more mistakes than you.
• You need patience. There are lots of “Why? Why are you doing this?!” conversations to be had with the screen.
• The human error involved in coding is huge. Even when I was “absolutely completely sure I’ve done it right this time”, nine times out of ten the problem was that I had mistyped something.
• Uninstall software that you won’t be using regularly, and sign up for reminders to update what you do keep.
• I found this children’s book helpful: it literally illustrates how a computer works. And this five-part radio series on the history of coding was interesting and easy to understand. Both helped me put the specific tasks in a wider context.
For data analysis and coding I started with iNZight, free software that allows you to explore data and understand some statistical ideas, including multivariable graphics, time series, and generalised linear modelling (including modelling of data from complex surveys). You can read more about my experience with it here. iNZight was relatively easy to install and keep up to date. I have kept it on my machine; at some point I would like to explore the relative usefulness of a software programme based on R compared with coding directly in R and RStudio.
R is a freely available and widely used programming language for statistical computing and graphics. One of the problems I bumped into when installing and trying R for the first time is that I didn’t understand the difference between R and RStudio. R is the language; RStudio is something called a software environment. A software environment, or integrated development environment (IDE), provides facilities such as a source code editor, build automation tools and a debugger. I think of it as a little world where the code works more easily. So you download both R and RStudio, but work through RStudio. I wasn’t following the instructions very closely when I started and was trying to work directly in the R console (one icon on my desktop), where the code didn’t do as I expected, rather than in RStudio (a different icon), which I found very easy to use. I am still working with R and RStudio, and you can read more about this in my next post.
Python is another freely available programming language. It was the first one I tried, and I just dipped my toe in, over a year ago (via this course). It took at least two attempts to get it installed, and then there was a bit more confusion over how to actually get at the data and start analysing. I downloaded the Anaconda Python distribution, which contained all the data processing software I needed: the Python programming language, the data analysis module pandas, the Jupyter Notebook application and the IPython environment. Even once it was installed, I struggled to work out how to get into it… “Do I click on Anaconda? Or that file that says Python, or that one?”. I will come back to this programming language at some point, and will need to update to the latest version. For a more detailed discussion of the relative merits of R and Python from someone with much more experience than me, I recommend this post from Dr Will Parry, a social scientist and experienced data analyst.
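To give a feel for the sort of first step I was fumbling towards in pandas, here is a minimal sketch. The column names and figures are made up purely for illustration; in practice you would load your own file with something like pd.read_csv().

```python
import pandas as pd

# Hypothetical survey data (made-up values, just to illustrate)
df = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "score": [4, 7, 5, 8],
})

# A first bit of analysis: average score by region
summary = df.groupby("region")["score"].mean()
print(summary)
```

In a Jupyter notebook you would run this a cell at a time, which is much of the appeal: you can poke at the data, make a mistake, and try again.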
I tried (in a very limited way) Apache Hadoop software, as part of an introductory course on big data analytics (you can read more about my experience in post number 11, ‘Jumping through Hadoops’). This included the Hadoop Distributed File System (HDFS) and MapReduce. I also had to install VirtualBox and the Cloudera QuickStart VM (virtual machine), which was a bit like putting a little computer on my laptop to work through when using the Hadoop data management programme. All of this involved several rounds of installation. For each install you need to know you have the right version for your machine, which means looking up your operating system and type via the computer’s control panel. Everything was free, but it took a while, two to three hours in total I think, and if you haven’t been through it before you will need step-by-step instructions, which I had via this course. Once I had finished the course I uninstalled all of this software, as I won’t be using it again soon and didn’t want the hassle of keeping it up to date and having a kind of alternative system running on my machine.
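To give a feel for what MapReduce actually does, here is the classic introductory example, a word count, written as a toy in plain Python with the map, shuffle and reduce stages spelled out. This is only a sketch of the idea; real Hadoop jobs distribute these stages across many machines, which is the whole point.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every line
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all the counts by word
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data is big", "data is everywhere"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```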
For social media analytics I installed TAGS, which was easy to do; you can read more about it here. However, I finished using it about three months ago and, despite a couple of attempts, haven’t been able to uninstall it. I received an email with instructions (involving clicking a link which tells me I have been successful), but my query is still running every day, so I will need to email them directly or do some web searching to work out what is going on.
Next I opted for a two-week trial of Tableau, the only piece of software that wasn’t freely available. It worked like a cross between Excel and a jazzed-up PowerPoint, letting you interrogate and report your data on the go, using visuals that help to tell a story. It was fine, but nothing I couldn’t do using freely available software. I wasn’t keen on the plug-and-play approach to analytics, which lets you generate numbers without having to understand the data or the analysis behind your results.
Gephi, used to analyse and visualise data on networks, is the only install I have given up on so far, which is almost certainly down to me rather than the software. Despite several attempts, including downloading Firefox (it wasn’t working in Chrome or Edge) and updating Java, it still wasn’t running, and I was out of ideas and time. I also knew I wouldn’t be using Gephi any time soon. As I talk about here, I would need to develop a much more detailed understanding of how Twitter works as a data source to do anything other than basic analysis.
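For a sense of the kind of question a tool like Gephi answers, here is a toy sketch in plain Python that counts how many connections each account in a tiny, entirely made-up network has (its “degree”), one of the simplest network measures. Gephi computes this sort of thing for you and draws the picture; the names below are invented for illustration only.

```python
from collections import Counter

# A tiny made-up network of who mentions whom
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "alice"),
]

# Degree: how many edges touch each account
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print(degree.most_common())  # alice is the best-connected account
```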
That said, I strongly discourage giving up. I found most software took several attempts to install, and I came to accept the tinkering and problem-solving as useful training for this way of working.