In my last post I talked about why I chose to pursue R via DataCamp. Between January and the end of March I completed the first three courses of the Data Analyst with R track.
I found a rhythm for learning in limited time: structured exercises through the DataCamp course; daily practice to recap what I had done before (even if only five minutes on the app while waiting to pick the kids up from school); and practising the code in RStudio (when easy to do). I sometimes listen to a podcast to reinforce the learning while doing other things (laundry, cooking, driving), and for more complicated topics I do some reading. To get anything from listening or reading, though, I need to have seen it on a screen first.
Things were going well: I was whizzing through the practice exercises and getting answers right first time. But based on previous experience, it was going a bit too smoothly. DataCamp is like riding my bike on my indoor trainer: a bit too protected. You don't have to work out what you've done wrong; you get told. And you have to be really disciplined to make sure you understand the code, rather than just running through the exercises. This is helpful for keeping you moving at a certain pace while you get used to the basics. But, sticking with the bike analogy, after a while you need to get out on the road, deal with the vagaries of weather and road conditions (and learn how to fix a flat when things go wrong).
Just as I was trying to construct a 'real' data project to complete in RStudio, fortune struck. One of my clients, a service delivery organisation, asked me to analyse some data to better understand who comes through their door, and their needs, goals & progress. My report was to be used by operational staff to inform service development. I imported the data from Excel into RStudio, then explored, prepared, and analysed it (reformatting it to meet the principles of tidy data, removing duplicates, subsetting, joining, and filtering, using the Tidyverse packages and many functions that were new to me).
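For anyone curious what that first import-and-tidy step can look like, here is a minimal sketch. It assumes the data arrived as a single .xlsx sheet; the file name and column names are hypothetical, not the client's actual data.

```r
# A minimal sketch of the import-and-tidy step (hypothetical names)
library(readxl)
library(dplyr)

clients <- read_excel("service_data.xlsx")  # hypothetical file name

clients <- clients %>%
  distinct() %>%                 # drop exact duplicate rows
  filter(!is.na(client_id)) %>%  # hypothetical ID column
  rename(first_seen = date_1st)  # tidy an awkward column name
```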
I learnt lots from the project:
• The basics from an online course, plus lots of exposure (even in micro amounts of time), give you enough knowledge and confidence to jump off a linear path and find what you need for the job in front of you. This project was the first time I had ever done data cleaning or used the Tidyverse. It was not the most elegant code, but I did it.
• Stack Overflow is your friend. Somebody has asked your question before. Just don't stop at the first answer if it isn't exactly what you need; you might find your answer buried in the subsequent comments and discussion between contributors. Cheat sheets are also your friend; I managed the basics of dplyr from a cheat sheet.
• Once I knew something worked, I found myself defaulting to that code over and over. If possible, try to do things a couple of different ways and consider what you prefer and why. For example, when looking for duplicate entries I tried 'length', 'duplicated' & 'distinct' (dplyr) – see the sketch after this list.
• I experienced both good flow and out-of-control flow. You definitely need to be able to get stuck in, try things, and keep going. But sometimes I disappeared down the rabbit hole, started to guess too much, and had to pull myself out of RStudio to stop and think (with paper and pencil).
• It was much faster and smoother than anything I could have done in Excel. But I am not advanced with Excel.
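As a sketch of the duplicates point above, here is one plausible way to try each of the three approaches; the data frame and ID column names are hypothetical.

```r
# Three ways to look for duplicate records, assuming a data frame
# `clients` with a `client_id` column (both names hypothetical)
library(dplyr)

# 1. 'length': if the two lengths differ, some IDs repeat
length(clients$client_id) == length(unique(clients$client_id))

# 2. 'duplicated' (base R): show rows whose ID appeared earlier
clients[duplicated(clients$client_id), ]

# 3. 'distinct' (dplyr): keep one row per ID (first occurrence wins)
clients %>% distinct(client_id, .keep_all = TRUE)
```

Each surfaces something slightly different: the first only tells you whether duplicates exist, the second shows you the offending rows, and the third removes them.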
If you work in research and evaluation, R is a very good way into coding and can streamline your work. My existing skills also contributed a lot of value to the process:
• An inherent suspicion when told the data is really complete and really clean!
• How to get a feel for your data: spending some time understanding what is and isn't there (see the sketch after this list). One downside of DataCamp was not being able to nose around the datasets in many of the exercises, which made it harder for me to picture what the code was doing to the data.
• Building relationships with people who understand the context of the data (where it comes from; how it's entered; who uses it, why & how; conventions for dealing with missing data, and so on). For example, a variable I initially thought was irrelevant for analysis turned out to tell me, very usefully, how many times someone had been seen by a service.
• Knowing how to approach version control (e.g. naming objects in a clear, logical, and systematic way, and keeping things tidy).
• Valuing transparency and reproducibility (I don't know how to use R Markdown yet, but it was automatic for me to produce a 'technical report' of the steps in the analysis, with the corresponding code).
• Communicating results with appropriate caveats, in a way that is accessible to a non-specialist audience.
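To make the 'get a feel for your data' point concrete, here is a minimal sketch of the first commands I tend to reach for; the data frame and category column names are hypothetical.

```r
# First-look commands for a new data frame called `clients`
# (the name and the `service_type` column are hypothetical)
library(dplyr)

glimpse(clients)              # column names, types, and a preview
summary(clients)              # per-column ranges and NA counts
colSums(is.na(clients))       # missing values per column
count(clients, service_type)  # how often each category appears
```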
Working intensively, a few hours a day for two weeks, really developed my understanding of R and my confidence in working things out as I go. Immersion, when you can manage it, definitely has benefits, and it has left me itching for another project.