This is the third post in our series of learning materials on Big Data. So far, we have covered some of the mathematical tools that belong in the data scientist’s arsenal, along with some of the software used by the discerning analyst. In this post, we will teach you the basics of the R programming language.
This tutorial introduces you to the basic workings of R, a go-to language for data analysis. We assume no previous knowledge of R or of programming in general. All that you need is a bit of enthusiasm, eagerness to learn and a bit of moxie!
Installing R is very easy. R is cross-platform and runs on a variety of operating systems, including UNIX-like systems, Windows and macOS. To get started, choose your operating system and download the appropriate version of R.
Once you have R downloaded and installed, it’s time to get your hands dirty. You can invoke R using the command line, or use an IDE like RStudio. We recommend that you use RStudio since that makes your life much easier. 🙂 Here is some information about how to download and use RStudio.
A package in R is a collection of R functions, data and compiled code in a well-defined format. Installed packages are stored in a directory known as a library. R comes pre-installed with a set of standard packages.
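To see these ideas in your own installation, here is a minimal sketch that uses only functions shipped with R. The variable names are ours, and the output depends on your setup, so none is shown.

```r
# Directories (libraries) in which R looks for installed packages
lib_dirs <- .libPaths()
print(lib_dirs)

# Packages that ship with every R installation (priority "base")
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Packages currently attached to this session
print(search())
```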
Other packages can be downloaded and installed. Here is the complete list of R packages along with extensive documentation.
Open R or RStudio and type the following command into the console:
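As a sketch, assuming the package in question is swirl (the one this post uses for its tutorial further on), the install command would look like this:

```r
# Type this at the console prompt.
# "swirl" is an assumed example; substitute the package you actually want.
install.packages("swirl")
```

On first use, R may ask you to pick a CRAN mirror; any nearby one will do.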
The package should just install. (The > symbol that the console shows at the beginning of the line is the command prompt. Begin typing after the >; do not type the > itself.)
Note for Linux users: if you are using Linux without root access, the command may fail because the system-wide library directory is not writable! Here is how to install packages without becoming the root user.
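One common approach, sketched here, is to install into a personal library directory that you own. This assumes that the R_LIBS_USER environment variable is set (R normally gives it a default value) and again uses swirl as a stand-in package name.

```r
# Personal library location; R normally sets R_LIBS_USER to a default path
user_lib <- Sys.getenv("R_LIBS_USER")
stopifnot(nzchar(user_lib))  # stop early if the variable is empty

# Create the directory if it does not exist yet (no root access needed)
dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)

# Put the personal library first on R's search path for this session
.libPaths(c(user_lib, .libPaths()))

# Install into the personal library ("swirl" is an assumed example)
install.packages("swirl", lib = user_lib)
```

Once the directory exists, later R sessions should pick it up automatically, so the install step is the only part you would normally repeat for other packages.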
Once you have installed the package, it’s time to load it into the session. Installation happens only once, but in every new R session where you want to use a particular package, you must load it into memory. Here is the command that enables you to do just that:
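Assuming the package is swirl and it is already installed, loading it might look like this:

```r
# Attach the package for the current session ("swirl" is an assumed example)
library(swirl)

# Check that the package now appears on the search path
"package:swirl" %in% search()
```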
Basic R tutorial
It’s time to learn R in R. (Very meta, isn’t it?) The swirl R package is a fun way to get started with the basics of R. The package turns your R console into an interactive learning environment. Here are the commands to help you get started on your journey:
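A minimal sketch of a typical swirl session follows. The install guard is our addition, so that the package is fetched only when it is missing; the last line starts the interactive lessons and is meant to be typed at the console rather than run from a script.

```r
# Install swirl only if it is not already available
if (!requireNamespace("swirl", quietly = TRUE)) {
  install.packages("swirl")
}

# Load the package into the current session
library(swirl)

# Start the interactive tutorial and follow the on-screen prompts
swirl()
```

swirl will ask for a name to track your progress under and then offer a menu of courses to choose from.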
Congratulations! You are now well on the way to becoming an expert.
Next up, we will conclude our series with an introduction to Hadoop, a distributed computing and storage framework for very large datasets. We hope to see you there! Once again, we need to thank Dawar Dedmari for his valuable input. Without his suggestions, these posts would never have materialized.