Tutorial 1: First Steps with R

These tutorials provide a self-contained introduction to the R language, a powerful and widely used environment for working with data.

Aims

By the end of this first tutorial, you will be able to:

  • work confidently in the R console.
  • define and manipulate variables.
  • make simple graphics.

Installation

You can obtain a free copy of R from www.r-project.org. Install files are available for Windows, Mac and Linux.

In this tutorial, we will use RStudio, a nicely designed interface for working with the R language. It can be freely downloaded from rstudio.com.

The R console

When you first open RStudio, you will see an environment with several windows.

The console is where commands are entered. By default, it appears at the bottom left. The console will be our main way of interacting with R. Type the following three commands into the console, one at a time, and check that the output is what you expect.

2+3
4^2
sin(pi/2)
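For reference, the three commands above should give the results noted in the comments below. This is only a small sketch for checking your output; the call to all.equal is an addition for illustration and is not part of the exercise itself.

```r
# Basic arithmetic in the console; expected results are in the comments.
2 + 3        # 5
4^2          # 16
sin(pi / 2)  # 1

# all.equal() compares numbers with a small tolerance, which is safer than
# == when checking the result of a floating-point calculation.
all.equal(sin(pi / 2), 1)  # TRUE
```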

R stores data as objects. At its simplest, we can store a value in an object using the following syntax:

x = 4

Type the command above into the console, noting any changes in the Environment tab in the top right of the RStudio window.

A common alternative way to assign objects in R is:

x <- 4

This works just as well, and it is the form most commonly seen in existing R code, although it may look unusual to those who are used to other languages.
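As a quick check that the two assignment forms are interchangeable at the console, the sketch below stores the same value both ways and compares the results. The names y and z are hypothetical, chosen only for this illustration.

```r
y = 4    # assignment with =
z <- 4   # assignment with <-

# identical() reports whether two objects are exactly the same.
identical(y, z)  # TRUE
```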

To access the contents of a variable, enter its name in the console.

x

Variables containing numeric values can be manipulated as numbers. Check that the commands below behave as you expect.

x + x
3 * x
x ^ 2
sqrt(x)
x / x

A variable can be removed using the command rm.

rm(x)
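One way to confirm that a variable has really been removed is the base function exists, which reports whether an object with a given name can be found. The name tmp_value below is hypothetical and used only for this sketch.

```r
tmp_value = 10
exists("tmp_value")   # TRUE: the variable is defined

rm(tmp_value)
exists("tmp_value")   # FALSE: the variable has been removed
```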

Arrays

Typically, we want to work with data sets that contain many values. In R, such data are stored as arrays. The simplest kind of array is a vector, which is defined as follows.

a = c( 1, -2.3, 4.5, 9, 150 )

c stands for combine.
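As a brief illustration of working with a vector, the sketch below extends a with two further values and multiplies it by a number. The name b and the extra values 7 and 8 are made up for this example.

```r
a = c(1, -2.3, 4.5, 9, 150)

# c() also accepts vectors, so it can be used to extend an existing one.
b = c(a, 7, 8)
length(b)   # 7

# Arithmetic on a vector is applied to every entry in turn.
2 * a       # 2, -4.6, 9, 18, 300
```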

Functions

Properties of a variable can be extracted with a function, e.g.

length(a)

returns the length of the vector a.
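A few other base functions summarise a numeric vector in the same way. The selection below is only a sample chosen for illustration, applied to the vector a defined earlier.

```r
a = c(1, -2.3, 4.5, 9, 150)

length(a)  # 5: the number of entries
max(a)     # 150: the largest entry
min(a)     # -2.3: the smallest entry
sum(a)     # 162.2: the total of all entries
mean(a)    # 32.44: the total divided by the number of entries
```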

Help

Every built-in R function comes with a help file, which is opened by typing a question mark before the function name:

?length

The help file explains how to use the function and usually ends with examples that you can run and adapt.

Subsetting

We often need to access particular entries of a vector. Try to understand what the following commands achieve.

a[3]
a[c(2, 4)]
a[c(2 : 4)]
a[-3]
a[-c(1, 2)]
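Once you have tried the commands yourself, you can check your understanding against the annotated sketch below. It assumes a is still the vector defined in the Arrays section.

```r
a = c(1, -2.3, 4.5, 9, 150)

a[3]          # the third entry: 4.5
a[c(2, 4)]    # the second and fourth entries: -2.3, 9
a[c(2 : 4)]   # entries two to four: -2.3, 4.5, 9 (a[2:4] does the same)
a[-3]         # everything except the third entry: 1, -2.3, 9, 150
a[-c(1, 2)]   # everything except the first two entries: 4.5, 9, 150
```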

Working with data

As a first example, we will use the iris dataset, which is supplied with R. Background information is available from the help file:

?iris

First, we display the data as a rectangular array, which will appear in a separate window. There are 150 observations, and five different variables. An observation corresponds to a row, and a variable corresponds to a column.

View(iris)
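If you prefer to stay in the console, the size and layout of the data can also be inspected with a few base functions. The sketch below is one possible selection, not the only way to do it.

```r
dim(iris)       # 150 5: the number of rows, then the number of columns
names(iris)     # the names of the five variables
head(iris, 3)   # the first three observations
str(iris)       # the type and first few values of each variable
```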

Four of the variables are numerical, and one, Species, is categorical. We can extract entries from the array as follows.

To obtain the values for the first observation, we extract the first row:

iris[ 1 ,]
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa

To obtain the values for the first variable, we extract the first column:

iris[ , 1 ]
##   [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
##  [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
##  [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
##  [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
##  [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
##  [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
## [145] 6.7 6.7 6.3 6.5 6.2 5.9

Note that we can also refer to the variables by their names:

iris$Sepal.Length
##   [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
##  [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
##  [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
##  [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
##  [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
##  [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
## [145] 6.7 6.7 6.3 6.5 6.2 5.9
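Row and column selections can also be combined, and a column name can be used inside the square brackets. The sketch below shows a few such combinations; the particular rows and columns are chosen only for illustration.

```r
# Selecting the first variable by position or by name gives the same vector.
identical(iris[, 1], iris$Sepal.Length)        # TRUE
identical(iris[, "Sepal.Length"], iris[, 1])   # TRUE

# A row index together with a column name picks out a single value.
iris[1, "Sepal.Length"]                        # 5.1

# A range of rows and a vector of names pick out a smaller table.
iris[1:3, c("Sepal.Length", "Species")]
```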

R makes it easy to produce simple graphics that visualize relationships in data. As a simple example, let’s look at the relationship between the length and width of iris petals.

plot(iris$Petal.Length, iris$Petal.Width)

Note the (entirely expected) positive correlation between the two variables.
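The strength of this relationship can be put into a number with the base function cor, which computes the Pearson correlation coefficient by default. The variable name r is hypothetical, and this is a quick sketch rather than a full analysis.

```r
# Correlation between petal length and petal width over all 150 flowers.
r = cor(iris$Petal.Length, iris$Petal.Width)
round(r, 2)   # 0.96: close to 1, so a strong positive correlation
```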

We can add more information by colouring the points according to the species variable. To make the code easier to read, the arguments to the function appear on separate lines.

plot(iris$Petal.Length,
     iris$Petal.Width,
     col = iris$Species)
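Colouring by species works because Species is stored as a factor: each level has an integer code, and base graphics use those codes as positions in the current colour palette. The sketch below inspects the levels, the codes and the palette. The exact shades returned by palette depend on the R version, so no output is shown for that line.

```r
levels(iris$Species)               # "setosa" "versicolor" "virginica"
unique(as.integer(iris$Species))   # 1 2 3: the codes behind the three levels

# The first three palette entries are the colours used for codes 1, 2 and 3.
palette()[1:3]
```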

Finally, let’s smarten up the plot by adding labels and a legend.

plot(iris$Petal.Length,
     iris$Petal.Width,
     col = iris$Species,
     main = "Anderson's Iris Data",
     xlab = "petal length",
     ylab = "petal width")

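# Use the factor levels and palette positions 1 to 3, so that the legend
# matches the colours chosen by col = iris$Species in any R version.
legend("topleft",
       legend = levels(iris$Species),
       col = 1:3,
       pch = 1)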

Packages

You will need to install packages to make some functions in these tutorials work. For instance, we install the widely used graphics package ggplot2 as follows:

install.packages("ggplot2")

Note the quotation marks.

Installation only needs to be done once. However, the package must be loaded in every new session in which it is used, with the command:

library(ggplot2)
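To find out whether a package is already installed before trying to load it, one option is the base function requireNamespace, which returns TRUE or FALSE rather than stopping with an error. The helper is_installed below is a hypothetical name introduced for this sketch.

```r
# Hypothetical helper: TRUE if the named package can be loaded, else FALSE.
is_installed = function(pkg) requireNamespace(pkg, quietly = TRUE)

is_installed("stats")   # TRUE: stats is one of the packages shipped with R

# Print a reminder instead of failing when ggplot2 is missing.
if (!is_installed("ggplot2")) {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first")
}
```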

Next steps

In the next tutorial, we will work with some data. After completing it, you will be able to:

  • load in data.
  • make exploratory graphics.
  • perform elementary manipulations on a data frame.
