# create a numeric variable
number_1 <- 3
number_1
[1] 3
R is a language and environment for statistical computing, data analysis, visualisation and graphics. It is free and open source software, released under the terms of the GNU General Public License.
R runs on a wide variety of platforms, including Windows, Linux and macOS.
To read the documentation for a function, type ?functionName in the console.
To create a variable, you type variable_name <- variable_value in the console.
You can carry out **mathematical calculations** on numeric variables, such as exponentiation, addition and division.
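As a minimal sketch of such calculations, the following uses two illustrative variables (the names x and y and their values are assumptions chosen for this example):

```r
# two illustrative numeric variables (names chosen for this sketch)
x <- 6
y <- 3

x + y   # addition
x / y   # division
x ^ y   # exponentiation: 6 to the power of 3
```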
In R, there are a few types of variables. The ones you will interact with are:
Note that lines that start with # are comments, and are not evaluated.
To evaluate (or return) the variable you have created, you can either type the name of the variable, or call print() with the variable name inside the parentheses.
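The two ways of evaluating a variable can be sketched as follows; the variable name number_1 is illustrative:

```r
# create a numeric variable (the name is illustrative)
number_1 <- 3

# evaluate it by typing its name ...
number_1

# ... or by calling print() with the name inside the parentheses
print(number_1)
```

In the console, both lines should display the same value.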
You can check the variable type using class(variable_name).
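A short sketch of class() in use, assuming the types of interest are numeric, character and logical (the three types used for vectors in this section); the variable names are made up for the example:

```r
# one variable of each type (names are illustrative)
num_var  <- 3            # numeric
char_var <- "student_a"  # character
lgl_var  <- TRUE         # logical

class(num_var)
class(char_var)
class(lgl_var)
```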
It is good practice to give your variable a name that is both easy to understand, and also valid.
Variable names are case sensitive: VariableA is not the same as variablea.
A name may contain digits, as in variable3, but must NOT start with one, as in 22variable.
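These naming rules can be checked with a small sketch. The values assigned are arbitrary, and make.names() is used here only to show how base R would repair an invalid name:

```r
# names are case sensitive: these are two different variables
VariableA <- 1
variablea <- 2
VariableA == variablea

# valid: the digit is not the first character
variable3 <- 10

# 22variable <- 10   # a syntax error, so it is left commented out

# make.names() turns an invalid name into a valid one
make.names("22variable")
```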
Avoid the following:
Names containing special characters, such as var.A or var$A: the characters . and $ have special meanings in R.
Names that R already uses, such as function, list and so on. If you really can’t think of a better name, you can use names such as my_function or list_1 to avoid the ambiguity.

A vector is a list of values; it can be numeric, character or logical.
To create a vector, use the function c().
[1] 1 2 3 4 5
[1] "student_a" "student_b" "student_c"
[1] TRUE FALSE TRUE FALSE
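Output like the above could come from vectors created as in this sketch. The names num_vector and char_vector appear elsewhere in this section; logical_vector is a name assumed for the example:

```r
# one vector per type, each created with c()
num_vector     <- c(1, 2, 3, 4, 5)
char_vector    <- c("student_a", "student_b", "student_c")
logical_vector <- c(TRUE, FALSE, TRUE, FALSE)  # name assumed for this sketch

num_vector
char_vector
logical_vector
```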
There are some shortcuts to create a sequence of values; not required to learn, but very useful.
# numeric
# num_vector <- c(1, 2, 3, 4, 5)
num_vector <- 1:5 # from 1 to 5
seq(from = 1, to = 11, by = 2) # from 1 to 11, in steps of 2
[1] 1 3 5 7 9 11
rep(1, times = 5) # assumed call: repeat the value 1 five times
[1] 1 1 1 1 1
# character
# char_vector <- c('student_a', 'student_b', 'student_c')
char_vector <- paste0('student_', c('a', 'b', 'c'))
char_vector
[1] "student_a" "student_b" "student_c"
In a vector, all elements must have the same type. If you try to combine multiple types in the same vector, such as a number and a character, R will try to convert them to a common type.
Try to combine the following values into a vector, and see what happens.
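The values from the original exercise are not shown here, so the following sketch uses illustrative combinations to show how R converts mixed types:

```r
# illustrative mixed-type combinations (not the original exercise values)
mixed_1 <- c(1, "a")      # numeric + character
mixed_2 <- c(1, TRUE)     # numeric + logical
mixed_3 <- c("a", FALSE)  # character + logical

class(mixed_1)  # character: 1 becomes "1"
class(mixed_2)  # numeric: TRUE becomes 1
class(mixed_3)  # character: FALSE becomes "FALSE"
```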
You can combine multiple vectors using c(). For example, if vec1 has 3 elements and vec2 has 2 elements (assuming that they are of the same type), combining them gives a vector of 5 elements.
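A minimal sketch of combining two vectors; the values in vec1 and vec2 are made up for illustration:

```r
vec1 <- c(1, 2, 3)  # 3 elements
vec2 <- c(4, 5)     # 2 elements

# c() joins the two vectors end to end
vec_combined <- c(vec1, vec2)
vec_combined
length(vec_combined)
```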
A matrix can be thought of as a stack of vectors. When you collect data from \(n\) patients (or subjects), you measure a few aspects on each patient such as age, sex, height and smoking. Let’s say you have measured \(p\) aspects. This forms a matrix of size \(n \times p\).
You might not need to create a matrix from scratch in R (because the focus of this course is data analysis); but it is helpful to understand some basic data manipulation commands.
You can create a matrix using matrix(), with some parameters:
     [,1] [,2]
[1,]    1    2
[2,]    3    4
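One call that would produce a 2 by 2 matrix like the one printed above is sketched below. The original call is not shown, so the argument values, and the name mat, are assumptions:

```r
# fill a 2 x 2 matrix row by row (byrow = TRUE);
# by default, matrix() fills column by column
mat <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE)
mat
```

Here nrow and ncol set the size, and byrow controls the order in which the values are placed.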
You can also create a matrix by combining two vectors of the same size, using cbind() or rbind(), which stand for “column bind” and “row bind”.
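A small sketch of the two binding functions, using two made-up vectors v1 and v2:

```r
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)

by_column <- cbind(v1, v2)  # each vector becomes a column: 3 rows, 2 columns
by_row    <- rbind(v1, v2)  # each vector becomes a row: 2 rows, 3 columns

dim(by_column)
dim(by_row)
```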
A dataframe (data.frame) is a data format commonly used in data analysis with R and Python. It can be thought of as a matrix that allows a mixture of data types, such as numeric and categorical measurements (age and sex).
In this course, you will mostly be working with dataframes.
We create a small dataframe of 3 subjects:
This is how you can present the dataframe, where each column has a different data type.
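A sketch of how such a dataframe could be built, using the three subjects that are printed in this section (age, sex, has_covid). The object name mini_data is an assumption:

```r
# a small dataframe of 3 subjects; each column has its own type
mini_data <- data.frame(
  age       = c(20, 50, 32),                 # numeric
  sex       = c("male", "female", "male"),   # categorical, stored as text
  has_covid = c(TRUE, TRUE, FALSE)           # logical
)

mini_data

# check the type of each column
sapply(mini_data, class)
```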
You can find the size of a vector with length().
For a matrix or dataframe, you can use dim(). It returns nrow and ncol, the number of rows and the number of columns.
[1] 2
[1] 2 2
[1] 3 3
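The three results above are consistent with a sketch like the following. The objects are assumptions: a vector of 2 elements, a 2 by 2 matrix and a dataframe of 3 subjects with 3 columns:

```r
vec <- c(10, 20)                        # a vector with 2 elements
mat <- matrix(c(1, 2, 3, 4), nrow = 2)  # a 2 x 2 matrix
df  <- data.frame(
  age       = c(20, 50, 32),
  sex       = c("male", "female", "male"),
  has_covid = c(TRUE, TRUE, FALSE)
)

length(vec)  # number of elements
dim(mat)     # rows, then columns
dim(df)      # rows, then columns
```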
dim() or length()
If you use dim() on a vector, it returns NULL. Given that a vector can be seen as a matrix with 1 row (or column), this may seem counterintuitive.
Nonetheless, dim() works on matrix objects. If you convert the vector into a matrix with nrow = 1 or ncol = 1, dim() will work.
If you use length() on a matrix, it returns the total number of elements, i.e. ncol times nrow.
You can also use nrow() and ncol() to get the number of rows and columns explicitly.
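The behaviour described here can be checked with a short sketch; the objects vec and mat are made up for the example:

```r
vec <- c(1, 2, 3, 4)
dim(vec)                     # NULL: a plain vector has no dimensions

# convert the vector into a matrix with a single row
vec_as_matrix <- matrix(vec, nrow = 1)
dim(vec_as_matrix)           # 1 row, 4 columns

mat <- matrix(1:6, nrow = 2, ncol = 3)
length(mat)                  # total number of elements: 2 times 3
nrow(mat)                    # number of rows
ncol(mat)                    # number of columns
```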
For a vector, you can access elements by their position, using square brackets.
Sometimes you might need to combine previous knowledge to get what you want (e.g. to know how many elements in total there are).
[1] "c"
[1] "c" "e"
[1] "e" "f" "g" "h"
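Results like the three above can be obtained as sketched below, assuming a vector of the letters a to h. The name letter_vec is hypothetical:

```r
letter_vec <- c("a", "b", "c", "d", "e", "f", "g", "h")

letter_vec[3]         # a single element, by position
letter_vec[c(3, 5)]   # several elements, by a vector of positions

# from the 5th element to the last one;
# length() supplies the total number of elements
letter_vec[5:length(letter_vec)]
```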
For a matrix,
matrix[r, c] gets the element on the \(r\)-th row and \(c\)-th column.
matrix[r, ] and matrix[, c] get all the elements on the \(r\)-th row or the \(c\)-th column.
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[1] 6
[1] 1 2 3
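A sketch that reproduces results of this kind, assuming the 3 by 3 matrix printed above was filled row by row with 1 to 9. The name mat is hypothetical:

```r
mat <- matrix(1:9, nrow = 3, byrow = TRUE)

mat[2, 3]   # element on the 2nd row, 3rd column
mat[1, ]    # all elements on the 1st row
mat[, 2]    # all elements on the 2nd column
```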
For a dataframe, use data$column_name, or data['column_name'], to access the entire column.
Conventionally, each row is a subject, and each column is a variable (or aspect of measurement, feature, characteristic, risk factor etc).
  age    sex has_covid
1  20   male      TRUE
2  50 female      TRUE
3  32   male     FALSE
  age  sex has_covid
1  20 male      TRUE
[1] "male" "female" "male"
[1] 20 50 32
  age
1  20
2  50
3  32
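The outputs above are consistent with the following sketch, which rebuilds the 3-subject dataframe under the assumed name mini_data:

```r
mini_data <- data.frame(
  age       = c(20, 50, 32),
  sex       = c("male", "female", "male"),
  has_covid = c(TRUE, TRUE, FALSE)
)

mini_data[1, ]     # the first row (first subject)
mini_data$sex      # the sex column, as a vector
mini_data$age      # the age column, as a vector
mini_data["age"]   # the age column, kept as a one-column dataframe
```

Note the difference between the last two lines: $ returns a vector, while single square brackets with a column name return a dataframe.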
You might have a task where you need to filter elements based on another variable: for example, select the age based on sex. This task is done in 2 steps:
1. Create a logical indicator from sex, call it sex_indicator.
2. Select the elements of the age vector corresponding to sex_indicator == TRUE. (The operator == evaluates whether the criterion is met.)
The following example illustrates this process. You will use this a few times in the course, for example to select the height measured for men and women.
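The two steps can be sketched as follows, using the ages and sexes of the three subjects from this section. Selecting the men is an arbitrary choice made for the example:

```r
age <- c(20, 50, 32)
sex <- c("male", "female", "male")

# step 1: create the indicator; TRUE where the subject is male
sex_indicator <- sex == "male"
sex_indicator

# step 2: keep the ages where the indicator is TRUE
age_male <- age[sex_indicator]
age_male
```

Writing age[sex_indicator == TRUE] gives the same result, since the indicator is already logical.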
Modifying existing data is easy, but you should be aware of the risks. In this class we only modify data we created in the class, so there is little risk, but you might have your own datasets to analyse in the future.
You should keep your original data in a safe place, and work on copies of it.
Version control is a good skill to learn.
[1] "a" "b" "c" "d" "E" "f" "g" "h"
[1] 1
     [,1] [,2] [,3]
[1,]   20    2    3
[2,]    4    5    6
[3,]    7    8    9
  age    sex has_covid
1  20   male      TRUE
2  50 female      TRUE
3  32   male     FALSE
  age    sex has_covid
1  20   male      TRUE
2  50 female     FALSE
3  32   male     FALSE
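Modifications like the ones printed above can be sketched as below. The object names are assumptions, and the dataframe is changed on a copy so that the original stays intact:

```r
# vector: replace the 5th element
letter_vec <- c("a", "b", "c", "d", "e", "f", "g", "h")
letter_vec[5] <- "E"
letter_vec

# matrix: look at one element, then overwrite it
mat <- matrix(1:9, nrow = 3, byrow = TRUE)
mat[1, 1]
mat[1, 1] <- 20
mat

# dataframe: work on a copy and change one value in one column
mini_data <- data.frame(
  age       = c(20, 50, 32),
  sex       = c("male", "female", "male"),
  has_covid = c(TRUE, TRUE, FALSE)
)
mini_data_copy <- mini_data
mini_data_copy$has_covid[2] <- FALSE
mini_data_copy
```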
Before importing a dataset, you need to know where it is, and how to tell R to find it in your file system.
You can think of the working directory as the folder where R looks for (and saves) your scripts by default.
You can check where your working directory is by running getwd().
You can manually set this to a folder of your choosing by setwd(path).
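A minimal sketch of checking and changing the working directory. Here tempdir() stands in for a folder of your choosing, and the original directory is restored at the end:

```r
# remember the current working directory
old_wd <- getwd()
old_wd

# switch to another folder (tempdir() is a stand-in for your own path)
setwd(tempdir())
getwd()

# switch back, so that the rest of the session is unaffected
setwd(old_wd)
```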
It is recommended to use an R project. It sets up a folder just for the task you are working on, so that you do not need to set the working directory every time you open RStudio. Read more about how to create an R project.
Data exist in different formats:
csv is one of the most commonly used data formats for tabular data. If possible, it is a good idea to use this format, as it is readable by different languages and software.
xlsx is also good for storing tabular data; however, it is slightly more complicated than csv.
rda can be used to store R data (such as lists, higher dimensional arrays).
Other formats come from other statistical software (such as dta, created by STATA), and they require specific R packages to load.

It is difficult to summarise all the data formats here, so you should check the documentation on how to import and write (save) data of different types.
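As a minimal sketch for the csv case, base R provides write.csv() and read.csv(). The example writes a small made-up dataframe to a temporary file and reads it back; the object names and the temporary path are assumptions for illustration:

```r
mini_data <- data.frame(
  age       = c(20, 50, 32),
  sex       = c("male", "female", "male"),
  has_covid = c(TRUE, TRUE, FALSE)
)

# a temporary file stands in for a path inside your project folder
csv_path <- tempfile(fileext = ".csv")

# save the data; row.names = FALSE avoids an extra column of row numbers
write.csv(mini_data, file = csv_path, row.names = FALSE)

# import the data again and look at the first rows
mini_data_read <- read.csv(csv_path)
head(mini_data_read)
```

Other formats, such as xlsx or dta, need functions from additional packages, so check the documentation of the package you choose.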
id low age lwt eth smk ptl ht ui fvt ttv bwt
1 4 bwt <= 2500 28 120 other smoker 1 no yes 0 0 709
2 10 bwt <= 2500 29 130 white nonsmoker 0 no yes 2 0 1021
3 11 bwt <= 2500 34 187 black smoker 0 yes no 0 0 1135
4 13 bwt <= 2500 25 105 other nonsmoker 1 yes no 0 0 1330
5 15 bwt <= 2500 25 85 other nonsmoker 0 no yes 0 4 1474
6 16 bwt <= 2500 27 150 other nonsmoker 0 no no 0 5 1588
7 17 bwt <= 2500 23 97 other nonsmoker 0 no yes 1 5 1588
8 18 bwt <= 2500 24 128 black nonsmoker 1 no no 1 2 1701
9 19 bwt <= 2500 24 132 other nonsmoker 0 yes no 0 5 1729
10 20 bwt <= 2500 21 165 white smoker 0 yes no 1 4 1790
11 22 bwt <= 2500 32 105 white smoker 0 no no 0 0 1818
12 23 bwt <= 2500 19 91 white smoker 2 no yes 0 12 1885
13 24 bwt <= 2500 25 115 other nonsmoker 0 no no 0 3 1893
14 25 bwt <= 2500 16 130 other nonsmoker 0 no no 1 4 1899
15 26 bwt <= 2500 25 92 white smoker 0 no no 0 4 1928
16 27 bwt <= 2500 20 150 white smoker 0 no no 2 5 1928
17 28 bwt <= 2500 21 200 black nonsmoker 0 no yes 2 4 1928
18 29 bwt <= 2500 24 155 white smoker 1 no no 0 6 1936
19 30 bwt <= 2500 21 103 other nonsmoker 0 no no 0 5 1970
20 31 bwt <= 2500 20 125 other nonsmoker 0 no yes 0 2 2055
21 32 bwt <= 2500 25 89 other nonsmoker 2 no no 1 4 2055
22 33 bwt <= 2500 19 102 white nonsmoker 0 no no 2 3 2082
23 34 bwt <= 2500 19 112 white smoker 0 no yes 0 4 2084
24 35 bwt <= 2500 26 117 white smoker 1 no no 0 7 2084
25 36 bwt <= 2500 24 138 white nonsmoker 0 no no 0 1 2100
26 37 bwt <= 2500 17 130 other smoker 1 no yes 0 9 2125
27 40 bwt <= 2500 20 120 black smoker 0 no no 3 6 2126
28 42 bwt <= 2500 22 130 white smoker 1 no yes 1 4 2187
29 43 bwt <= 2500 27 130 black nonsmoker 0 no yes 0 6 2187
30 44 bwt <= 2500 20 80 other smoker 0 no yes 0 6 2211
31 45 bwt <= 2500 17 110 white smoker 0 no no 0 5 2225
32 46 bwt <= 2500 25 105 other nonsmoker 1 no no 1 5 2240
33 47 bwt <= 2500 20 109 other nonsmoker 0 no no 0 5 2240
34 49 bwt <= 2500 18 148 other nonsmoker 0 no no 0 3 2282
35 50 bwt <= 2500 18 110 black smoker 1 no no 0 4 2296
36 51 bwt <= 2500 20 121 white smoker 1 no yes 0 4 2296
37 52 bwt <= 2500 21 100 other nonsmoker 1 no no 4 0 2301
38 54 bwt <= 2500 26 96 other nonsmoker 0 no no 0 6 2325
39 56 bwt <= 2500 31 102 white smoker 1 no no 1 5 2353
40 57 bwt <= 2500 15 110 white nonsmoker 0 no no 0 3 2353
41 59 bwt <= 2500 23 187 black smoker 0 no no 1 5 2367
42 60 bwt <= 2500 20 122 black smoker 0 no no 0 4 2381
43 61 bwt <= 2500 24 105 black smoker 0 no no 0 3 2381
44 62 bwt <= 2500 15 115 other nonsmoker 0 no yes 0 4 2381
45 63 bwt <= 2500 23 120 other nonsmoker 0 no no 0 2 2395
46 65 bwt <= 2500 30 142 white smoker 1 no no 0 4 2410
47 67 bwt <= 2500 22 130 white smoker 0 no no 1 2 2410
48 68 bwt <= 2500 17 120 white smoker 0 no no 3 6 2414
49 69 bwt <= 2500 23 110 white smoker 1 no no 0 9 2424
50 71 bwt <= 2500 17 120 black nonsmoker 0 no no 2 6 2438
51 75 bwt <= 2500 26 154 other nonsmoker 1 yes no 1 10 2442
52 76 bwt <= 2500 20 105 other nonsmoker 0 no no 3 6 2450
53 77 bwt <= 2500 26 190 white smoker 0 no no 0 4 2466
54 78 bwt <= 2500 14 101 other smoker 1 no no 0 7 2466
55 79 bwt <= 2500 28 95 white smoker 0 no no 2 7 2466
56 81 bwt <= 2500 14 100 other nonsmoker 0 no no 2 6 2495
57 82 bwt <= 2500 23 94 other smoker 0 no no 0 4 2495
58 83 bwt <= 2500 17 142 black nonsmoker 0 yes no 0 2 2495
59 84 bwt <= 2500 21 130 white smoker 0 yes no 3 4 2495
60 85 bwt > 2500 19 182 black nonsmoker 0 no yes 0 4 2523
61 86 bwt > 2500 33 155 other nonsmoker 0 no no 3 6 2551
62 87 bwt > 2500 20 105 white smoker 0 no no 1 10 2557
63 88 bwt > 2500 21 108 white smoker 0 no yes 2 10 2594
64 89 bwt > 2500 18 107 white smoker 0 no yes 0 2 2600
65 91 bwt > 2500 21 124 other nonsmoker 0 no no 0 5 2622
66 92 bwt > 2500 22 118 white nonsmoker 0 no no 1 1 2637
67 93 bwt > 2500 17 103 other nonsmoker 0 no no 1 7 2637
68 94 bwt > 2500 29 123 white smoker 0 no no 1 4 2663
69 95 bwt > 2500 26 113 white smoker 0 no no 0 2 2665
70 96 bwt > 2500 19 95 other nonsmoker 0 no no 0 4 2722
71 97 bwt > 2500 19 150 other nonsmoker 0 no no 1 9 2733
72 98 bwt > 2500 22 95 other nonsmoker 0 yes no 0 10 2750
73 99 bwt > 2500 30 107 other nonsmoker 1 no yes 2 17 2750
74 100 bwt > 2500 18 100 white smoker 0 no no 0 0 2769
75 101 bwt > 2500 18 100 white smoker 0 no no 0 0 2769
76 102 bwt > 2500 15 98 black nonsmoker 0 no no 0 7 2778
77 103 bwt > 2500 25 118 white smoker 0 no no 3 7 2782
78 104 bwt > 2500 20 120 other nonsmoker 0 no yes 0 4 2807
79 105 bwt > 2500 28 120 white smoker 0 no no 1 6 2821
80 106 bwt > 2500 32 121 other nonsmoker 0 no no 2 10 2835
81 107 bwt > 2500 31 100 white nonsmoker 0 no yes 3 4 2835
82 108 bwt > 2500 36 202 white nonsmoker 0 no no 1 7 2836
83 109 bwt > 2500 28 120 other nonsmoker 0 no no 0 8 2863
84 111 bwt > 2500 25 120 other nonsmoker 0 no yes 2 10 2877
85 112 bwt > 2500 28 167 white nonsmoker 0 no no 0 12 2877
86 113 bwt > 2500 17 122 white smoker 0 no no 0 9 2906
87 114 bwt > 2500 29 150 white nonsmoker 0 no no 2 4 2920
88 115 bwt > 2500 26 168 black smoker 0 no no 0 6 2920
89 116 bwt > 2500 17 113 black nonsmoker 0 no no 1 12 2920
90 117 bwt > 2500 17 113 black nonsmoker 0 no no 1 12 2920
91 118 bwt > 2500 24 90 white smoker 1 no no 1 1 2948
92 119 bwt > 2500 35 121 black smoker 1 no no 1 11 2948
93 120 bwt > 2500 25 155 white nonsmoker 0 no no 1 5 2977
94 121 bwt > 2500 25 125 black nonsmoker 0 no no 0 4 2977
95 123 bwt > 2500 29 140 white smoker 0 no no 2 7 2977
96 124 bwt > 2500 19 138 white smoker 0 no no 2 2 2977
97 125 bwt > 2500 27 124 white smoker 0 no no 0 3 2992
98 126 bwt > 2500 31 215 white smoker 0 no no 2 11 3005
99 127 bwt > 2500 33 109 white smoker 0 no no 1 6 3033
100 128 bwt > 2500 21 185 black smoker 0 no no 2 8 3042
101 129 bwt > 2500 19 189 white nonsmoker 0 no no 2 4 3062
102 130 bwt > 2500 23 130 black nonsmoker 0 no no 1 4 3062
103 131 bwt > 2500 21 160 white nonsmoker 0 no no 0 11 3062
104 132 bwt > 2500 18 90 white smoker 0 no yes 0 6 3076
105 133 bwt > 2500 18 90 white smoker 0 no yes 0 6 3076
106 134 bwt > 2500 32 132 white nonsmoker 0 no no 4 7 3080
107 135 bwt > 2500 19 132 other nonsmoker 0 no no 0 3 3090
108 136 bwt > 2500 24 115 white nonsmoker 0 no no 2 5 3090
109 137 bwt > 2500 22 85 other smoker 0 no no 0 5 3090
110 138 bwt > 2500 22 120 white nonsmoker 0 yes no 1 3 3100
111 139 bwt > 2500 23 128 other nonsmoker 0 no no 0 8 3104
112 140 bwt > 2500 22 130 white smoker 0 no no 0 4 3132
113 141 bwt > 2500 30 95 white smoker 0 no no 2 4 3147
114 142 bwt > 2500 19 115 other nonsmoker 0 no no 0 7 3175
115 143 bwt > 2500 16 110 other nonsmoker 0 no no 0 3 3175
116 144 bwt > 2500 21 110 other smoker 0 no yes 0 7 3203
117 145 bwt > 2500 30 153 other nonsmoker 0 no no 0 6 3203
118 146 bwt > 2500 20 103 other nonsmoker 0 no no 0 5 3203
119 147 bwt > 2500 17 119 other nonsmoker 0 no no 0 9 3225
120 148 bwt > 2500 17 119 other nonsmoker 0 no no 0 9 3225
121 149 bwt > 2500 23 119 other nonsmoker 0 no no 2 5 3232
122 150 bwt > 2500 24 110 other nonsmoker 0 no no 0 6 3232
123 151 bwt > 2500 28 140 white nonsmoker 0 no no 0 4 3234
124 154 bwt > 2500 26 133 other smoker 2 no no 0 3 3260
125 155 bwt > 2500 20 169 other nonsmoker 1 no yes 1 8 3274
126 156 bwt > 2500 24 115 other nonsmoker 0 no no 2 11 3274
127 159 bwt > 2500 28 250 other smoker 0 no no 6 13 3303
128 160 bwt > 2500 20 141 white nonsmoker 2 no yes 1 7 3317
129 161 bwt > 2500 22 158 black nonsmoker 1 no no 2 5 3317
130 162 bwt > 2500 22 112 white smoker 2 no no 0 7 3317
131 163 bwt > 2500 31 150 other smoker 0 no no 2 7 3321
132 164 bwt > 2500 23 115 other smoker 0 no no 1 10 3331
133 166 bwt > 2500 16 112 black nonsmoker 0 no no 0 11 3374
134 167 bwt > 2500 16 135 white smoker 0 no no 0 3 3374
135 168 bwt > 2500 18 229 black nonsmoker 0 no no 0 6 3402
136 169 bwt > 2500 25 140 white nonsmoker 0 no no 1 8 3416
137 170 bwt > 2500 32 134 white smoker 1 no no 4 7 3430
138 172 bwt > 2500 20 121 black smoker 0 no no 0 6 3444
139 173 bwt > 2500 23 190 white nonsmoker 0 no no 0 3 3459
140 174 bwt > 2500 22 131 white nonsmoker 0 no no 1 7 3460
141 175 bwt > 2500 32 170 white nonsmoker 0 no no 0 4 3473
142 176 bwt > 2500 30 110 other nonsmoker 0 no no 0 8 3475
143 177 bwt > 2500 20 127 other nonsmoker 0 no no 0 3 3487
144 179 bwt > 2500 23 123 other nonsmoker 0 no no 0 10 3544
145 180 bwt > 2500 17 120 other smoker 0 no no 0 7 3572
146 181 bwt > 2500 19 105 other nonsmoker 0 no no 0 8 3572
147 182 bwt > 2500 23 130 white nonsmoker 0 no no 0 4 3586
148 183 bwt > 2500 36 175 white nonsmoker 0 no no 0 12 3600
149 184 bwt > 2500 22 125 white nonsmoker 0 no no 1 13 3614
150 185 bwt > 2500 24 133 white nonsmoker 0 no no 0 7 3614
151 186 bwt > 2500 21 134 other nonsmoker 0 no no 2 8 3629
152 187 bwt > 2500 19 235 white smoker 0 yes no 0 5 3629
153 188 bwt > 2500 25 95 white smoker 3 no yes 0 8 3637
154 189 bwt > 2500 16 135 white smoker 0 no no 0 2 3643
155 190 bwt > 2500 29 135 white nonsmoker 0 no no 1 4 3651
156 191 bwt > 2500 29 154 white nonsmoker 0 no no 1 5 3651
157 192 bwt > 2500 19 147 white smoker 0 no no 0 4 3651
158 193 bwt > 2500 19 147 white smoker 0 no no 0 4 3651
159 195 bwt > 2500 30 137 white nonsmoker 0 no no 1 5 3699
160 196 bwt > 2500 24 110 white nonsmoker 0 no no 1 8 3728
161 197 bwt > 2500 19 184 white smoker 0 yes no 0 7 3756
162 199 bwt > 2500 24 110 other nonsmoker 1 no no 0 10 3770
163 200 bwt > 2500 23 110 white nonsmoker 0 no no 1 4 3770
164 201 bwt > 2500 20 120 other nonsmoker 0 no no 0 2 3770
165 202 bwt > 2500 25 241 black nonsmoker 0 yes no 0 10 3790
166 203 bwt > 2500 30 112 white nonsmoker 0 no no 1 5 3799
167 204 bwt > 2500 22 169 white nonsmoker 0 no no 0 7 3827
168 205 bwt > 2500 18 120 white smoker 0 no no 2 6 3856
169 206 bwt > 2500 16 170 black nonsmoker 0 no no 4 8 3860
170 207 bwt > 2500 32 186 white nonsmoker 0 no no 2 6 3860
171 208 bwt > 2500 18 120 other nonsmoker 0 no no 1 13 3884
172 209 bwt > 2500 29 130 white smoker 0 no no 2 8 3884
173 210 bwt > 2500 33 117 white nonsmoker 0 no yes 1 2 3912
174 211 bwt > 2500 20 170 white smoker 0 no no 0 4 3940
175 212 bwt > 2500 28 134 other nonsmoker 0 no no 1 8 3941
176 213 bwt > 2500 14 135 white nonsmoker 0 no no 0 8 3941
177 214 bwt > 2500 28 130 other nonsmoker 0 no no 0 8 3969
178 215 bwt > 2500 25 120 white nonsmoker 0 no no 2 7 3983
179 216 bwt > 2500 16 95 other nonsmoker 0 no no 1 10 3997
180 217 bwt > 2500 20 158 white nonsmoker 0 no no 1 6 3997
181 218 bwt > 2500 26 160 other nonsmoker 0 no no 0 9 4054
182 219 bwt > 2500 21 115 white nonsmoker 0 no no 1 5 4054
183 220 bwt > 2500 22 129 white nonsmoker 0 no no 0 4 4111
184 221 bwt > 2500 25 130 white nonsmoker 0 no no 2 9 4153
185 222 bwt > 2500 31 120 white nonsmoker 0 no no 2 7 4167
186 223 bwt > 2500 35 170 white nonsmoker 1 no no 1 6 4174
187 224 bwt > 2500 19 120 white smoker 0 no no 0 3 4238
188 225 bwt > 2500 24 116 white nonsmoker 0 no no 1 7 4593
189 226 bwt > 2500 45 123 white nonsmoker 0 no no 1 5 4990