Saturday, September 15, 2018

Lesson 3: Vectors, Lists, Matrices, and Data Frames in R

Data structures are next on our list because they are central to just about everything we do in R.  Data structures in R come in four main flavors: vectors, lists, matrices, and data frames.  We'll discuss each of these this week, along with special cases of vectors, such as scalars and factors, and arrays, which extend matrices to three or more dimensions.

Okay, let's get started!

PART #1: VECTORS, SCALARS, AND FACTORS

Vectors can hold only one kind of data type or mode (e.g. numeric or character).
Vectors can be created using the combine function "c()".
So, let's first create a vector with numeric data.

my_first_vector <- c(1, 2, 3, 4, 5)

Now let's create a vector with character data or strings.

my_second_vector <- c("Orange", "Apple", "Banana", "Pear", "Peach")
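Because a vector holds only one type, R quietly converts (coerces) mixed values to a common type. Here is a quick check using base R's class() function; the name mixed_vector is just an illustrative choice, not part of the lesson.

```r
# Recreate the two vectors from above
my_first_vector <- c(1, 2, 3, 4, 5)
my_second_vector <- c("Orange", "Apple", "Banana", "Pear", "Peach")

class(my_first_vector)    # "numeric"
class(my_second_vector)   # "character"

# Illustrative: mixing a number and a string coerces the number to a string
mixed_vector <- c(1, "Apple")
class(mixed_vector)       # "character"
mixed_vector[1]           # "1" (now text, not a number)
```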

To reference an element or value in a vector, use an index value enclosed in square brackets "[ ]".  R counts positions starting at 1, so to reference the first element in the vector, use "[1]".

my_second_vector[1]
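Since positions start at 1, the last element sits at the position given by length(). A small sketch using only base R:

```r
my_second_vector <- c("Orange", "Apple", "Banana", "Pear", "Peach")

my_second_vector[1]                          # "Orange", the first element
length(my_second_vector)                     # 5
my_second_vector[length(my_second_vector)]   # "Peach", the last element
```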

You can also create a subset of the original vector by using a continuous index range from x to y with the syntax "[x:y]".  For example, to create a new vector from the second vector's last three items, use the index range "[3:5]".

my_third_vector <- my_second_vector[3:5]

You can also create a new vector from just the original vector's first and last items by passing a vector of index values, "[c(1,5)]".

my_fourth_vector <- my_second_vector[c(1,5)]
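Both kinds of subsetting work the same way underneath: "3:5" is itself just a vector of the positions 3, 4, and 5, and "c(1,5)" is a vector of the positions 1 and 5. A sketch showing what each subset returns:

```r
my_second_vector <- c("Orange", "Apple", "Banana", "Pear", "Peach")

3:5                                    # the positions 3 4 5
my_third_vector <- my_second_vector[3:5]
my_third_vector                        # "Banana" "Pear" "Peach"

my_fourth_vector <- my_second_vector[c(1, 5)]
my_fourth_vector                       # "Orange" "Peach"
```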


To add a value to a vector and create a new vector, use the combine function "c()" once again, referencing the original vector in the combine function.

my_fifth_vector <- c(my_second_vector, "Apricot")
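Note that "c()" builds a new vector and leaves the original untouched. If you want the new value somewhere other than the end, base R's append() function takes an "after" position; the example below is an optional alternative, not something the lesson requires.

```r
my_second_vector <- c("Orange", "Apple", "Banana", "Pear", "Peach")
my_fifth_vector <- c(my_second_vector, "Apricot")

length(my_second_vector)   # 5: the original vector is unchanged
length(my_fifth_vector)    # 6
my_fifth_vector[6]         # "Apricot"

# append() with after = 1 places "Apricot" in second position instead
append(my_second_vector, "Apricot", after = 1)
```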

Next we'll discuss scalars.  Scalars are vectors with only one element.
For example, pi is stored in R as a scalar: a numeric vector of length one.  You can see this by referencing its "first" element.

pi[1]
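Another way to confirm that pi is a one-element vector is to ask for its length directly. A short check with base R functions:

```r
length(pi)             # 1: pi is a vector with a single element
is.vector(pi)          # TRUE
pi[1]                  # 3.141593 (printed to 7 significant digits)
identical(pi[1], pi)   # TRUE: the "first" element is the whole thing
```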

Factors are similar to vectors, except that a factor stores each unique value in the vector as a "level" or category.  Factors can be used to label or categorize your data.

Let's turn the fifth vector we created into a factor.

my_first_factors <- factor(my_fifth_vector)
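Levels are easiest to see when values repeat. The sketch below inspects the factor we just made and then a second, illustrative factor called basket (a name introduced here, not part of the lesson) that contains a duplicate.

```r
my_fifth_vector <- c("Orange", "Apple", "Banana", "Pear", "Peach", "Apricot")
my_first_factors <- factor(my_fifth_vector)
levels(my_first_factors)   # the unique values, sorted alphabetically
nlevels(my_first_factors)  # 6

# Illustrative factor with a repeated value
basket <- factor(c("Apple", "Pear", "Apple", "Peach"))
length(basket)       # 4 values...
nlevels(basket)      # ...but only 3 levels
levels(basket)       # "Apple" "Peach" "Pear"
as.integer(basket)   # 1 3 1 2: each value is stored as a level number
table(basket)        # how many times each level occurs
```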

PART #2: LISTS

Unlike vectors, lists can hold more than one kind of data type or mode (e.g. numeric and character) at the same time.
Let's create a list with numeric and character data modes using the "list()" function.

my_first_list <- list(1, 2, 3, "car", "truck", "van")
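To confirm that each list element keeps its own type, you can apply class() to every element with base R's sapply(). The second, illustrative list (my_mixed_list, a hypothetical name) shows that an element can even be a whole vector.

```r
my_first_list <- list(1, 2, 3, "car", "truck", "van")
length(my_first_list)          # 6 elements
sapply(my_first_list, class)   # "numeric" x 3, then "character" x 3

# Illustrative: a list element can itself be a vector
my_mixed_list <- list(1, "car", c(4, 5, 6))
length(my_mixed_list[[3]])     # 3
```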

Referencing values or elements in a list is similar to referencing elements in a vector, except that you use double square brackets to get the element itself.  To reference the fourth element in the list, use "[[4]]".

my_first_list[[4]]
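The difference between double and single brackets is worth seeing once: "[[4]]" returns the element itself, while "[4]" returns a smaller list that contains it.

```r
my_first_list <- list(1, 2, 3, "car", "truck", "van")

my_first_list[[4]]          # "car", the element itself
class(my_first_list[[4]])   # "character"

my_first_list[4]            # a one-element list containing "car"
class(my_first_list[4])     # "list"
```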

Similar to vectors, you can also create a subset of the original list by using a continuous index range from x to y with the syntax "[x:y]".  Because this list has six elements, its last three items are at positions 4 through 6, so to create a new list from them, use "[4:6]".

my_second_list <- my_first_list[4:6]
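Because a list's length determines where its "last" items are, it can help to check length() before slicing. This sketch, with the illustrative name my_last_three, also shows that single-bracket slices of a list are still lists; base R's unlist() flattens one back into a vector for easy reading.

```r
my_first_list <- list(1, 2, 3, "car", "truck", "van")
length(my_first_list)                # 6, so the last three items are at 4:6

my_last_three <- my_first_list[4:6]  # illustrative name
is.list(my_last_three)               # TRUE: single brackets keep it a list
unlist(my_last_three)                # "car" "truck" "van"

# First and last items, without hard-coding the final position
my_first_list[c(1, length(my_first_list))]
```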

And again, similar to vectors, to create a new list from just the original list's first and last items, use "[c(1,6)]".

my_third_list <- my_first_list[c(1,6)]

Finally, similar to vectors, to add a value to a list and create a new list, use the combine function "c()" once again, referencing the original list in the combine function.

my_fourth_list <- c(my_first_list, "suv")
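One subtlety when combining a list with "c()": the values of a vector are spread into separate list elements, whereas wrapping the vector in list() adds it as a single element. A sketch comparing the two:

```r
my_first_list <- list(1, 2, 3, "car", "truck", "van")
my_fourth_list <- c(my_first_list, "suv")
length(my_fourth_list)                           # 7

# c() spreads a vector's values into separate list elements...
length(c(my_first_list, c("suv", "bus")))        # 8
# ...while wrapping the vector in list() adds it as one element
length(c(my_first_list, list(c("suv", "bus"))))  # 7
```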

PART #3: MATRICES AND ARRAYS

You may or may not remember matrices from high school or college math.  In R, a matrix is a two-dimensional data structure: values arranged in rows and columns, all of the same type (typically numeric).  A matrix can be produced from multiple vectors.

a <- c(1,2,3)
b <- c(4,5,6)
c <- c(7,8,9)

To create a matrix we use the "matrix()" function.  Three of its parameters define what the matrix looks like.  The "data" parameter specifies the values to fill it with; here we use the "c()" function to combine our vectors into one.  The "nrow" and "ncol" parameters tell R how many rows and columns the matrix has.  By default, "matrix()" fills the matrix column by column, so vector "a" becomes the first column, "b" the second, and "c" the third.

So, let's create a matrix using the "matrix()" function and the three vectors above.

my_first_matrix <- matrix(data = c(a,b,c), nrow = 3, ncol = 3)
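To see the column-by-column fill order, compare the default with the byrow = TRUE option of matrix(), which fills row by row instead. The name by_row is illustrative only.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)
c <- c(7, 8, 9)   # naming a vector "c" works, but can be confusing next to c()

my_first_matrix <- matrix(data = c(a, b, c), nrow = 3, ncol = 3)
dim(my_first_matrix)    # 3 3
my_first_matrix[, 1]    # 1 2 3: vector a became the first column

# Illustrative: byrow = TRUE turns each vector into a row instead
by_row <- matrix(data = c(a, b, c), nrow = 3, ncol = 3, byrow = TRUE)
by_row[1, ]             # 1 2 3
by_row[, 1]             # 1 4 7
```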

Similar to vectors and lists, you can reference specific values in a matrix, this time with two index values: the row first, then the column.  For instance, to reference the item in the first row of the third column, use "[1,3]" (here, that's 7).

my_first_matrix[1,3]
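Leaving one of the two index positions empty selects a whole row or a whole column. A sketch using the same nine values as the lesson's matrix:

```r
my_first_matrix <- matrix(data = c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)

my_first_matrix[1, 3]   # 7: row 1, column 3
my_first_matrix[1, ]    # all of row 1: 1 4 7
my_first_matrix[, 3]    # all of column 3: 7 8 9
```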

Now for the next exercise, we'll need to create a fourth vector.

d <- c(10,11,12)

To add the new vector "d" to "my_first_matrix" as an additional column you need to use the column bind or "cbind" function.

my_second_matrix <- cbind(my_first_matrix,d)

To add the new vector "d" to "my_first_matrix" as an additional row, use the row bind or "rbind" function.

my_third_matrix <- rbind(my_first_matrix,d)
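A quick way to check what cbind() and rbind() did is to look at the new dimensions. Both functions also label the added column or row with the vector's name, "d".

```r
my_first_matrix <- matrix(data = 1:9, nrow = 3, ncol = 3)
d <- c(10, 11, 12)

my_second_matrix <- cbind(my_first_matrix, d)
dim(my_second_matrix)          # 3 4: one extra column
colnames(my_second_matrix)[4]  # "d"

my_third_matrix <- rbind(my_first_matrix, d)
dim(my_third_matrix)           # 4 3: one extra row
rownames(my_third_matrix)[4]   # "d"
```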


Next, let's talk about arrays.  Arrays extend matrices to more than two dimensions; a matrix is simply an array with exactly two.
To make our first array, let's start with a vector of 27 items, for example the numbers 1 through 27.

my_first_array <- c(1:27)

Now, to create the array, let's divide the vector into three 3 x 3 matrices and stack them.  We do this with the dimension or "dim" function, using the combine or "c()" function to specify that the vector should be structured as a 3 x 3 x 3 array.

dim(my_first_array) <- c(3,3,3)
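Array elements are referenced with one index per dimension: row, column, then layer. The sketch below also shows base R's array() function, which builds the same structure in one step; same_array is an illustrative name.

```r
my_first_array <- c(1:27)
dim(my_first_array) <- c(3, 3, 3)

dim(my_first_array)       # 3 3 3
my_first_array[, , 2]     # the second 3 x 3 layer: the values 10 through 18
my_first_array[1, 1, 2]   # 10: row 1, column 1, layer 2

# Illustrative alternative: array() sets the dimensions directly
same_array <- array(1:27, dim = c(3, 3, 3))
all(same_array == my_first_array)   # TRUE
```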

Review the results of your first array.

print(my_first_array)

PART #4: DATA FRAMES

Data frames are the last data structure we'll discuss this week.  Data frames are designed to hold tabular data, much like a spreadsheet.  They have rows and columns, and while all the values within a column share one data type, different columns can hold different types (for example, one numeric column and one character column).  Data frames can be built from vectors, lists, or matrices.  Let's build our first data frame from "my_first_vector" and "my_second_vector" using the "data.frame" function.  For this example, we're also going to set a parameter called "stringsAsFactors" to FALSE.  This makes sure that the string data is stored as the "character" data type rather than the "factor" data type.  (Since R 4.0.0, FALSE is the default, but setting it explicitly does no harm.)

my_first_df <- data.frame(my_first_vector, my_second_vector, stringsAsFactors = FALSE)
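To confirm that each column kept its own type, check the class of every column with base R's sapply(); str() gives a compact summary of the whole data frame.

```r
my_first_vector <- c(1, 2, 3, 4, 5)
my_second_vector <- c("Orange", "Apple", "Banana", "Pear", "Peach")
my_first_df <- data.frame(my_first_vector, my_second_vector, stringsAsFactors = FALSE)

dim(my_first_df)             # 5 2: five rows, two columns
names(my_first_df)           # "my_first_vector" "my_second_vector"
sapply(my_first_df, class)   # "numeric" "character": one type per column
str(my_first_df)             # compact summary of the structure
```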

We can also create data frames from lists, so we'll do that next.
But before we do, we need to create a list of vectors.

my_vector_list <- list(my_first_vector, my_second_vector)

To turn the list into a data frame, we'll use the "as.data.frame" function, which converts an existing object, such as a list, into a data frame.

my_other_df <- as.data.frame(my_vector_list, stringsAsFactors = FALSE)
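Because "my_vector_list" has no element names, as.data.frame() has to generate column names automatically, and they are usually not very readable. One option, sketched below, is to name the list elements when you build the list; my_named_list is an illustrative name, and "Quantity" and "Fruit" are simply the column names used later in this lesson.

```r
my_first_vector <- c(1, 2, 3, 4, 5)
my_second_vector <- c("Orange", "Apple", "Banana", "Pear", "Peach")

# Named list elements become the column names
my_named_list <- list(Quantity = my_first_vector, Fruit = my_second_vector)
my_other_df <- as.data.frame(my_named_list, stringsAsFactors = FALSE)

names(my_other_df)   # "Quantity" "Fruit"
nrow(my_other_df)    # 5
```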

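By default, a data frame's columns are named after the vectors used to build it (here, "my_first_vector" and "my_second_vector"), so let's give the columns of "my_first_df" more descriptive names.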

names(my_first_df) <- c("Quantity", "Fruit")
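The "names()" function can also rename a single column by combining it with an index. The name "Count" below is purely illustrative.

```r
my_first_df <- data.frame(my_first_vector = c(1, 2, 3, 4, 5),
                          my_second_vector = c("Orange", "Apple", "Banana", "Pear", "Peach"),
                          stringsAsFactors = FALSE)

names(my_first_df) <- c("Quantity", "Fruit")
names(my_first_df)          # "Quantity" "Fruit"

# Illustrative: rename only the first column
names(my_first_df)[1] <- "Count"
names(my_first_df)          # "Count" "Fruit"
```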

Using matrix notation, let's see what's in row 3.

my_first_df[3,]

Now, let's see what's in column 2.

my_first_df[,2]

Or use the name of column 2 to see the same values.

my_first_df$Fruit
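These three ways of looking inside a data frame return different kinds of objects: a row comes back as a one-row data frame, while a single column comes back as a plain vector. "[[ ]]" with a column name is one more equivalent way to get a column.

```r
my_first_df <- data.frame(Quantity = c(1, 2, 3, 4, 5),
                          Fruit = c("Orange", "Apple", "Banana", "Pear", "Peach"),
                          stringsAsFactors = FALSE)

class(my_first_df[3, ])   # "data.frame": a row stays a one-row data frame
class(my_first_df[, 2])   # "character": a single column comes back as a vector
identical(my_first_df[, 2], my_first_df$Fruit)        # TRUE
identical(my_first_df$Fruit, my_first_df[["Fruit"]])  # TRUE
```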

Next, let's see what's in row 4, column 2.

my_first_df[4,2]

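Finally, let's change the value in row 4, column 2 from "Pear" to "Plum".  This works because we stored the fruit names as character data; had they been stored as a factor, assigning a value that isn't one of the existing levels would produce NA and a warning.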

my_first_df[4,2] <- "Plum"
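Here is a side-by-side sketch of that difference. The factor fruit_factor is an illustrative object, not part of the lesson; suppressWarnings() just hides the "invalid factor level" warning so the example runs quietly.

```r
my_first_df <- data.frame(Quantity = c(1, 2, 3, 4, 5),
                          Fruit = c("Orange", "Apple", "Banana", "Pear", "Peach"),
                          stringsAsFactors = FALSE)
my_first_df[4, 2]            # "Pear"
my_first_df[4, 2] <- "Plum"
my_first_df[4, 2]            # "Plum"

# Illustrative: the same change on a factor fails, because "Plum" is not
# an existing level, so R stores NA (and would normally warn)
fruit_factor <- factor(c("Orange", "Apple", "Banana", "Pear", "Peach"))
suppressWarnings(fruit_factor[4] <- "Plum")
is.na(fruit_factor[4])       # TRUE
```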


Only one more step.  Let's save the script you just created by clicking "File" -> "Save As...", naming it "R_Lesson3.R", and clicking "Save".
Congratulations!  You've completed Lesson 3!


DOWNLOAD CODE Here is the code from my GitHub gist "R Lesson 3 - Vectors, Lists, Matrices, and Data Frames in R" in case you'd rather just copy and paste it and then play around with it.


