Reading delimited file data sets into R is pretty straight forward. R includes several base functions that allow you to easily read your delimited files directly into a data frame.
Similarly, reading and writing excel data is easy with the right R libraries. Setting-up the libraries requires a few extra steps. First we need to install two library packages. Then we can load the needed libraries into our R session and access the required R functions to read and write excel files. We'll get into the nitty gritty details of this a little later in the post.
Prep Work
Before we get into the fun stuff, we need to save some files to our desktop.You can download the the example files here.
There are four files:
- my_doc.txt
- tab_delim.txt
- comma_delim.csv
- excel_example.xls
Part #1: Reading & Writing Text Files
Okay, so now that you have your files saved to your desktop, set your working directory to your desktop.setwd("~/Desktop/")
Next, you can use the file I gave you in the link above or you try and create a simple text file using R that contain the infamous sentence "The quick brown fox jumps over the lazy dog."
To create the file, first create a string variable called "contents" with the "infamous sentence" stored in it.
contents <- "The quick brown fox jumps over the lazy dog."
Now, write that "infamous sentence" to a file called "my_doc.txt".
writeLines(contents, "my_doc.txt")
So, good job, you just created a text file using R. Now let's read in that same text file using the "readLines" function and store it in a string variable called "read_it".
First create a variable called "my_text_file" where you will store the name of the newly created text file "my_doc.txt".
my_text_file <- "my_doc.txt"
Next, read in and store the contents of the text file into a string variable.
read_it <- readLines(my_text_file)
Now let's create a copy of the file by using the "writeLines" function and save the new file as "my_doc2.txt".
writeLines(read_it, "my_doc2.txt")
Another faster way to make a copy of a file is using the "file.copy" function.
We covered file & directory operations in Lesson 2 (check it out for more info).
Using the "file.copy" function save a copy of "my_doc.txt" as "my_doc3.txt".
file.copy(from = "my_doc.txt", to = "my_doc3.txt")
Part #2: Reading & Writing Delimited Files
Next we're going to discuss dealing with delimited file data. Reading & writing delimited files, whether they are tab, comma, or semi-colon delimited is easy in R. R comes loaded ready to go with base functions that allow us to read in this type of data.So let's get started! Use the "read.delim" function and specify the delimiter using the "sep" argument. For example, let's read in a tab delimited file called "tab_delim.txt" and store it in a data frame called "tab" where the delimiter argument is "sep" is a tab. A tab punctuation is specified in R as " \t ".
tab <- read.delim(file ="tab_delim.txt", sep = "\t")
Writing the contents of the "tab" data frame is just as simple. We will use a base function called "write.table" and we'll specify the NEW file's name as "tab_delim2.txt". Be sure to use the "row.names" argument to prevent row numbers from being exported along with the data.
write.table(x = tab, file = "tab_delim2.txt", sep = "\t", row.names = FALSE)
Next, let's move on to comma delimited files or comma-separated value (csv) files. To do this we use another base function called "read.csv", which is a special version of the "read.delim" function (imagine that!).
We like to us the "read.csv" function for csv-files because it already knows your delimiter is a comma, so you don't have to specify the "sep" argument.
Now, let's go ahead and read in a comma-delimited or comma-separated value (csv) file called "comma_delim.csv" and store it in a data frame called "comma".
comma <- read.csv(file = "comma_delim.csv")
Similar to the "read.csv" function, the "write.csv" function doesn't need you to specify the "sep" argument, since a comma delimiter is implied. Finally, let's go ahead and write this
write.csv(x = comma, path = "comma_delim2.csv")
Part #3: Reading & Writing Excel Files
For this next part, you'll need to install two different library packages for R; the "readxl" and "writexl" packages.install.packages("readxl")
install.packages("writexl")
Once you've installed the packages, load the associated libraries for those packages.
library(readxl)
library(writexl)
Now we're ready to read in some Microsoft Excel spreadsheet data. Exciting, right!! I know!
Microsoft Excel is a great program, but often it is limited in its capabilities when it comes to LARGER data sets. I'm talking over 1,000,000 rows of data as is common now days, even this many rows is considers small data. When we talk about BIG DATA we're talking 100's of Millions of rows. Also trying to wrangle and tidy data of this magnitude can be impossible in Excel. This is where a program like R can really shine!
Now we are not going to deal with a file with that many rows todays. We're going to keep it simple for today and leave to 7 rows... yeah laugh it up!
First, let's read in "Sheet1" from an excel-file called "excel_example.xls" and store in it a data frame called "excel".
NOTE: "read_xls" can read a variety of excel extensions including: xls, xlsx, xlsm, etc.
excel <- read_xls(path = "excel_example.xls", sheet = "Sheet1")
Now, let's save the excel data frame to a NEW excel file called "excel_example2.xls".
NOTE: "write_xlsx" can write to the basic excel extensions including: xls and xlsx.
write_xlsx(x = excel, path = "excel_example2.xls")
Great stuff, I know!
While the examples we covered today may seem basic, it really is important to know.
I use these functions daily and I'm sure you will too as you continue to use R!
Good Luck!
Only one more step. Let's save that script you just created by clicking on "File" -> "Save As..." and let's name your script "R_Lesson4.R" Click "Save".
Congratulations! You've completed Lesson 4!
DOWNLOAD CODE Here is the code from my GitHub gist "R Lesson 4 - Reading and Writing Data in R" in case you'd rather just copy and paste it and then play around with it.