3 efficient ways to read (import) a CSV file into R
Importing data files is an essential step in data analysis and visualization. The CSV (comma-separated values) file
format (.csv) is commonly used for storing tabular data as plain text. In this article, you will learn three ways to
read CSV files in R.
Download example CSV dataset: testvolcano.csv
1. read.csv()
read.csv() is a base R function that reads a CSV file and converts it to a data frame. By default, read.csv() uses
the first line of the file as the header.
You can provide the complete path to the file, or just the file name if the file is in the current working directory. For
example, if the file is in the home directory, provide the path as /home/user/file.csv (Linux/Mac) or
C:\\Users\\wind\\file.csv (Windows; backslashes must be doubled inside an R string).
# R version 4.2.0
# If the file is not in the current directory,
# add the path to the file
df = read.csv("testvolcano.csv")
head(df)
# output
GeneNames log2FC p.value
1 LOC_Os09g01000.1 -1.886539 1.25e-55
2 LOC_Os12g42876.1 3.231611 1.05e-55
3 LOC_Os12g42884.2 3.179004 2.59e-54
4 LOC_Os03g16920.1 5.290677 4.69e-54
5 LOC_Os05g47540.4 4.096862 2.19e-54
6 LOC_Os09g00999.1 -1.839222 1.95e-54
If there is no header in the CSV file, set header = FALSE (R's logical constants are written in uppercase):
df = read.csv("testvolcano.csv", header = FALSE)
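As a minimal sketch of what header = FALSE does, the example below writes a small header-less file to a temporary location and reads it back. The file contents and the names gene, fc and pv are made up for illustration and are not part of testvolcano.csv.

```r
# Write a small header-less CSV to a temporary file (made-up values)
tmp <- tempfile(fileext = ".csv")
writeLines(c("geneA,1.5,0.01", "geneB,-2.0,0.20"), tmp)

# With header = FALSE the first line is kept as data,
# and the columns get the default names V1, V2, V3
df_nohead <- read.csv(tmp, header = FALSE)
print(names(df_nohead))
print(nrow(df_nohead))

# col.names assigns custom column names while reading
df_named <- read.csv(tmp, header = FALSE, col.names = c("gene", "fc", "pv"))
print(names(df_named))
```

Without header = FALSE, the first data line of such a file would be consumed as column names, so one row would be lost.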
Specify the column to use as row names with the row.names parameter while reading a CSV file. You can give either
a column name or a column number. All values in the row.names column must be unique; if there are duplicated
values, you will get an error.
df = read.csv("testvolcano.csv", row.names = "GeneNames")
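The row.names behavior, including the error on duplicates, can be sketched as follows. The two small files are made-up examples written to temporary locations, and tryCatch() is used here only so that the error message can be printed instead of stopping the script.

```r
# A small CSV with unique gene identifiers (made-up values)
tmp <- tempfile(fileext = ".csv")
writeLines(c("GeneNames,log2FC", "geneA,1.5", "geneB,-2.0"), tmp)

# Use the GeneNames column as row names (row.names = 1 is equivalent here)
df_rn <- read.csv(tmp, row.names = "GeneNames")
print(rownames(df_rn))
print(df_rn["geneB", "log2FC"])  # look up a value by row name

# The same layout, but with a duplicated identifier
dup <- tempfile(fileext = ".csv")
writeLines(c("GeneNames,log2FC", "geneA,1.5", "geneA,-2.0"), dup)

# read.csv() stops with an error; capture its message instead of failing
res <- tryCatch(read.csv(dup, row.names = "GeneNames"),
                error = function(e) conditionMessage(e))
print(res)
```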
If there are comment lines in the CSV file (for example, lines that start with #), set comment.char = "#":
df = read.csv("testvolcano.csv", comment.char = "#")
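A short sketch of comment handling, assuming a made-up file in which comment lines appear both before the header and between data rows:

```r
# A CSV with comment lines before the header and between data rows (made-up values)
tmp <- tempfile(fileext = ".csv")
writeLines(c("# exported for testing",
             "GeneNames,log2FC",
             "geneA,1.5",
             "# a note in the middle",
             "geneB,-2.0"), tmp)

# Lines starting with # are ignored; only the header and two data rows remain
df_cm <- read.csv(tmp, comment.char = "#")
print(df_cm)
```

Note that read.csv() has comment handling turned off by default (comment.char = ""), so the parameter has to be set explicitly.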
If you want to skip the first few lines of the file before reading, use the skip parameter. For example, to skip
the first 5 lines, set skip = 5. The first line after the skipped ones is then read as the header.
df = read.csv("testvolcano.csv", skip = 5)
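The typical use of skip is a file that starts with a few lines of free text before the actual table. The sketch below assumes such a file (the two preamble lines are invented for illustration) and skips exactly those lines so that the real header is picked up.

```r
# A CSV preceded by two lines of free-text metadata (made-up content)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Exported by a hypothetical tool",
             "Date: unknown",
             "GeneNames,log2FC",
             "geneA,1.5",
             "geneB,-2.0"), tmp)

# Skip the two metadata lines; line 3 becomes the header
df_skip <- read.csv(tmp, skip = 2)
print(names(df_skip))
print(nrow(df_skip))
```

If skip is larger than the number of preamble lines, the header itself is skipped and a data row is used as the column names instead.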
If you do not know the exact location of the file, you can use file.choose() to open a file explorer and select
the CSV file interactively.
df = read.csv(file.choose())
read.csv() is not efficient for reading big CSV files (several hundred MBs to GBs). If you have big files, it is
recommended to use either fread() or read_csv().
2. read_csv()
The read_csv() function from the readr package (part of tidyverse) can also be used for reading CSV data files. By
default, read_csv() uses the first line of the file as the header.
With read_csv(), you also get additional information such as the table dimensions and the data type of each column.
# readr v2.1.2
library(readr)
# If the file is not in the current directory,
# add the path to the file
df = read_csv("testvolcano.csv")
head(df)
# output
# A tibble: 6 × 3
GeneNames log2FC `p-value`
<chr> <dbl> <dbl>
1 LOC_Os09g01000.1 -1.89 1.25e-55
2 LOC_Os12g42876.1 3.23 1.05e-55
3 LOC_Os12g42884.2 3.18 2.59e-54
4 LOC_Os03g16920.1 5.29 4.69e-54
5 LOC_Os05g47540.4 4.10 2.19e-54
6 LOC_Os09g00999.1 -1.84 1.95e-54
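The dimensions and column types shown in the tibble header can also be queried programmatically. The sketch below uses a small made-up file with the same three column names as the example dataset, and it assumes readr 2.0 or later for the show_col_types argument. It also compares the column names with those produced by read.csv(), which explains why the two outputs above differ in p.value versus p-value.

```r
library(readr)

# A small CSV with a non-syntactic column name (made-up values)
tmp <- tempfile(fileext = ".csv")
writeLines(c("GeneNames,log2FC,p-value", "geneA,1.5,0.01", "geneB,-2.0,0.20"), tmp)

# show_col_types = FALSE only silences the column type message
df_t <- read_csv(tmp, show_col_types = FALSE)

print(dim(df_t))            # number of rows and columns
print(sapply(df_t, class))  # data type of each column
print(names(df_t))          # read_csv() keeps the name p-value as is

# read.csv() converts the name to a syntactic one (p.value)
print(names(read.csv(tmp)))
```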
If there is no header in the CSV file, set col_names = FALSE. You can also pass a character vector to col_names
to set custom column names.
# if there is no header
df = read_csv("testvolcano.csv", col_names = FALSE)
# set custom column names (when the file has no column names)
df = read_csv("testvolcano.csv", col_names = c("gene", "fc", "pv"))
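To see the effect of both forms of col_names, here is a small sketch on a made-up header-less file. The names gene, fc and pv are the illustrative names used above, and show_col_types = FALSE (readr 2.0 or later) is only there to keep the output quiet.

```r
library(readr)

# A small header-less CSV (made-up values)
tmp <- tempfile(fileext = ".csv")
writeLines(c("geneA,1.5,0.01", "geneB,-2.0,0.20"), tmp)

# col_names = FALSE: columns get the default names X1, X2, X3
df_x <- read_csv(tmp, col_names = FALSE, show_col_types = FALSE)
print(names(df_x))

# col_names as a character vector: custom names, first line kept as data
df_n <- read_csv(tmp, col_names = c("gene", "fc", "pv"), show_col_types = FALSE)
print(names(df_n))
print(nrow(df_n))
```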
If there are comment lines in the CSV file (for example, lines that start with #), set comment = "#":
df = read_csv("testvolcano.csv", comment = "#")
If you want to skip the first few lines of the file before reading, use the skip parameter. For example, to skip
the first 2 lines, set skip = 2.
df = read_csv("testvolcano.csv", skip = 2)
If you do not know the exact location of the file, you can use file.choose() to open a file explorer and select
the CSV file interactively.
df = read_csv(file.choose())
The read_csv() function is ~5 to ~10X faster than the base read.csv() function. read_csv() also displays a progress
bar, which is useful when importing big files. In addition, read_csv() outputs a tibble (a modern data frame) that
keeps the input data types intact. read_csv() is also more reproducible than read.csv(), because base R functions
inherit some of their behavior from the operating system and environment variables.
3. fread()
The fread() function from the data.table package (an enhanced version of base R's data.frame) can also be used for
importing CSV data files.
fread() is fast (~5X faster than read.csv()) and memory efficient, and is especially good for importing big CSV
files. fread() automatically detects the most common delimiters [,\t |;:].
# data.table v1.14.2
library(data.table)
# If the file is not in the current directory,
# add the path to the file
df = fread("testvolcano.csv")
head(df)
# output
GeneNames log2FC p-value
1: LOC_Os09g01000.1 -1.886539 1.25e-55
2: LOC_Os12g42876.1 3.231611 1.05e-55
3: LOC_Os12g42884.2 3.179004 2.59e-54
4: LOC_Os03g16920.1 5.290677 4.69e-54
5: LOC_Os05g47540.4 4.096862 2.19e-54
6: LOC_Os09g00999.1 -1.839222 1.95e-54
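The automatic delimiter detection can be illustrated with a semicolon-separated file. This is a minimal sketch on made-up data; the sep argument is shown as the explicit alternative in case detection is not wanted.

```r
library(data.table)

# A semicolon-separated file (made-up values)
tmp <- tempfile(fileext = ".txt")
writeLines(c("GeneNames;log2FC", "geneA;1.5", "geneB;-2.0"), tmp)

# Separator guessed by fread()
dt_auto <- fread(tmp)
print(dt_auto)

# Separator given explicitly
dt_sep <- fread(tmp, sep = ";")
print(identical(names(dt_auto), names(dt_sep)))
```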
fread() automatically detects the header. If you do not want the first line to be used as the header, set header = FALSE.
# if there is no header
df = fread("testvolcano.csv", header = FALSE)
If you want to skip the first few lines of the file before reading, use the skip parameter. For example, to skip
the first 4 lines, set skip = 4.
df = fread("testvolcano.csv", skip = 4)
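Besides a line count, the skip argument of fread() also accepts a string, in which case reading starts at the first line containing that string. The sketch below shows both forms on a made-up file with two invented preamble lines.

```r
library(data.table)

# A CSV preceded by two lines of free-text metadata (made-up content)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Exported by a hypothetical tool",
             "Date: unknown",
             "GeneNames,log2FC",
             "geneA,1.5",
             "geneB,-2.0"), tmp)

# Skip a fixed number of lines
dt_n <- fread(tmp, skip = 2)

# Or start at the first line that contains the given string
dt_s <- fread(tmp, skip = "GeneNames")

print(dt_n)
print(isTRUE(all.equal(dt_n, dt_s)))
```

The string form can be handy when the number of preamble lines varies between files but the header text is known.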
If you do not know the exact location of the file, you can use file.choose() to open a file explorer and select
the CSV file interactively.
df = fread(file.choose())
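The speed differences mentioned in this article depend on the file and the machine, so it can be worth measuring them on your own data. The following is a rough sketch, not a rigorous benchmark: it generates a synthetic data frame (made-up values, 200,000 rows chosen arbitrarily), writes it to a temporary CSV file, and times each of the three functions once with system.time().

```r
library(readr)
library(data.table)

# Synthetic data (made-up values), written once to a temporary CSV file
set.seed(1)
n <- 200000
big <- data.frame(id = seq_len(n),
                  x = rnorm(n),
                  g = sample(letters, n, replace = TRUE))
tmp <- tempfile(fileext = ".csv")
write.csv(big, tmp, row.names = FALSE)

# Time a single read with each function (elapsed seconds)
t_base  <- unname(system.time(d1 <- read.csv(tmp))["elapsed"])
t_readr <- unname(system.time(
  d2 <- read_csv(tmp, show_col_types = FALSE, progress = FALSE))["elapsed"])
t_fread <- unname(system.time(d3 <- fread(tmp))["elapsed"])

print(c(read.csv = t_base, read_csv = t_readr, fread = t_fread))
```

A single run is noisy; repeating the measurement several times or using a larger file gives a more reliable picture.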
This work is licensed under a Creative Commons Attribution 4.0 International License