Solving the Most Common R Error: ‘Duplicate row.names are not allowed’
When you read a CSV file with the base R function read.csv(), you may encounter the error duplicate 'row.names' are not allowed when you use the row.names parameter.
An R data frame requires unique row names, so this error occurs when the column specified by the row.names parameter contains duplicated values.
For example, consider the following dataset in CSV format:
name,height,weight
x,1.80,65
y,1.62,67
z,1.55,62
z,1.56,63
In this dataset, one of the values in the name column (z) is duplicated. If you import this dataset using the read.csv() function with the name column specified as row.names, you will get a duplicate 'row.names' are not allowed error.
Let’s replicate the error using the above dataset,
df = read.csv('https://reneshbedre.github.io/assets/posts/other/data.csv', row.names = "name")
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
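If the URL above is not reachable, the same error can be reproduced offline. The following is a minimal sketch that assumes the dataset is pasted inline through the text argument of read.csv(); the variable names csv_text, err_msg, raw and dup_values are illustrative only. It captures the error message and then lists which values in the name column are repeated.

```r
# Inline copy of the example dataset, so no network access is needed
csv_text <- "name,height,weight
x,1.80,65
y,1.62,67
z,1.55,62
z,1.56,63"

# Capture the error message instead of stopping the script
err_msg <- tryCatch(
  {
    read.csv(text = csv_text, row.names = "name")
    NA_character_  # only reached if no error was raised
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)

# Read without row names and list the values that occur more than once
raw <- read.csv(text = csv_text)
name_values <- as.character(raw$name)
dup_values <- unique(name_values[duplicated(name_values)])
print(dup_values)
```

Checking for repeated values this way before choosing a row.names column shows which rows need attention.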
To fix the duplicate 'row.names' are not allowed error, you can use either of the following two solutions.
Solution 1: Import without the row.names parameter
In this solution, you fix the error by importing the CSV file without specifying row.names, or by setting row.names = NULL. R then numbers the rows automatically (1, 2, 3, ...).
df <- read.csv("https://reneshbedre.github.io/assets/posts/other/data.csv")
# same as df = read.csv("https://reneshbedre.github.io/assets/posts/other/data.csv", row.names = NULL)
# view data frame
df
name height weight
1 x 1.80 65
2 y 1.62 67
3 z 1.55 62
4 z 1.56 63
Now, if you want to set the first column (name) as row names, you can make the values in the name column unique using the make.names() function with unique = TRUE.
# make values in name column unique
uniq_name <- make.names(df$name, unique = TRUE)
row.names(df) <- uniq_name
# view data frame
df
name height weight
x x 1.80 65
y y 1.62 67
z z 1.55 62
z.1 z 1.56 63
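Note that make.names() does more than remove duplicates: it also rewrites values so that they are syntactically valid R names. If you only want uniqueness, base R's make.unique() may be a better fit. The sketch below compares the two on an illustrative vector; the values "1a" and "a b" are not part of the example dataset and are added here only to show the difference.

```r
# Illustrative identifiers; "1a" and "a b" are hypothetical additions
ids <- c("x", "y", "z", "z", "1a", "a b")

# make.unique() only appends suffixes to repeated values
unique_only <- make.unique(ids)
print(unique_only)

# make.names() also converts values into syntactically valid names
valid_and_unique <- make.names(ids, unique = TRUE)
print(valid_and_unique)
```

For the example dataset both functions give the same result (x, y, z, z.1), because all the original names are already valid.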
You have now created a data frame with unique row names. If you like, you can drop the name column (df[, -1]) or keep it in the data frame.
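As a small sketch of dropping the name column, the example below rebuilds the dataset inline (an assumption, so that it runs without the CSV file) and removes the column by name rather than by position. Adding drop = FALSE keeps the result a data frame even if only one column remains; df_no_name is an illustrative variable name.

```r
# Rebuild the example dataset inline
df <- data.frame(
  name = c("x", "y", "z", "z"),
  height = c(1.80, 1.62, 1.55, 1.56),
  weight = c(65, 67, 62, 63),
  stringsAsFactors = FALSE
)

# Use the de-duplicated names as row names
row.names(df) <- make.names(df$name, unique = TRUE)

# Drop the name column by name; drop = FALSE keeps a data frame
df_no_name <- df[, setdiff(names(df), "name"), drop = FALSE]
print(df_no_name)
```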
Solution 2: Create a matrix
In this solution, you fix the error by creating a matrix from the data frame.
In R, a data frame does not allow duplicated row names, but a matrix does.
# load dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/other/data.csv")
# view data frame
df
name height weight
1 x 1.80 65
2 y 1.62 67
3 z 1.55 62
4 z 1.56 63
Create a matrix from the data frame using the data.matrix() function,
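# convert data frame to a numeric matrix
# (the character name column is converted to integer codes)
df_mat <- data.matrix(df)
# assign row names to the matrix
row.names(df_mat) <- df$name
# drop the first (name) column, which now holds only integer codes
df_mat <- df_mat[, -1]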
# view matrix
df_mat
height weight
x 1.80 65
y 1.62 67
z 1.55 62
z 1.56 63
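The difference between the two structures can be checked directly. The sketch below is an alternative to the data.matrix() route above: it builds the matrix with as.matrix() from the numeric columns only, on an inline copy of the dataset (an assumption, so that it runs offline). It then confirms that the data frame rejects the duplicated names while the matrix accepts them. The names num_mat, df_rejects and z_rows are illustrative.

```r
# Rebuild the example dataset inline
df <- data.frame(
  name = c("x", "y", "z", "z"),
  height = c(1.80, 1.62, 1.55, 1.56),
  weight = c(65, 67, 62, 63),
  stringsAsFactors = FALSE
)

# A data frame refuses duplicated row names
df_rejects <- tryCatch(
  {
    suppressWarnings(row.names(df) <- df$name)
    FALSE
  },
  error = function(e) TRUE
)
print(df_rejects)

# A matrix built from the numeric columns accepts them
num_mat <- as.matrix(df[, c("height", "weight")])
rownames(num_mat) <- df$name
print(num_mat)

# Select every row that shares a name with a logical comparison
z_rows <- num_mat[rownames(num_mat) == "z", , drop = FALSE]
print(z_rows)
```

When row names are duplicated, selecting rows with a logical comparison on rownames() is the safer choice, because it returns all matching rows rather than relying on how a character index resolves repeated names.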
This work is licensed under a Creative Commons Attribution 4.0 International License