How to replace column values in a pandas DataFrame based on column conditions
When working with a pandas DataFrame, you often need to replace or update column values based on conditions on one or more columns. In this article, we will discuss four methods for replacing the values of pandas DataFrame columns based on column conditions.
1. .loc indexing
df.loc[df['col_name'] == 'old_value', 'col_to_replace'] = new_value
2. numpy.where() function
df['col_to_replace'] = np.where(df['col_name'] == 'old_value', 'true_value', 'false_value')
3. pandas mask() function
df['col_to_replace'] = df['col_to_replace'].mask(df['col_to_replace'] == 'old_value', 'new_value')
4. pandas where() function
df['col_to_replace'] = df['col_to_replace'].where(df['col_to_replace'] == 'value_to_keep', 'new_value')
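As a quick check that the four forms listed above are interchangeable for a simple update, the sketch below applies the same change (setting Smith's weight to 80) with each method on a copy of a small DataFrame. The variable names (cond, via_loc, and so on) are illustrative assumptions, not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Adams', 'Jones', 'Frank', 'Smith', 'Davis'],
    'weight': [74, 90, 85, 65, 92],
})
cond = df['name'] == 'Smith'  # rows to update

# 1. .loc indexing: assign only to the matching rows
via_loc = df.copy()
via_loc.loc[cond, 'weight'] = 80

# 2. numpy.where(): pass the existing column as the "else" value
via_np = df.copy()
via_np['weight'] = np.where(cond, 80, via_np['weight'])

# 3. mask(): replaces where the condition is True
via_mask = df.copy()
via_mask['weight'] = via_mask['weight'].mask(cond, 80)

# 4. where(): replaces where the condition is False, so negate it
via_where = df.copy()
via_where['weight'] = via_where['weight'].where(~cond, 80)

results = [via_loc, via_np, via_mask, via_where]
print(all(r['weight'].tolist() == [74, 90, 85, 80, 92] for r in results))
```

Each variant works on its own copy, so the original df is left unchanged.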
Now, we will discuss these four methods in detail using an example dataset.
1. .loc indexing
import pandas as pd
# create an example DataFrame
df = pd.DataFrame({'name':['Adams', 'Jones', 'Frank', 'Smith', 'Davis'],
'age':[25, 30, 28, 35, 22], 'weight':[74, 90, 85, 65, 92]})
# output
name age weight
0 Adams 25 74
1 Jones 30 90
2 Frank 28 85
3 Smith 35 65
4 Davis 22 92
# replace Smith's weight with 80
df.loc[df['name'] == 'Smith', 'weight'] = 80
# output
name age weight
0 Adams 25 74
1 Jones 30 90
2 Frank 28 85
3 Smith 35 80
4 Davis 22 92
# based on multiple column conditions (starting again from the original DataFrame)
# update Adams's weight to 70 if his age is 25
df.loc[(df['name'] == 'Adams') & (df['age'] == 25), 'weight'] = 70
# output
name age weight
0 Adams 25 70
1 Jones 30 90
2 Frank 28 85
3 Smith 35 65
4 Davis 22 92
# replace the weight with 80 if age is greater than 28 (again starting from the original DataFrame)
df.loc[df['age'] > 28, 'weight'] = 80
# output
name age weight
0 Adams 25 74
1 Jones 30 80
2 Frank 28 85
3 Smith 35 80
4 Davis 22 92
The pandas .loc indexing is a convenient way to replace column values based on a conditional expression. The condition can involve a single column or several columns combined with & (and) or | (or), with each comparison wrapped in parentheses.
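Conditions can also be combined with | (OR) or written with isin(). The following is a minimal sketch on the same example data; the replacement values (80 and 40) are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Adams', 'Jones', 'Frank', 'Smith', 'Davis'],
    'age': [25, 30, 28, 35, 22],
    'weight': [74, 90, 85, 65, 92],
})

# OR condition: set weight to 80 for anyone named Smith or younger than 25
either = (df['name'] == 'Smith') | (df['age'] < 25)
df.loc[either, 'weight'] = 80

# isin() is shorthand for several equality checks joined with |
df.loc[df['name'].isin(['Adams', 'Frank']), 'age'] = 40

print(df)
```

Building the boolean mask in a named variable first (either) keeps the .loc line short and lets the same condition be reused.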
2. numpy.where() function
import pandas as pd
import numpy as np
# create an example DataFrame
df = pd.DataFrame({'name':['Adams', 'Jones', 'Frank', 'Smith', 'Davis'],
'age':[25, 30, 28, 35, 22], 'weight':[74, 90, 85, 65, 92]})
# output
name age weight
0 Adams 25 74
1 Jones 30 90
2 Frank 28 85
3 Smith 35 65
4 Davis 22 92
# set Jones's age to 25 and every other age to 30
df['age'] = np.where(df['name'] == 'Jones', 25, 30)
# output
name age weight
0 Adams 30 74
1 Jones 25 90
2 Frank 30 85
3 Smith 30 65
4 Davis 30 92
numpy.where() is a conditional function that returns elements chosen from two values depending on a condition. It is well suited to updating a large number of values in a column at once. Note that every row receives either true_value or false_value, which is why all ages other than Jones's become 30 in the example above.
The syntax of this function is:
numpy.where(condition, true_value, false_value)
condition: conditional expression
true_value: Value used where the condition is True
false_value: Value used where the condition is False
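If only the matching rows should change, one common approach is to pass the existing column as false_value so that non-matching rows keep their current value. The sketch below shows this on the same example data; the 'heavy' column is a hypothetical name added only to illustrate creating a new column with numpy.where().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Adams', 'Jones', 'Frank', 'Smith', 'Davis'],
    'age': [25, 30, 28, 35, 22],
    'weight': [74, 90, 85, 65, 92],
})

# Use the column itself as false_value so other ages are left untouched
df['age'] = np.where(df['name'] == 'Jones', 25, df['age'])

# Hypothetical new column: label each row based on a condition
df['heavy'] = np.where(df['weight'] > 80, 'yes', 'no')

print(df)
```

Here only Jones's age changes to 25, while the remaining ages stay as they were.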
3. pandas mask() function
import pandas as pd
# create an example DataFrame
df = pd.DataFrame({'name':['Adams', 'Jones', 'Frank', 'Smith', 'Davis'],
'age':[25, 30, 28, 35, 22], 'weight':[74, 90, 85, 65, 92]})
# output
name age weight
0 Adams 25 74
1 Jones 30 90
2 Frank 28 85
3 Smith 35 65
4 Davis 22 92
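# replace the weight value with 98 if it is 90
# (assign the result back; inplace=True on a selected column may not update df in recent pandas versions)
df['weight'] = df['weight'].mask(df['weight'] == 90, 98)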
# output
name age weight
0 Adams 25 74
1 Jones 30 98
2 Frank 28 85
3 Smith 35 65
4 Davis 22 92
The pandas mask() function (available on both Series and DataFrame) is also a conditional function. It replaces the value where the condition is True and keeps the original value where the condition is False. Unlike numpy.where(), it takes only one replacement value and leaves non-matching values unchanged.
The syntax of the pandas mask function is:
DataFrame['col_to_replace'].mask(condition, new_value)
condition: conditional expression
col_to_replace: Name of the column in which values need to be replaced
new_value: Old value will be replaced with this value if the condition is True
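The condition passed to mask() does not have to be an equality test on the same column. As a minimal sketch on the same example data, the code below caps weights above 85 and then uses a condition on a different column; the cap of 85 and the value 70 are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Adams', 'Jones', 'Frank', 'Smith', 'Davis'],
    'age': [25, 30, 28, 35, 22],
    'weight': [74, 90, 85, 65, 92],
})

# mask() returns a new Series, so assign it back to the column
# Cap every weight above 85 at 85
df['weight'] = df['weight'].mask(df['weight'] > 85, 85)

# The condition can come from another column: set Adams's weight to 70
df['weight'] = df['weight'].mask(df['name'] == 'Adams', 70)

print(df)
```

Rows where the condition is False (for example Smith at 65) keep their original weight.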
4. pandas where() function
import pandas as pd
# create an example DataFrame
df = pd.DataFrame({'name':['Adams', 'Jones', 'Frank', 'Smith', 'Davis'], 'age':[25, 30, 28, 35, 22], 'weight':[74, 90, 85, 65, 92]})
# output
name age weight
0 Adams 25 74
1 Jones 30 90
2 Frank 28 85
3 Smith 35 65
4 Davis 22 92
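# keep the name where the condition is True ('Frank') and replace it with 'Jones' where it is False
# (assign the result back to the column rather than relying on inplace=True on a selected column)
df['name'] = df['name'].where(df['name'] == 'Frank', 'Jones')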
# output
name age weight
0 Jones 25 74
1 Jones 30 90
2 Frank 28 85
3 Jones 35 65
4 Jones 22 92
The pandas where() function (available on both Series and DataFrame) is also a conditional function. It replaces the value where the condition is False (the opposite of the mask() function) and keeps the original value where the condition is True.
The syntax of the pandas where function is:
DataFrame['col_to_replace'].where(condition, new_value)
condition: conditional expression
col_to_replace: Name of the column in which values need to be replaced
new_value: Old value will be replaced with this value if the condition is False. Values for which the condition is True remain unchanged.
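Because where() and mask() differ only in which side of the condition they replace, negating the condition with ~ turns one into the other. The sketch below checks this on the example data; the variable names (is_frank, kept_by_where, kept_by_mask) are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Adams', 'Jones', 'Frank', 'Smith', 'Davis'],
    'age': [25, 30, 28, 35, 22],
    'weight': [74, 90, 85, 65, 92],
})

is_frank = df['name'] == 'Frank'

# where(): keep values where the condition is True, replace the rest
kept_by_where = df['name'].where(is_frank, 'Jones')

# mask() with the negated condition replaces exactly the same rows
kept_by_mask = df['name'].mask(~is_frank, 'Jones')

print(kept_by_where.equals(kept_by_mask))  # True

# Assign one of the results back to update the DataFrame
df['name'] = kept_by_where
```

Which of the two to use is mostly a matter of readability: pick the one whose condition reads naturally without negation.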
If you have any questions, comments or recommendations, please email me at reneshbe@gmail.com
This work is licensed under a Creative Commons Attribution 4.0 International License