Query pandas DataFrame to select rows based on value and condition matching
In this article, I will discuss how to query a pandas DataFrame to select the rows based on the exact and partial value matching to the column values
First create a random DataFrame,
import pandas as pd
import numpy as np
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, np.nan], 'col2': ['abc', 'city', 'def', 'ghi', 'ijk', 'cd'], 'col3': [1, 'Y', 'Z', 'Z', 0, 1]})
df
# output
col1 col2 col3
0 1.0 abc 1
1 2.0 city Y
2 3.0 def Z
3 4.0 ghi Z
4 5.0 ijk 0
5 NaN cd 1
Based on exact match
Select rows based on the exact match with the one column value,
# select the rows where col1 value is equal to 1
df[df['col1']==1]
# output
col1 col2 col3
0 1.0 abc 1
# using query method
df.query('col1==1')
# output
col1 col2 col3
0 1.0 abc 1
Select rows based on the exact match with the multiple column values,
# select the rows where col1 value is equal to 2 and col3 is equal to Y
# using & bitwise operator
df[(df['col1']==2) & (df['col3']=='Y') ]
# output
col1 col2 col3
1 2.0 city Y
# select the rows where col1 value is equal to 2 or col3 is equal to Y
# using | bitwise operator
df[(df['col1']==1) | (df['col3']=='Y') ]
# output
col1 col2 col3
0 1.0 abc 1
1 2.0 city Y
# using query method
df.query('col1 == 2 and col3 == "Y" ')
# output
col1 col2 col3
1 2.0 city Y
Based on partial match using regex
Select rows based on the partial match with the one column value,
# select the rows where specific column contains ty
df[df['col2'].str.contains("ty")]
# output
col1 col2 col3
1 2.0 city Y
# using query method
df.query('col2.str.contains("ty")')
# output
col1 col2 col3
1 2.0 city Y
Select rows based on the multiple partial matches with the one column value,
# select the rows where specific column contains ty or de
df[df['col2'].str.contains("ty|de")]
# output
col1 col2 col3
1 2.0 city Y
2 3.0 def Z
Select rows based on the start of value,
# select the rows where specific column value starts with ci
df[df['col2'].str.contains("^ci")]
# output
col1 col2 col3
1 2.0 city Y
Select rows based on the character index position in strings,
# select the rows where col2 has j character in second index position
df[df['col2'].str[1] == 'j']
# output
col1 col2 col3
4 5.0 ijk 0
Based on condition match
Select rows where column values are greater or lesser than some value,
# select rows where col1 values are greater than 2
df[df['col1'] > 2]
# output
col1 col2 col3
2 3.0 def Z
3 4.0 ghi Z
4 5.0 ijk 0
Select rows where column values are between some values,
# select rows where col1 values between 2 and 5
df[df['col1'].between(left=2, right=5, inclusive='both')] # left and right boundaries included
# output
col1 col2 col3
1 2.0 city Y
2 3.0 def Z
3 4.0 ghi Z
4 5.0 ijk 0
Using isin() function
If you have a list of values and would like to check any one of them matches to column value,
df[df['col2'].isin(['def', 'xyz', 'ijk'])]
# output
col1 col2 col3
2 3.0 def Z
4 5.0 ijk 0
Using isnull() function
Select rows where a column contains the null values,
df[df['col1'].isnull()]
# output
col1 col2 col3
5 NaN cd 1
# using query method
df.query('col1.isnull()')
# output
col1 col2 col3
5 NaN cd 1
Enhance your skills with courses on Python and pandas
- Python for Data Analysis: Pandas & NumPy
- Mastering Data Analysis with Pandas: Learning Path Part 1
- Data Analysis Using Python
Learn more about Python and R
If you have any questions, comments or recommendations, please email me at reneshbe@gmail.com
This work is licensed under a Creative Commons Attribution 4.0 International License