Query pandas DataFrame to select rows based on value and condition matching

Renesh Bedre 3 minute read

In this article, I will discuss how to query a pandas DataFrame to select rows whose column values match a given value exactly, match it partially, or satisfy a condition.

First, create a sample DataFrame,

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, np.nan], 'col2': ['abc', 'city', 'def', 'ghi', 'ijk', 'cd'], 'col3': [1, 'Y', 'Z', 'Z', 0, 1]})
df
# output
   col1  col2 col3
0   1.0   abc    1
1   2.0  city    Y
2   3.0   def    Z
3   4.0   ghi    Z
4   5.0   ijk    0
5   NaN    cd    1

Based on exact match

Select rows where a single column exactly matches a value,

# select the rows where col1 value is equal to 1
df[df['col1']==1]
# output
   col1 col2 col3
0   1.0  abc    1

# using query method
df.query('col1==1')
# output
   col1 col2 col3
0   1.0  abc    1

Select rows based on exact matches in multiple columns,

# select the rows where col1 value is equal to 2 and col3 is equal to Y
# using & bitwise operator
df[(df['col1']==2) & (df['col3']=='Y') ]
# output
   col1  col2 col3
1   2.0  city    Y

# select the rows where col1 value is equal to 1 or col3 is equal to Y
# using | bitwise operator
df[(df['col1']==1) | (df['col3']=='Y')]
# output
   col1  col2 col3
0   1.0   abc    1
1   2.0  city    Y

# select the rows where col1 is equal to 2 and col3 is equal to Y, using query method
df.query('col1 == 2 and col3 == "Y"')
# output
   col1  col2 col3
1   2.0  city    Y
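
When the values to match are stored in Python variables, the query string can reference them with the @ prefix instead of hard-coding them. The following is a minimal sketch using the same DataFrame; the variable names target_num and target_flag are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, np.nan],
                   'col2': ['abc', 'city', 'def', 'ghi', 'ijk', 'cd'],
                   'col3': [1, 'Y', 'Z', 'Z', 0, 1]})

# hypothetical variables holding the values to match
target_num = 2
target_flag = 'Y'

# @name refers to a Python variable in the calling scope
result = df.query('col1 == @target_num and col3 == @target_flag')
print(result)
```

This keeps the query string fixed while the values change, which can be convenient inside functions or loops.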

Based on partial match using regex

Select rows where a single column partially matches a value,

# select the rows where col2 contains ty
df[df['col2'].str.contains("ty")]
# output
   col1  col2 col3
1   2.0  city    Y

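# using query method
# engine='python' avoids errors that some pandas versions raise for .str methods when numexpr is installed
df.query('col2.str.contains("ty")', engine='python')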
# output
   col1  col2 col3
1   2.0  city    Y

Select rows where a single column partially matches any of several values,

# select the rows where specific column contains ty or de
df[df['col2'].str.contains("ty|de")]
# output
   col1  col2 col3
1   2.0  city    Y
2   3.0   def    Z
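
The examples above work because col2 has no missing values. If a string column contains missing values, str.contains() returns a missing value for those rows, and the result cannot be used directly as a boolean mask. The sketch below shows one way to handle this with the na and case parameters; the Series named names is a hypothetical example and is not part of the DataFrame above.

```python
import pandas as pd

# hypothetical column with a missing value and mixed case
names = pd.Series(['City', None, 'def', 'TYPE'])

# na=False treats missing values as "no match"
# case=False makes the regex match case-insensitive
mask = names.str.contains('ty|de', case=False, na=False)
print(names[mask])
```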

Select rows based on how the value starts,

# select the rows where col2 value starts with ci (^ anchors the regex at the start of the string)
df[df['col2'].str.contains("^ci")]
# output
   col1  col2 col3
1   2.0  city    Y
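
For plain prefixes and suffixes, a regex is not strictly needed. As a sketch of an alternative (not the approach used above), the str.startswith() and str.endswith() methods match literal text, so regex special characters in the pattern need no escaping.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, np.nan],
                   'col2': ['abc', 'city', 'def', 'ghi', 'ijk', 'cd'],
                   'col3': [1, 'Y', 'Z', 'Z', 0, 1]})

# rows where col2 starts with the literal text "ci"
starts = df[df['col2'].str.startswith('ci')]

# rows where col2 ends with the literal text "k"
ends = df[df['col2'].str.endswith('k')]

print(starts)
print(ends)
```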

Select rows based on the character at a given position in the string,

# select the rows where the second character of col2 (index 1, counting from 0) is j
df[df['col2'].str[1] == 'j']
# output
   col1 col2 col3
4   5.0  ijk    0

Based on condition match

Select rows where column values are greater than or less than some value,

# select rows where col1 values are greater than 2
df[df['col1'] > 2]
# output
   col1 col2 col3
2   3.0  def    Z
3   4.0  ghi    Z
4   5.0  ijk    0

Select rows where column values fall within a range,

# select rows where col1 values are between 2 and 5
# inclusive='both' includes the left and right boundaries (string values for inclusive need pandas 1.3 or later)
df[df['col1'].between(left=2, right=5, inclusive='both')]
# output
   col1  col2 col3
1   2.0  city    Y
2   3.0   def    Z
3   4.0   ghi    Z
4   5.0   ijk    0
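
If the boundaries should be excluded instead, one option is to combine two comparisons. The sketch below shows this with a boolean mask and with a chained comparison in the query method. In both cases the row with a missing col1 value is dropped, because comparisons with NaN evaluate to False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, np.nan],
                   'col2': ['abc', 'city', 'def', 'ghi', 'ijk', 'cd'],
                   'col3': [1, 'Y', 'Z', 'Z', 0, 1]})

# col1 strictly greater than 2 and strictly less than 5
by_mask = df[(df['col1'] > 2) & (df['col1'] < 5)]

# the same selection written as a chained comparison in query
by_query = df.query('2 < col1 < 5')

print(by_mask)
```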

Using isin() function

If you have a list of values and would like to select the rows where the column value matches any one of them,

df[df['col2'].isin(['def', 'xyz', 'ijk'])]
# output
   col1 col2 col3
2   3.0  def    Z
4   5.0  ijk    0
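
To select the rows that do not match any value in the list, the boolean mask returned by isin() can be inverted with the ~ operator. A minimal sketch with the same DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, np.nan],
                   'col2': ['abc', 'city', 'def', 'ghi', 'ijk', 'cd'],
                   'col3': [1, 'Y', 'Z', 'Z', 0, 1]})

# ~ inverts the boolean mask, keeping rows whose col2 value is NOT in the list
not_in_list = df[~df['col2'].isin(['def', 'xyz', 'ijk'])]
print(not_in_list)
```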

Using isnull() function

Select rows where a column contains null (missing) values,

df[df['col1'].isnull()]
# output
   col1 col2 col3
5   NaN   cd    1

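# using query method
# engine='python' avoids errors that some pandas versions raise for method calls when numexpr is installed
df.query('col1.isnull()', engine='python')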
# output
   col1 col2 col3
5   NaN   cd    1
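
The opposite selection, keeping only the rows where the column is not null, can be sketched with notnull() or with dropna() restricted to that column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, np.nan],
                   'col2': ['abc', 'city', 'def', 'ghi', 'ijk', 'cd'],
                   'col3': [1, 'Y', 'Z', 'Z', 0, 1]})

# keep rows where col1 is not null
non_null = df[df['col1'].notnull()]

# equivalent selection: drop rows that have a null in col1 only
dropped = df.dropna(subset=['col1'])

print(non_null)
```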

If you have any questions, comments or recommendations, please email me at reneshbe@gmail.com

This work is licensed under a Creative Commons Attribution 4.0 International License
