Query pandas DataFrame to split the values in column

Renesh Bedre    1 minute read

pandas provide a str.split() function to split the strings based on a given separator. Using str.split(), you can split the DataFrame column values into a list or into individual values into separate columns.

First create a random DataFrame,

import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A_B_C', 'D_E_F', 'X_Y_Z']})
df
# output
   col1   col2
0     1  A_B_C
1     2  D_E_F
2     3  X_Y_Z

Now, split the values in col2 values by underscore (_). By default, it will return pandas Series.

df['col2'].str.split(pat='_')  # pat parameter defines the split separator
# output
0    [A, B, C]
1    [D, E, F]
2    [X, Y, Z]
Name: col2, dtype: object

To split the values of col2 into separate columns, use the parameter expand=True. By default, it will return pandas DataFrame.

df['col2'].str.split(pat='_', expand=True)  
# output
   0  1  2
0  A  B  C
1  D  E  F
2  X  Y  Z

To limit the number of splits, use the parameter n. If n is set to -1 (by default), it will return all split. If n is set to 1, it will return only one split.

df['col2'].str.split(pat='_', n=1, expand=True)  
# output
   0    1
0  A  B_C
1  D  E_F
2  X  Y_Z

To split the string from the right side, use the pandas str.rsplit function

df['col2'].str.rsplit(pat='_', n=1, expand=True)  
# output
     0  1
0  A_B  C
1  D_E  F
2  X_Y  Z

Now, add the split columns to original DataFrame

Rename the DataFrame columns,

temp_df = df['col2'].str.split(pat='_', expand=True)  
temp_df.columns = ['col2_1', 'col2_2', 'col2_3']
temp_df
# output
  col2_1 col2_2 col2_3
0      A      B      C
1      D      E      F
2      X      Y      Z

Merge two DataFrames,

pd.concat([df, temp_df], axis=1)
   col1   col2 col2_1 col2_2 col2_3
0     1  A_B_C      A      B      C
1     2  D_E_F      D      E      F
2     3  X_Y_Z      X      Y      Z

Learn more about Python

If you have any questions, comments or recommendations, please email me at reneshbe@gmail.com

This work is licensed under a Creative Commons Attribution 4.0 International License