Query pandas DataFrame to split the values in column
pandas provide a str.split()
function to split the strings based on a given separator. Using str.split()
, you can
split the DataFrame column values into a list or into individual values into separate columns.
First create a random DataFrame,
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A_B_C', 'D_E_F', 'X_Y_Z']})
df
# output
col1 col2
0 1 A_B_C
1 2 D_E_F
2 3 X_Y_Z
Now, split the values in col2 values by underscore (_). By default, it will return pandas Series.
df['col2'].str.split(pat='_') # pat parameter defines the split separator
# output
0 [A, B, C]
1 [D, E, F]
2 [X, Y, Z]
Name: col2, dtype: object
To split the values of col2 into separate columns, use the parameter expand=True
. By default, it will return pandas
DataFrame.
df['col2'].str.split(pat='_', expand=True)
# output
0 1 2
0 A B C
1 D E F
2 X Y Z
To limit the number of splits, use the parameter n
. If n is set to -1 (by default), it will return all split. If n is
set to 1, it will return only one split.
df['col2'].str.split(pat='_', n=1, expand=True)
# output
0 1
0 A B_C
1 D E_F
2 X Y_Z
To split the string from the right side, use the pandas str.rsplit
function
df['col2'].str.rsplit(pat='_', n=1, expand=True)
# output
0 1
0 A_B C
1 D_E F
2 X_Y Z
Now, add the split columns to original DataFrame
temp_df = df['col2'].str.split(pat='_', expand=True)
temp_df.columns = ['col2_1', 'col2_2', 'col2_3']
temp_df
# output
col2_1 col2_2 col2_3
0 A B C
1 D E F
2 X Y Z
Merge two DataFrames,
pd.concat([df, temp_df], axis=1)
col1 col2 col2_1 col2_2 col2_3
0 1 A_B_C A B C
1 2 D_E_F D E F
2 3 X_Y_Z X Y Z
Learn more about Python
- Python enumerate
- Python tuples
- 8 different ways to get column names from pandas DataFrame
- 4 different ways to rename column names in pandas DataFrame
If you have any questions, comments or recommendations, please email me at reneshbe@gmail.com
This work is licensed under a Creative Commons Attribution 4.0 International License