Pandas groupby function to group column values into a list
Group dataframe rows into a list based on a common element from one column
import pandas as pd
df = pd.read_csv("https://reneshbedre.github.io/assets/posts/pandas/pfam_data.csv")
df
# output
target pfam_id score
0 Glyco_hydro_1 PF00232.19 164.4
1 Glyco_hydro_1 PF00232.19 147.2
2 Glyco_hydro_17 PF00332.19 113.5
3 Glyco_hydro_17 PF00332.19 114.0
4 Glyco_hydro_14 PF01373.18 363.0
5 Glyco_hydro_1 PF00232.19 17.9
6 Glyco_hydro_14 PF01373.18 40.0
7 Glyco_hydro_16 PF00722.22 207.9
8 Glyco_hydro_18 PF00704.29 130.6
9 Glyco_hydro_18 PF00704.29 135.8
# group target values into a list for each pfam_id
df.groupby('pfam_id')['target'].apply(list)
# output
pfam_id
PF00232.19 [Glyco_hydro_1, Glyco_hydro_1, Glyco_hydro_1]
PF00332.19 [Glyco_hydro_17, Glyco_hydro_17]
PF00704.29 [Glyco_hydro_18, Glyco_hydro_18]
PF00722.22 [Glyco_hydro_16]
PF01373.18 [Glyco_hydro_14, Glyco_hydro_14]
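The same grouping can also be written with agg(list), which reads as an aggregation rather than a general apply. The sketch below rebuilds the ten rows printed above as an inline DataFrame (an assumption made so it runs without downloading pfam_data.csv). For this data both forms should return the same lists.

```python
import pandas as pd

# Rebuild the example rows shown above so the snippet runs offline
# (assumption: these rows match pfam_data.csv as printed)
df = pd.DataFrame({
    "target": ["Glyco_hydro_1", "Glyco_hydro_1", "Glyco_hydro_17", "Glyco_hydro_17",
               "Glyco_hydro_14", "Glyco_hydro_1", "Glyco_hydro_14", "Glyco_hydro_16",
               "Glyco_hydro_18", "Glyco_hydro_18"],
    "pfam_id": ["PF00232.19", "PF00232.19", "PF00332.19", "PF00332.19",
                "PF01373.18", "PF00232.19", "PF01373.18", "PF00722.22",
                "PF00704.29", "PF00704.29"],
    "score": [164.4, 147.2, 113.5, 114.0, 363.0, 17.9, 40.0, 207.9, 130.6, 135.8],
})

# agg(list) collects each group's target values into a Python list,
# keeping the original row order inside each group
by_pfam = df.groupby("pfam_id")["target"].agg(list)
print(by_pfam)
```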
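# to turn the grouped Series into a DataFrame with pfam_id as a regular column, use reset_index();
# pass name= to set the name of the list column
# df.groupby('pfam_id')['target'].apply(list).reset_index(name='target_list')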
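A minimal sketch of the reset_index() step, using three rows copied from the table above (an assumed subset, so the snippet stays short and runs offline). Series.reset_index() moves pfam_id from the index back into an ordinary column, and its name argument labels the column holding the lists; "targets" is only an illustrative choice.

```python
import pandas as pd

# Assumed subset of the pfam rows shown above
df = pd.DataFrame({
    "target": ["Glyco_hydro_1", "Glyco_hydro_1", "Glyco_hydro_16"],
    "pfam_id": ["PF00232.19", "PF00232.19", "PF00722.22"],
})

# Series indexed by pfam_id, one list of targets per group
grouped = df.groupby("pfam_id")["target"].apply(list)

# reset_index() turns the index into a column and returns a DataFrame;
# name= sets the label of the list-valued column
out = grouped.reset_index(name="targets")
print(out)
```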
Group dataframe rows into a list based on a common element from two columns
# group score values into a list for each target and pfam_id combination
df.groupby(['target', 'pfam_id'])['score'].apply(list)
# output
target pfam_id
Glyco_hydro_1 PF00232.19 [164.4, 147.2, 17.9]
Glyco_hydro_14 PF01373.18 [363.0, 40.0]
Glyco_hydro_16 PF00722.22 [207.9]
Glyco_hydro_17 PF00332.19 [113.5, 114.0]
Glyco_hydro_18 PF00704.29 [130.6, 135.8]
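The two-column result is indexed by a MultiIndex (target, pfam_id), and reset_index() flattens it into ordinary columns in the same way as for one key. When several columns should be collected into lists in one call, passing a dict to agg() is one option. The sketch again rebuilds the printed rows inline as an assumption; "scores" is an arbitrary column name.

```python
import pandas as pd

# Rebuild the example rows shown above (assumption: they match pfam_data.csv)
df = pd.DataFrame({
    "target": ["Glyco_hydro_1", "Glyco_hydro_1", "Glyco_hydro_17", "Glyco_hydro_17",
               "Glyco_hydro_14", "Glyco_hydro_1", "Glyco_hydro_14", "Glyco_hydro_16",
               "Glyco_hydro_18", "Glyco_hydro_18"],
    "pfam_id": ["PF00232.19", "PF00232.19", "PF00332.19", "PF00332.19",
                "PF01373.18", "PF00232.19", "PF01373.18", "PF00722.22",
                "PF00704.29", "PF00704.29"],
    "score": [164.4, 147.2, 113.5, 114.0, 363.0, 17.9, 40.0, 207.9, 130.6, 135.8],
})

# Group score by both keys, then flatten the MultiIndex into columns
scores = (
    df.groupby(["target", "pfam_id"])["score"]
    .apply(list)
    .reset_index(name="scores")
)
print(scores)

# Collect more than one column into lists at once: each dict entry
# produces its own list-valued column, indexed by pfam_id
both = df.groupby("pfam_id").agg({"target": list, "score": list})
print(both)
```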
If you have any questions, comments or recommendations, please email me at reneshbe@gmail.com
This work is licensed under a Creative Commons Attribution 4.0 International License