Pandas groupby function to group column values into list
Group dataframe rows into a list based on a common element from one column
import pandas as pd
df = pd.read_csv("https://reneshbedre.github.io/assets/posts/pandas/pfam_data.csv")
df
# output
           target     pfam_id  score
0   Glyco_hydro_1  PF00232.19  164.4
1   Glyco_hydro_1  PF00232.19  147.2
2  Glyco_hydro_17  PF00332.19  113.5
3  Glyco_hydro_17  PF00332.19  114.0
4  Glyco_hydro_14  PF01373.18  363.0
5   Glyco_hydro_1  PF00232.19   17.9
6  Glyco_hydro_14  PF01373.18   40.0
7  Glyco_hydro_16  PF00722.22  207.9
8  Glyco_hydro_18  PF00704.29  130.6
9  Glyco_hydro_18  PF00704.29  135.8
# group target values into a list for each pfam_id
df.groupby('pfam_id')['target'].apply(list)
# output
pfam_id
PF00232.19    [Glyco_hydro_1, Glyco_hydro_1, Glyco_hydro_1]
PF00332.19                 [Glyco_hydro_17, Glyco_hydro_17]
PF00704.29                 [Glyco_hydro_18, Glyco_hydro_18]
PF00722.22                                 [Glyco_hydro_16]
PF01373.18                 [Glyco_hydro_14, Glyco_hydro_14]
# apply(list) returns a Series indexed by pfam_id; to get a dataframe with pfam_id as a regular column, call reset_index() on the grouped result
# df.groupby('pfam_id')['target'].apply(list).reset_index()
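The reset_index() step can be sketched as follows. This is a minimal example, not the only way to do it: it assumes a small inline copy of the first five rows shown above (so it runs without downloading the CSV), and the column name `targets` is an illustrative choice, not something from the original data.

```python
import pandas as pd

# Inline copy of the first five rows of the table above (assumption: used
# here only so the sketch runs without network access)
df = pd.DataFrame({
    "target": ["Glyco_hydro_1", "Glyco_hydro_1", "Glyco_hydro_17",
               "Glyco_hydro_17", "Glyco_hydro_14"],
    "pfam_id": ["PF00232.19", "PF00232.19", "PF00332.19",
                "PF00332.19", "PF01373.18"],
    "score": [164.4, 147.2, 113.5, 114.0, 363.0],
})

# Series of lists, indexed by pfam_id
grouped = df.groupby("pfam_id")["target"].apply(list)

# Convert the Series to a dataframe; name= sets the label of the list column
# ("targets" is a hypothetical name chosen for this example)
result = grouped.reset_index(name="targets")
print(result)
```

After this step `pfam_id` is an ordinary column again, so the result can be merged, filtered, or written to a file like any other dataframe.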
Group dataframe rows into a list based on a common element from two columns
# group score into list based on common target and pfam_id
df.groupby(['target', 'pfam_id'])['score'].apply(list)
# output
target          pfam_id
Glyco_hydro_1   PF00232.19    [164.4, 147.2, 17.9]
Glyco_hydro_14  PF01373.18           [363.0, 40.0]
Glyco_hydro_16  PF00722.22                 [207.9]
Glyco_hydro_17  PF00332.19          [113.5, 114.0]
Glyco_hydro_18  PF00704.29          [130.6, 135.8]
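As a possible variation on the two-column grouping above, named aggregation can collect the scores into a list and count them in a single call. The sketch below assumes an inline copy of the ten rows shown earlier; the output column names `scores` and `n_scores` are illustrative and not part of the original data.

```python
import pandas as pd

# Inline copy of the ten rows shown in the table above (assumption: used
# here so the sketch is self-contained)
df = pd.DataFrame({
    "target": ["Glyco_hydro_1", "Glyco_hydro_1", "Glyco_hydro_17",
               "Glyco_hydro_17", "Glyco_hydro_14", "Glyco_hydro_1",
               "Glyco_hydro_14", "Glyco_hydro_16", "Glyco_hydro_18",
               "Glyco_hydro_18"],
    "pfam_id": ["PF00232.19", "PF00232.19", "PF00332.19", "PF00332.19",
                "PF01373.18", "PF00232.19", "PF01373.18", "PF00722.22",
                "PF00704.29", "PF00704.29"],
    "score": [164.4, 147.2, 113.5, 114.0, 363.0, 17.9, 40.0, 207.9,
              130.6, 135.8],
})

# Named aggregation: each keyword becomes an output column, given as
# (input column, aggregation function)
summary = (
    df.groupby(["target", "pfam_id"])
      .agg(scores=("score", list), n_scores=("score", "count"))
      .reset_index()  # turn the (target, pfam_id) index back into columns
)
print(summary)
```

Within each group the list keeps the scores in their original row order, which matches the output shown above for apply(list).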
If you have any questions, comments or recommendations, please email me at reneshbe@gmail.com
This work is licensed under a Creative Commons Attribution 4.0 International License
 
      