How to unpivot pandas DataFrame (wide to long format)

Renesh Bedre 3 minute read

The pandas melt() function helps in unpivoting the wide format DataFrame into the long format i.e. converts columns into rows. The pandas melt() function can be used as below.

pandas.melt(df, id_vars=['col1'], value_vars=['col2', 'col3', ..., 'colN'])

Now, let’s understand the pandas melt() function with some examples.

Unpivoting (melting) pandas DataFrames from wide to long format is often necessary. pandas melt() function can be used to do this. A long format DataFrame can be useful for summarizing data and displaying data visually.

pandas melt function

pandas melt() function is equivalent to R’s melt() function from reshape2 package.

The following two examples shows how to use pandas melt() function.

Example 1 (for one identifier variable)

import pandas as pd

# create dataframe
df = pd.DataFrame({'plant_var':['A', 'B', 'C', 'D', 'E'], 
                   'height':[15, 20, 18, 21, 17], 'yield':[56, 75, 60, 70, 55]})
# output
  plant_var  height  yield
0         A      15     56
1         B      20     75
2         C      18     60
3         D      21     70
4         E      17     55

Now, convert this wide format DataFrame into a long format using pandas melt() function

import pandas as pd

pd.melt(df, id_vars=['plant_var'], value_vars=['height', 'yield'])
# output
  plant_var variable  value
       A   height     15
       B   height     20
       C   height     18
       D   height     21
       E   height     17
       A    yield     56
       B    yield     75
       C    yield     60
       D    yield     70
       E    yield     55

In the above example, the id_vars and value_vars represent the identifier variables and columns to unpivot, respectively.

In the resulting long format DataFrame, value_vars will be saved in a variable column and their associated response values will be saved in value column. You can change these column names using var_name and value_name parameters.

import pandas as pd

pd.melt(df, id_vars=['plant_var'], value_vars=['height', 'yield'], 
        var_name='features', value_name='response')
# output
  plant_var features  response
0         A   height        15
1         B   height        20
2         C   height        18
3         D   height        21
4         E   height        17
5         A    yield        56
6         B    yield        75
7         C    yield        60
8         D    yield        70
9         E    yield        55

Example 2 (for multiple identifier variable)

If the input DataFrame has two (or more) identifier variables (id_vars), you can convert the wide-format DataFrame into long-format as below

import pandas as pd

# create dataframe
df = pd.DataFrame({'plant_var':['A', 'B', 'C', 'D', 'E'], 'location':['L1', 'L2', 'L3', 'L4', 'L5'], 
                   'height':[15, 20, 18, 21, 17], 'yield':[56, 75, 60, 70, 55]})
# output
  plant_var location  height  yield
0         A       L1      15     56
1         B       L2      20     75
2         C       L3      18     60
3         D       L4      21     70
4         E       L5      17     55

Now, convert this wide format DataFrame into a long format,

import pandas as pd

pd.melt(df, id_vars=['plant_var', 'location'], value_vars=['height', 'yield'])
# output
  plant_var location variable  value
       A       L1   height     15
       B       L2   height     20
       C       L3   height     18
       D       L4   height     21
       E       L5   height     17
       A       L1    yield     56
       B       L2    yield     75
       C       L3    yield     60
       D       L4    yield     70
       E       L5    yield     55

By default, the melt() function reset the index. If you want to preserve the index, you should set the ignore_index parameter to False.

import pandas as pd

pd.melt(df, id_vars=['plant_var', 'location'], value_vars=['height', 'yield'], ignore_index=False)
# output
  plant_var location variable  value
       A       L1   height     15
       B       L2   height     20
       C       L3   height     18
       D       L4   height     21
       E       L5   height     17
       A       L1    yield     56
       B       L2    yield     75
       C       L3    yield     60
       D       L4    yield     70
       E       L5    yield     55

You can see that index values are preserved by using the ignore_index=False parameter.

Enhance your skills with courses Python and pandas

If you have any questions, comments or recommendations, please email me at reneshbe@gmail.com

This work is licensed under a Creative Commons Attribution 4.0 International License

Some of the links on this page may be affiliate links, which means we may get an affiliate commission on a valid purchase. The retailer will pay the commission at no additional cost to you.

Share on

Twitter Facebook LinkedIn

How to unpivot pandas DataFrame (wide to long format)

Enhance your skills with courses Python and pandas

Share on

You may also enjoy

Calculate Coverage From BAM File

Python: Why VIF Return Inf Value?

Find Max and Min Sequence Length in Fasta

Get Non-overlapping Portion Between Two Regions in bedtools