Using Python to Excel (Part 2 – Calculate Columns and Pivot)

Note: If you’re new here, you’ll probably want to start with Part 1

Python’s Excel pivot table

We’ve made great progress, and things are only just getting started. We’ll mostly be talking about pivot tables, though we’ll start by creating the calculated column that we’ll pivot on shortly:

Calculated Column

The cereal’s rating (apparently given by Consumer Reports) is bizarrely precise, so I thought it would be valuable to make bins of 10. This starts to get a little complex, but isn’t too bad:

2014-10-11 19_53_34-cereals - Excel

Python actually uses the same function name. Note that I’m telling it to only display three columns this time, but all of the rest are still in the ‘kellogs_sorted’ sheet:

2014-10-11 19_54_46-cereals

 

Pivot

This is where things get really intense. I’ve squished Excel’s pivot table a bit to fit it, but you can get a sense of the simple stuff that we’re doing here, just summing the cups and calories for each rating category:

2014-10-11 20_00_40-PivotTable Field List

Python uses a function called pivot_table to achieve this same thing:

2014-10-11 20_03_26-cereals

Calculating Values in Pivots

Excel has you add equations columns to the pivot, then bring those into the table

2014-10-11 20_05_01-cereals - Excel

 

2014-10-11 20_06_30-Insert Calculated Field

 

2014-10-11 20_07_41-

 

In python you can take the table that you made before and simply add the column directly (and then round it, to keep the numbers reasonable)

2014-10-11 20_09_35-cereals

 

Pivot with column categories

In Excel, you add the additional category (in this case, which shelf the item is on) to the “Columns” list. Note that we’re now counting the number of products, instead of summing cups or calories

2014-10-11 20_11_38-PivotTable Field List

In python, you add a new argument to the pivot_table function called “columns”, and tell it that you what “shelf” in there

You can also tell it to put 0 in the blank cells, so that the table is makes more visual sense

Python Excel pivot table

Plotting

When it correctly guesses at your intentions, Excel’s graphs are pretty magical, laying everything out in essentially one click:

2014-10-11 20_20_55-cereals - Excel

 

Python is pretty magical too, simply requiring you to specify that you’d like a bar plot, and then allowing you to set as many or as few labels as you’d like. The legend does have an unfortunate tendency to default to the worst corner of the graph, but it’s easy enough to move around:

2014-10-11 20_18_25-cereals

 

It’s clearly less attractive by default, though. Here’s an example of what you can do with a little more specification:

../../_images/plot_bmh.png

From the matplotlib gallery

 

Histograms

Histograms are the first of the functions that Excel doesn’t have a button for. Excel recommends a 6 step process outlined here: http://support2.microsoft.com/kb/214269

Python has a one-step histogram function, .hist()  Here’s an example histogram of the consumer reports ratings of all of the cereals:

2014-10-11 20_32_27-cereals

 

Histogram Comparisons

You can extend the functionality even further by showing two histograms over each other. The function naturally groups the data into 10 bins, which can be misaligned if you are comparing two data sets.

I used a new function called ‘range’ that simply gives me all the numbers between 0 (the first argument) and 100 (the second), counting by 10 (the third argument). Then I tell .hist() that it needs to use those as bins for both the overall data, and the data filtered to Kellogg’s

2014-10-11 20_28_27-cereals

This allows us to see that Kellogg’s is doing about as well as the group overall in terms of ratings.

 

 

That’s it for the basics! I hope you’ve found it useful, and that you give Python a try next time you want to explore a data set.

2 thoughts on “Using Python to Excel (Part 2 – Calculate Columns and Pivot)

  1. […] That’s it for part 1! Stay tuned for calculated columns and pivots in part 2 […]

  2. […] that people think of is Excel, but I prefer Python, and there are some great resources, like this: Python Excel Pivot Table available for helping you understand how to translate your Excel skills into this new […]

Comments are closed.