Business Analysis: Demographics


Is either gender significantly better represented in Stever Robbins’ listenership?


Survey of 466 listeners of Stever’s podcast

To begin, I found the sample proportions and the total size of the data set (ignoring missing answers):

data = pFile['Gender'].dropna()
n = float(data.count())
numMale = data[data=='Male']
numFemale = data[data=='Female']
pMale = numMale.count()/n
pFemale = numFemale.count()/n

At that point, I was able to define the endpoints of each interval using the z-statistic (which I’d already computed and put into a variable for clarity):

#using the z-statistic 2.576, which corresponds to the 99% confidence interval
print(pMale - z99*(pMale*(1-pMale) / float(n))**.5, pMale + z99*(pMale*(1-pMale) / float(n))**.5)

#using the z-statistic 2 (a rounded 1.96), which corresponds to the 95% confidence interval
print(pMale - z95*(pMale*(1-pMale) / float(n))**.5, pMale + z95*(pMale*(1-pMale) / float(n))**.5)

#using the z-statistic 1.645, which corresponds to the 90% confidence interval
print(pMale - z90*(pMale*(1-pMale) / float(n))**.5, pMale + z90*(pMale*(1-pMale) / float(n))**.5)


"90% Confidence Interval: (0.49644705827396918, 0.57547869578635569)"
"95% Confidence Interval: (0.4879193283904138, 0.58400642566991101)"
"99% Confidence Interval: (0.47408278638216622, 0.5978429676781587)"
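The endpoint arithmetic can also be wrapped in a small helper, which avoids repeating the formula three times. The sketch below is only an illustration: the function name proportion_interval and the dictionary Z_VALUES are hypothetical, and the counts (231 "Male" answers out of 431 non-missing answers) are an assumption that the post does not state, although they are consistent with the intervals reported above.

```python
import math

# z multipliers for two-sided normal intervals; 2.0 is the rounded 95% value used above
Z_VALUES = {"90%": 1.645, "95%": 2.0, "99%": 2.576}


def proportion_interval(successes, n, z):
    """Return (lower, upper) for the normal-approximation interval
    p +/- z * sqrt(p * (1 - p) / n)."""
    p = successes / float(n)
    margin = z * math.sqrt(p * (1 - p) / n)
    return (p - margin, p + margin)


# Assumed counts, not stated in the post: 231 "Male" answers out of 431 non-missing ones
num_male, n = 231, 431

intervals = {level: proportion_interval(num_male, n, z) for level, z in Z_VALUES.items()}
for level in ("90%", "95%", "99%"):
    lower, upper = intervals[level]
    print("%s Confidence Interval: (%.4f, %.4f)" % (level, lower, upper))
```

Returning each interval as a (lower, upper) tuple keeps the endpoints in the same order as the printed results, and the tuples can be passed around as single values.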

Since raw numeric output is rarely intuitive enough to help non-statistics users make efficient decisions, I put the results on a number line. The library that I used isn’t designed to draw number lines, so I had to dive into the guts of the code a bit:

from matplotlib import pyplot
from matplotlib import font_manager
import matplotlib.ticker as ticker

f1 = pyplot.figure(1, figsize=(10,2), facecolor = 'white') #Have the whole figure be white

ax1 = pyplot.subplot(1,1,1)
for loc, spine in ax1.spines.items():
    if loc not in ['bottom']: #I generally only want the bottom of the bounding box
        spine.set_visible(False) #hide the left, right, and top spines

ax1.set_position((.1, .09, .8, .75))

font = '/usr/share/fonts/truetype/msttcorefonts/georgia.ttf' #I like this font best
prop = font_manager.FontProperties(fname = font)

plot99 = ax1.plot(interval99, [0,0], 'o-', label="99% confidence interval")
plot95 = ax1.plot(interval95, [1,1], 'o-', label="95% confidence interval")
plot90 = ax1.plot(interval90, [2,2], 'o-', label="90% confidence interval")

handles, labels = ax1.get_legend_handles_labels()
ax1.legend(handles[::-1], labels[::-1]) # reverse the order

ax1.set_xlim(0, 1) #leave some buffer on the left and right
ax1.set_ylim(-1,3) #data should take up about 2/3 of the vertical space
#Bold the numbers of the ticks
ticklocs = ax1.get_xaxis().get_ticklocs(minor=False)
ax1.get_xaxis().set_ticklabels(ticklocs, weight = 'bold')

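#Turn off ticks on right and top
ax1.xaxis.set_ticks_position('bottom')

formatter = ticker.FormatStrFormatter('%1.2f')
ax1.xaxis.set_major_formatter(formatter) #show tick values with two decimals

f1.axes[0].yaxis.set_ticks([]) #turn off ticks on the left side

pyplot.show() #display the finished number line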




Which produces this:

[Figure: number line showing the 90%, 95%, and 99% confidence intervals]


The most important feature of this result is that every interval, even the narrowest 90% one, straddles the 0.5 mark, meaning that we can’t definitively say that the listenership has more men than women. Even at the high end of the 99% interval (about 0.60), though, the proportion of men is not large enough to impact our strategic decisions.
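The same question can be cross-checked with a one-sample z-test of the null hypothesis that the true proportion of men is 0.5. This is a minimal sketch rather than part of the original analysis: the function name two_sided_z_test is hypothetical, and the counts (231 of 431) are again an assumption that the post does not state, chosen because they are consistent with the reported intervals.

```python
import math


def two_sided_z_test(successes, n, p0=0.5):
    """One-sample z-test of H0: true proportion == p0.
    Returns (z, two-sided p-value)."""
    p_hat = successes / float(n)
    # Standard error under the null hypothesis (uses p0, not p_hat)
    se_null = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se_null
    # Standard normal CDF evaluated at |z|, via the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Assumed counts, not stated in the post
z_stat, p_value = two_sided_z_test(231, 431)
print("z = %.3f, two-sided p = %.3f" % (z_stat, p_value))
```

Note that the test uses the standard error under the null (p0 = 0.5), while the intervals use the sample proportion, so the two approaches can differ slightly near the boundary. Under these assumed counts the p-value comes out above 0.10, which agrees with the 90% interval containing 0.5.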



Neither gender is significantly more represented in Stever’s listener base