Minding your p’s and q’s

In the practice of statistical inference, the concept of the p-value (as well as something that needs to exist, but doesn’t yet, called the q-value) is very useful. So is a really important concept you need to understand if you want to fool people (or prevent yourself from being fooled!) – it’s called p-hacking.

The first (p-value) concerns the following kind of question (I have borrowed this example from a public lecture at the Math Museum by Jen Rogers in September 2018) – suppose there is a deadly disease where it is known that, if you perform no treatment of any kind, 40% of the people that contract it die, while the others survive, i.e., the probability of dying is 40 \%. Now a medical salesperson shows up at your doorstep and informs you about the new miracle cure “XYZ”. They (the manufacturer) gave the drug to 10 people (that had the disease) and 7 of them survived (the probability of dying with the new medical protocol appears to be 30 \%). Would you be impressed? What if she told you that they gave the drug to 1000 people and 700 of them survived? Clearly, the second result is more plausible evidence of some real effect. How do we make this quantitative?

The second (I call this a q-value) concerns a sort of problem that crops up in finance. There are many retail investors that don’t have the patience to follow the market or follow the rise and fall of companies that issue stocks and bonds. They get ready-made solutions from their favorite investment bank – these are called structured notes. Structured notes can be “structured” any which way you want.

Consider one such example. Say you buy a 7-year US-dollar note exposed to the Nikkei-225, the Japanese 225-stock index. The N225 index is the Japanese equivalent of the S&P500 index in the US. Usually, you pay in $100 for the note, the bank unburdens you of $5 to feed the salesman and other intermediaries, then invests $70 in a “zero-coupon” US Treasury bond that will mature in 7 years. The Treasury bond is an IOU issued by the US Treasury – you give them $70 now (at the now prevailing interest rates) and they will return $100 in 7 years.
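As a quick sanity check on these numbers, the $70 price and the $100 payout pin down the yield the zero-coupon bond implies. This is only a sketch: it assumes annual compounding (the text does not say which convention applies), and the variable names are purely illustrative.

```python
# Split of the $100 note, using the figures quoted in the text.
note_price = 100.0
fees = 5.0
bond_price = 70.0   # paid today for the zero-coupon Treasury
bond_face = 100.0   # returned at maturity
years = 7

# Assumption: annual compounding.
# Solve bond_price * (1 + y) ** years == bond_face for y.
implied_yield = (bond_face / bond_price) ** (1 / years) - 1

# Whatever is left after fees and the bond can be spent on the option.
option_budget = note_price - fees - bond_price

print(f"implied yield: {implied_yield:.2%}")
print(f"option budget: ${option_budget:.0f}")
```

Under that assumption the implied yield comes out at a little over 5% a year, and $25 is left over for the option.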

As far as we know right now, the US Treasury is a rock-solid investment, they are not expected to default, ever. Of course, governing philosophies change and someone might look at this article in a hundred years and wonder what I was thinking!

The bank then uses the remaining $25 to invest in a 7-year option that pays off (some percentage P) of the relative increase (written as P \times \frac{ \yen N225_{final}-\yen N225_{initial}}{\yen N225_{initial}}) in the Nikkei-225 index. This variety of payoff, that became popular in the early 1990s, was called a “quanto” option – note that \yen N225 is the Nikkei index in its native currency, so it is around 22,500 right now.

For a regular payoff (non-quanto), you would receive, not the expression above, but something similar converted into US dollars. This would make sense, since it would be natural (for an option buyer) to convert the $25 into Japanese yen, buy some units of the Nikkei index, keep only the increase (not losing money if it falls below the initial level), then convert the profits back to US dollars after 7 years. If we wrote this as a “non-quanto” option payoff, it would be P \times \frac{\$ N225_{final}-\$ N225_{initial}}{\$ N225_{initial}}, where \$ N225 is the Nikkei-225 index expressed in US dollars. If the \yen N225 index is 22,500, then the \$ N225 index is currently \frac{\yen N225}{Yen/Dollar} = \frac{22,500}{112} \approx 201. You would convert the index to US dollars after 7 years at the “then” Yen-dollar rate, to compute the “final” \$ N225 index value, which you would plug into the formula.

If  you buy a “quanto” option, you bear no exposure to the vagaries of the FX rate between the US dollar and the Japanese yen, so it is easy to explain and sell to investors. Just look at the first payoff formula above.  The second payoff formula, though natural, is a more complex formula.
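To see how far apart the two payoffs can drift, here is a small sketch that evaluates both formulas. The initial index level (22,500) and exchange rate (112 yen per dollar) are the ones used above; the final values are invented for illustration, and the floor at zero is my reading of “keeping only the increase”. The function names are hypothetical.

```python
def quanto_return(n_initial, n_final, participation=1.0):
    """Relative rise of the index in yen terms, floored at zero (it is an option)."""
    return participation * max(n_final - n_initial, 0.0) / n_initial


def non_quanto_return(n_initial, n_final, fx_initial, fx_final, participation=1.0):
    """Same payoff shape, but on the index converted to dollars (fx = yen per dollar)."""
    usd_initial = n_initial / fx_initial
    usd_final = n_final / fx_final
    return participation * max(usd_final - usd_initial, 0.0) / usd_initial


# Initial values from the text; the final values are made up for illustration.
n0, fx0 = 22_500, 112.0
n7, fx7 = 27_000, 125.0  # hypothetical: index up 20% in yen, yen weaker against the dollar

q = quanto_return(n0, n7)
nq = non_quanto_return(n0, n7, fx0, fx7)
print(f"quanto: {q:.4f}, non-quanto: {nq:.4f}")
```

In this made-up scenario the index rises 20% in yen, but the weaker yen eats most of that gain once the index is measured in dollars. The quanto buyer never sees that difference, which is exactly the exposure that has been handed to someone else.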

However, as you should know, in finance, if there is a risk in the activity that you do, but you find that you don’t bear this risk in the instrument you have bought, it is because someone else has (presumably without your knowledge) bought this risk from you and has paid (much) less than what it is worth, through the assumptions used in pricing the instrument you just bought.

It turns out that the option pricing formula invented by Fischer Black, Myron Scholes and Robert Merton can be extended to value these sorts of “quanto” options. The formula depends on some extra parameters. One of these is the volatility (annualized standard deviation) of the Yen-dollar exchange rate. The other is the correlation between two quantities – the \# Yen / Dollar and \# Yen / N225 \: index. A graph of that correlation might look like this (not real data, but a common observation for these correlations).

[Figure: Correlation JPYUSD vs JPYNikkei]

You are asked to buy this correlation, in competition with others. How much would you pay? If you were in an uncompetitive environment, you might “buy” this correlation  at -100 \%. If you heard that someone paid  -30 \%, would you think it makes sense?

How seriously should one take this correlation? Look at the examples in this fantastic post. A correlation between Manoj “Night” Shyamalan’s movies and newspaper reading? Really? Which correlations are sensible and which should we pay less heed to?

The idea of p-values answers the first question. The way to think about the miracle drug is this – suppose you did nothing, and you assume (from your prior experience) that the probability of a patient dying of the deadly disease is p = 0.4, i.e., the probability of survival is 1- p  = 0.6. Then, if you assume that the patients live or die independently of each other, what is the probability that, out of a pool of 10 patients, 7, 8, 9 or 10 people would survive? Well, that would be (it’s called the p-value)

{10 \choose 7} (0.6)^7 (0.4)^3 + {10 \choose 8} (0.6)^8 (0.4)^2 +{10 \choose 9} (0.6)^9 (0.4)^1 +{10 \choose 10} (0.6)^{10} (0.4)^0 = 0.38

You might choose to add up the probability that you might get a result of 5 survivals and lower too (in case you are interested in a deviation of 1 or more from the average, rather than just a higher number).

{10 \choose 5} (0.6)^5 (0.4)^5 + {10 \choose 4} (0.6)^4 (0.4)^6 +{10 \choose 3} (0.6)^3 (0.4)^7 +{10 \choose 2} (0.6)^2 (0.4)^8 +{10 \choose 1} (0.6)^1 (0.4)^9 +{10 \choose 0} (0.6)^0 (0.4)^{10} = 0.37

The sum of these two (called the symmetrical p-value) is 0.75, i.e., if the “null hypothesis” were true (the miracle drug had absolutely no effect and the disease simply took its usual course), there would still be a 75% probability of seeing a result at least this far from the average.
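The two sums above are easy to reproduce. The following is a minimal sketch using only the standard library; `binom_pmf` is a helper name I made up for the binomial probability of exactly k survivors out of n.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k survivors out of n, each surviving with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p_survive = 10, 0.6  # null hypothesis: the drug does nothing

upper = sum(binom_pmf(k, n, p_survive) for k in range(7, n + 1))  # 7 or more survive
lower = sum(binom_pmf(k, n, p_survive) for k in range(0, 6))      # 5 or fewer survive
symmetric = upper + lower

print(f"upper {upper:.3f}, lower {lower:.3f}, symmetric {symmetric:.3f}")
# prints: upper 0.382, lower 0.367, symmetric 0.749
```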

If we repeated the same test with 1000 patients, of whom 700 survived, the result would be dramatically different. The same calculations would yield

{1000 \choose 700} (0.6)^{700} (0.4)^{300} + {1000 \choose 701} (0.6)^{701} (0.4)^{299} + \cdots + {1000 \choose 1000} (0.6)^{1000} (0.4)^0 \\  \approx 3 \times 10^{-11}

Notice how small this number is. If you also add the probability of repeating the experiment and getting 500 or fewer survivals, that would be \approx 10^{-10}.

The symmetrical p-value in this case is \approx 10^{-10}. Consider how tiny this is compared to the 0.75 number we had before. This is clearly a rather effective drug!
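With 1000 patients the individual terms are products of astronomically large binomial coefficients and astronomically small powers, so one common way to evaluate them is to work with logarithms. The sketch below does that with `math.lgamma`; this is just one possible approach, not necessarily how the numbers above were obtained, and the helper names are my own.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of exactly k successes out of n."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_comb + k * log(p) + (n - k) * log(1 - p)


def tail(ks, n, p):
    """Total probability over the given success counts."""
    return sum(exp(log_binom_pmf(k, n, p)) for k in ks)


n, p_survive = 1000, 0.6  # null hypothesis: the drug does nothing

upper = tail(range(700, n + 1), n, p_survive)  # 700 or more survive
lower = tail(range(0, 501), n, p_survive)      # 500 or fewer survive

print(f"upper {upper:.1e}, lower {lower:.1e}, symmetric {upper + lower:.1e}")
```

Both tails come out at the order of magnitude quoted above, nearly ten orders of magnitude below the 0.75 from the 10-patient trial.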

The p-value is just the total probability that the “null hypothesis” generates the observed event or anything even more extreme than observed. Seems reasonable, doesn’t it? If this p-value is less than some chosen threshold (say 0.05), you might decide this is acceptable as “evidence”. The \frac{700}{1000} test appears as if it proves that “XYZ” is an excellent “miracle” drug.

Next, we come to the underside of p-values. It’s called p-hacking. Here’s a simple way to do it. Consider the test where you obtained a \frac{7}{10} result. Let’s say you decided, post-hoc, that the last person that died actually had a fatal pre-existing condition that you didn’t detect. No autopsies were performed, so that patient might well have died of the condition. In that case, maybe we should exclude that person from the 10 people who were in the trial? And one other guy that died had a really bad attitude, didn’t cooperate with the nurses, maybe didn’t take his medication regularly! We should exclude him too? So we had 7 successful results out of 8 “real” patients. The p-value has now dropped to 0.106 for the 7 and above case and 0.17 for the 3 and below case, for a total p-value of 0.28. Much better! And we didn’t have to do any work, just some Monday morning quarterbacking. Wait, maybe that is exactly what Monday morning quarterbacking is.
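To watch the hack at work, the sketch below recomputes the one-sided (“7 and above”) p-value as more and more of the three deaths are explained away. Excluding all three goes one step further than the story above and is included only to show where this road leads. The names are illustrative.

```python
from math import comb


def upper_tail(successes, n, p):
    """Probability of at least `successes` survivors out of n under the null."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1))


p_survive = 0.6          # null hypothesis: the drug does nothing
survivors, deaths = 7, 3

p_by_excluded = {}
for excluded in range(deaths + 1):
    n = survivors + deaths - excluded  # patients left after the post-hoc exclusions
    p_by_excluded[excluded] = upper_tail(survivors, n, p_survive)
    print(f"excluded {excluded} deaths: 7/{n} survived, p = {p_by_excluded[excluded]:.3f}")
```

The one-sided p-value slides from 0.382 to 0.232 to 0.106, and with all three deaths excluded it lands at 0.028, comfortably under the usual 0.05 threshold, without a single new patient being treated.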

Another example of p-hacking is one that I gave in this post. For convenience, I reproduce it here –

Imagine you were walking around in Manhattan and you chanced upon an interesting game going on at the side of the road. By the way, when you see these games going on, a safe strategy is to walk on, since they usually reduce to methods of separating a lot of money from you in various ways.

The protagonist, sitting at the table, tells you (and you are able to confirm this by a video taken by a nearby security camera run by a disinterested police officer) that he has managed to toss the same quarter (an American coin) thirty times and managed to get “Heads” ALL of those times. And it was a fair coin!

Next, your good friend rushes to your side and whispers to you that this guy is actually one of a really large number of people (a little more than a billion) that were asked to successively toss freshly minted, scrupulously clean and fair quarters. People that tossed tails were “tossed” out at each successive toss and only those that tossed heads were allowed to toss again. This guy (and one more like him) were the only ones that remained.

What if the number of coin tosses was 100 rather than 30, with a larger number of initial subjects?

Clearly, you would be p-hacked if you ignored your friend.
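The arithmetic behind the friend’s whisper fits in a few lines. In this sketch the number of starters is assumed to be exactly 2^30 (which is “a little more than a billion”); the story does not give the exact count.

```python
from math import expm1, log1p

tosses = 30
p_all_heads = 0.5**tosses  # chance that one given person tosses 30 heads in a row

# Assumption: exactly 2**30 people started the game.
starters = 2**tosses

expected_survivors = starters * p_all_heads

# Chance that at least one starter survives all 30 rounds,
# i.e. 1 - (1 - p)**starters, computed stably via log1p/expm1.
p_at_least_one = -expm1(starters * log1p(-p_all_heads))

print(f"one person, 30 heads: {p_all_heads:.2e}")
print(f"expected survivors:   {expected_survivors:.1f}")
print(f"at least one remains: {p_at_least_one:.3f}")
```

For a single person, thirty heads in a row has a probability of about one in a billion. Across that many starters, someone being left standing is more likely than not. For 100 tosses the same logic needs about 2^100 (roughly 10^30) starters.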

p-values are used throughout science, but it is desperately easy to p-hack. It still takes a lot of intellectual honesty and, yes, seat of the pants reasoning and experience to know when you are p-hacking and when you are simply being rational in ignoring certain classes of data.

The q-value would be a quantity that describes when a correlation is outside the bounds of the “null hypothesis” – for instance, one might have an economic reason why the fx/equity index correlation is a certain number. Maybe it is linked to the size of trade in/out-flows, tariff structure, the growth in the economy and other aspects. But then, it moves around a lot and clearly follows some kind of random process – just not the one described by the binomial model. A q-value would clarify a lot of the nonsense that goes into pricing and estimating economic value in products such as quanto options.

More on this in a future post.

Front image : courtesy Hilda Bastian, from this article


  1. Hi Satish,
    For the longest time I’ve used standard deviation of a binomial distribution as a proxy for the more complex calc you propose.
    sd = Sqrt(N*p*q)
    – For your 700/1000 case (p=.7, q=.3, N=1000) has sd = sqrt(210) ~= 14.5.
    so a shift of 100 is approx. 7 std dev from the mean.
    – for the 7/10 case,
    sd = sqrt(2.1) = 1.45
    a shift of 1 is about .7 std dev from the mean.
    The difference between 7 and .7 is significant.
    The 7/10 case could be random chance.
    Even MMQuarterbacking and changing N to 8 and making p=7/8 does not change it much,
    sd = sqrt(7/8) ~= 15/16, approx. 1. So a change from 7 successes to 6 or even 8 is just barely over 1 sd from the mean.

    1. I agree, this is a quicker estimate.
      I was just making the point (the smaller p-value is still not good enough, even for government work) that you can mess a lot with the p-value by choosing data points appropriately. We all know this from lab work in scientific fields – if you aren’t careful, the results disagree with well established principles or numbers and then the temptation is to mess with the methodology and select only “good” results. An example from real life is the saga of the electron charge in the Millikan oil-drop experiment.
