# Finance

### Master Traders and Bayes’ theorem


Imagine you were walking around in Manhattan and you chanced upon an interesting game going on at the side of the road. By the way, when you see these games going on, a safe strategy is to walk on, since they usually reduce to methods of separating a lot of money from you in various ways.

The protagonist, sitting at the table, tells you (and you are able to confirm this from a video taken by a nearby security camera run by a disinterested police officer) that he has tossed the same quarter (an American coin) thirty times and got “Heads” ${\bf ALL}$ of those times. What would you say about the fairness or unfairness of the coin in question?

Next, your good friend rushes to your side and whispers to you that this guy is actually one of a $really \: large$ number of people (a little more than a billion) that were asked to successively toss freshly minted, scrupulously clean and fair quarters. People that tossed tails were “tossed” out at each successive toss and only those that tossed heads were allowed to toss again. This guy (and one more like him) were the only ones that remained. What can you say now about the fairness or unfairness of the coin in question?

What if the number of coin tosses was $100$ rather than $30$, with a larger number of initial subjects?
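To get a feel for the friend's story, here is a minimal sketch of how many all-heads survivors pure luck would produce. The function names are mine, and I am assuming "a little more than a billion" means $2^{30}$ people, each tossing a fair coin.

```python
def expected_survivors(n_people: int, n_tosses: int) -> float:
    """Expected number of people who toss heads every single time with a fair coin."""
    return n_people * 0.5 ** n_tosses


def prob_at_least_one_survivor(n_people: int, n_tosses: int) -> float:
    """Chance that at least one person is still standing after all the tosses."""
    p_single = 0.5 ** n_tosses
    return 1.0 - (1.0 - p_single) ** n_people


# Assumption: "a little more than a billion" is taken to be 2**30 people.
n_people = 2 ** 30
print(expected_survivors(n_people, 30))
print(prob_at_least_one_survivor(n_people, 30))

# For 100 tosses you would need about 2**100 starters to expect one survivor.
print(expected_survivors(2 ** 100, 100))
```

With that many starters, a survivor with a perfect record is the expected outcome, not a miracle, even though every coin is fair.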

Just to make sure you think about this correctly, suppose you were the Director of a large State Pension Fund and you need to invest the life savings of your state’s teachers, firemen, policemen, highway maintenance workers and the like. You are told you have to decide whether to allocate some money to a bet made by an investment manager, based on his or her track record (he or she successively tossed “Heads” a hundred times in a row). Should you invest money on the possibility that he or she will toss “Heads” again? If so, how much should you invest? Should you stay away?

This question cuts to the heart of how we operate in real life. If you cut out the analytical skills you learnt in school and revert to how our “lizard” brain thinks, we would assume the coin was unfair (in the first instance) and express total surprise at the knowledge of the second fact. In fact, even though the second situation could well have happened to every similar situation of the first sort we encounter in the real world, we would still operate as if the coin was unfair, as our “lizard” brain would instruct us to behave.

What we are doing unconsciously is using Bayes’ theorem. Bayes’ theorem is the linchpin of inferential deduction and is often misused even by people who understand what they are doing with it. If you want to read a couple of rather interesting books that use it in various ways, read Gerd Gigerenzer’s “Reckoning with Risk: Learning to Live with Uncertainty” or Hans Christian von Baeyer’s “QBism”. I will discuss a few classic examples. In particular, Gigerenzer’s book discusses several such, as well as ways to overcome popular mistakes made in the interpretation of the results.

Here’s a very overused, but instructive example. Let’s say there is a rare disease (pick your poison) that afflicts $0.25 \%$ of the population. Unfortunately, you are worried that you might have it. Fortunately for you, there is a test that can be performed, that is $99 \%$ accurate – so if you do have the disease, the test will detect it $99 \%$ of the time. Unfortunately for us, the test has a $0.1 \%$ false positive rate, which means that $0.1 \%$ of the tested people who don’t have the disease will mistakenly get a positive result. Despite this, the results look exceedingly good, so the test is much admired.

You nervously proceed to your doctor’s office and get tested. Alas, the result comes back “Positive”. Now, ask yourself, what are the chances you actually have the disease? After all, you have heard of false positives!

A simple way to turn the percentages above into numbers is to consider a population of $1,000,000$ people. Since the disease is rather rare, only $(0.25 \% \equiv ) \: 2,500$ have the disease. If they are tested, only $(1 \% \equiv ) \: 25$ of them will get an erroneous “negative” result, so $2475$ will correctly test “Positive”. However, if the rest of the population ($997,500$ people) were tested in the same way, $(0.1 \% \equiv ) \: 997.5$, or roughly $1000$, people would get a “Positive” result, despite not having the disease. In other words, of the roughly $3475$ people who would get a “Positive” result, only $2475$ actually have the disease, which is roughly $71\%$ – so such an accurate test can only give you a 7-in-10 chance of actually being diseased, despite its incredible accuracy. The reason is that the “false positive” rate is low, but not low enough to overcome the extreme rarity of the disease in question.
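The head count above can be sketched in a few lines. The variable and function names are my own; the rates are the ones quoted in the example.

```python
def positive_breakdown(population, prevalence, sensitivity, false_positive_rate):
    """Split everyone who tests positive into true and false positives."""
    diseased = population * prevalence
    healthy = population - diseased
    true_positives = diseased * sensitivity
    false_positives = healthy * false_positive_rate
    return true_positives, false_positives


true_pos, false_pos = positive_breakdown(
    population=1_000_000,
    prevalence=0.0025,           # 0.25% of people have the disease
    sensitivity=0.99,            # the test catches 99% of real cases
    false_positive_rate=0.001,   # 0.1% of healthy people test positive
)

# Fraction of positive results that belong to people who really are diseased.
share_diseased = true_pos / (true_pos + false_pos)
print(f"{share_diseased:.3f}")  # prints 0.713
```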

Notice, as Gigerenzer does, how simple the argument seems when phrased with numbers, rather than with percentages. To do this using standard probability theory, consider Events $A$ and $B$ and write the probability that $A$ occurs once we know that $B$ has occurred as $P(A/B)$. The probability that both $A$ and $B$ occur can be written in two ways, so

$P(A/B) P(B) = P(B/A) P(A)$

Using this

$P(I \: am \: diseased \: GIVEN \: I \: tested \: positive) = \frac {P(I \: test \: positive \: GIVEN \: I \: am \: diseased) \: P(I \: am \: diseased)}{P(I \: test \: positive)}$

and then we note

$P(I \: am \: diseased) = 0.25\%$

$P(I \: test \: positive \: GIVEN \: I \: am \: diseased) = 99\%$

$P(I \: test \: positive) = 0.25 \% \times 99 \% + 99.75 \% \times 0.1 \%$

since I could test positive for two reasons – either I really was among the $0.25 \%$ positive people and additionally was among the $99 \%$ that the test caught OR I really was among the $99.75 \%$ negative people but was among the $0.1 \%$ that unfortunately got a false positive.

Indeed, $\frac{0.25 \% \times 99 \%}{0.25 \% \times 99 \% + 99.75 \% \times 0.1 \%} \approx 0.71$

which matches the roughly 7-in-10 chance we found by counting people.
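The same formula can be wrapped in a small function to see how the answer moves as the false positive rate shrinks. This is only a sketch; the function name `posterior_diseased` and the two extra false positive rates are my own illustrative choices, not numbers from a real test.

```python
def posterior_diseased(prior, sensitivity, false_positive_rate):
    """P(diseased GIVEN positive test), via Bayes' theorem."""
    p_positive = prior * sensitivity + (1.0 - prior) * false_positive_rate
    return prior * sensitivity / p_positive


# The first rate is the one from the example; the other two are hypothetical.
for fpr in (0.001, 0.0001, 0.00001):
    print(fpr, round(posterior_diseased(0.0025, 0.99, fpr), 3))
```

The rarer the disease, the smaller the false positive rate has to be before a positive result becomes convincing.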

The rather straightforward formula I used in the above is one formulation of Bayes’ theorem. Bayes’ theorem allows one to incorporate one’s knowledge of partial outcomes to deduce what the underlying probabilities of events were to start with.

There is no good answer to the question that I posed in the first paragraph. It is true that both a fair and an unfair coin could give results consistent with the first event (someone gets $30$ or even $100$ heads in a row). However, if one desires that probability has an objective meaning independent of our experience, based upon the results of an infinite number of repetitions of some experiment (the so-called “frequentist” interpretation of probability), then one is stuck. In fact, based upon that principle, if you haven’t heard anything to the contrary about the coin, your a priori assumption about the probability of heads must be $\frac {1}{2}$. On the other hand, that isn’t how you run your daily life. In fact, the most legally defensible (many people would argue the ${\bf {only}}$ defensible) strategy for the Director of the Pension Fund would be to

• not assume that prior returns were based on pure chance and would be equally likely to be positive or negative
• bet on the manager with the best track record

At a minimum, I would advise people to stay away from a stable of managers that simply are the survivors of a talent test where the losers were rejected (oh wait, that sounds like a large number of investment managers in business these days!). Of course, the manager that knows they have a good thing going is likely to not allow investors at all for fear of reducing their returns due to crowding. Such managers also exist in the global market.

The Bayesian approach has a lot in common with our every-day approach to life. It is not surprising that it has been applied to the interpretation of Quantum Mechanics and that will be discussed in a future post.

### Arbitrage arguments in Finance and Physics


Arbitrage refers to a somewhat peculiar and rare situation in the financial world. It is succinctly described as follows. Suppose you start with an initial situation – let’s say you have some money in an ultra-safe bank that earns interest at a certain basic rate $r$. Assume, also, that there is an infinitely liquid market in the world, where you can choose to invest the money in any way you choose. If two strategies end up with ${\bf {definite}}$ financial outcomes that are quite different, then you have an arbitrage between the two strategies. If so, the way to profit from the situation is to “short” one strategy (the one that makes less) and go “long” the other strategy (the one that makes more). An example of such a method would be to buy a cheaper class of shares and sell “short” an equivalent amount of an expensive class of shares for the same Company that has definitely committed to merge the two classes in a year.

An argument using arbitrage is hard to challenge except when basic assumptions about the market or initial conditions are violated. Hence, in the above example, if there was uncertainty about whether the merger of the two classes of shares would actually happen in a year, the “arbitrage” wouldn’t really be one.

One of the best known arbitrage arguments was invented by Fischer Black, Myron Scholes and Robert Merton to deduce a price for Call and Put Options. Their argument is explained as follows. Suppose you have one interest rate for risk-free investments (the rate $r$ two paragraphs above). Additionally, suppose you, Dear Reader, own a Call Option, with strike price $X$, on a stock. This is an instrument where at the end of (say) one year, you look at the market price of the stock, $S$, and compute $S - X$. Let’s say $X = \$100$, while the stock price was initially $\$76$. At the end of the year, suppose the stock price became $\$110$, then the difference $\$110 - \$100 = \$10$, so you, Dear Reader and Fortunate-Call-Option-Owner, would make $\$10$. On the other hand, if the stock price unfortunately sank to $\$55$, then the difference $\$55 - \$100 = - \$45$ is negative. In this case, you, unfortunate Reader, would make nothing. A Call Option, therefore, is a way to speculate on the ascent of a stock price above the strike price.
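In code, the payoff at expiry is just the difference between the stock price and the strike, floored at zero. A minimal sketch follows; the function name `call_payoff` is mine.

```python
def call_payoff(stock_price: float, strike: float) -> float:
    """What the Call Option owner receives at expiry: never less than zero."""
    return max(float(stock_price) - float(strike), 0.0)


print(call_payoff(110, 100))  # prints 10.0, the fortunate case
print(call_payoff(55, 100))   # prints 0.0, the unfortunate case
```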

Black-Scholes-Merton wanted to find a formula for the price that you should logically expect to pay for the option. The simplest assumption for the uncertainty in the stock price is to state that $\log S$ follows a random walk. A random walk is the walk of a drunkard that walks on a one-dimensional street and can take each successive step to the front or the back with equal probability. Why $\log S$ and not $S$? That’s because a random walker could end up walking backwards for a long time. If her walk was akin to a stock price, clearly the stock price couldn’t go below 0 – a more natural choice is $\log S$ which goes to $- \infty$ as $S \rightarrow 0$. A random walker is characterized by her step size. The larger the step size, the further she would be expected to be found relative to her starting point after $N$ steps. The step size is called the “volatility” of the stock price.
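Here is a small sketch of the drunkard’s walk applied to $\log S$. The step size, the number of steps and the seed are arbitrary choices of mine for illustration. The point is only that the price stays positive however long the walker stumbles backwards, and that a bigger step size spreads the outcomes further.

```python
import math
import random


def simulate_final_price(s0, step_size, n_steps, rng):
    """Random walk in log S: each step is +step_size or -step_size with equal odds."""
    log_s = math.log(s0)
    for _ in range(n_steps):
        log_s += step_size if rng.random() < 0.5 else -step_size
    return math.exp(log_s)  # exponentiating guarantees a positive price


rng = random.Random(0)  # fixed seed so that the run is repeatable
small_steps = [simulate_final_price(76.0, 0.01, 250, rng) for _ in range(1000)]
large_steps = [simulate_final_price(76.0, 0.02, 250, rng) for _ in range(1000)]

print(min(small_steps) > 0, min(large_steps) > 0)
print(max(small_steps) - min(small_steps), max(large_steps) - min(large_steps))
```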

In addition to an assumption about volatility, B-S-M needed to figure out the “drift” of the stock price. The “drift”, in our example, is akin to a drunkard starting on a slope. In that case, there is an unconscious tendency to drift down-slope. One can model drift by assuming that there isn’t the same probability to move to the right, as to the left.

The problem is, while it is possible to deduce, from uncertainty measures in the market, the “volatility” of the stock, there is no natural reason to prefer one “drift” over the other. Roughly speaking, if you ask people in the market whether IBM will achieve a higher stock price after one year, half will say “Yes”, the other half will say “No”. In addition, the ones that say “Yes” will not agree on exactly by how much it will be up. The same for the “No”-sayers! What to do?

B-S-M came up with a phenomenal argument. It goes as follows. We know, intuitively, that a Call Option (for a stock in one year) should be worth more today if the stock price were higher today (for the same Strike Price) by, say $\$1$. Can we find a portfolio that would decline by exactly the same amount if the stock price was up by $\$1$? Yes, we can. We could simply “short” that amount of shares in the market. A “short” position is like a position in a negative number of shares. Such a position loses money if the market were to go up. And I could do the same thing every day till the Option expires. I will need to know, every day, from the Option Formula that I have yet to find, a “first-derivative” – how much the Option Value would change for a $\$1$ increase in the stock price. But once I do this, I have a portfolio (Option plus this “short” position) that is ${\bf {insensitive}}$ to stock price changes (for small changes).
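To make the hedge concrete, here is a sketch that borrows the standard textbook Black-Scholes call formula (which this post has not derived) purely as a stand-in for “the Option Formula”. The interest rate ($2\%$) and volatility ($30\%$) are hypothetical values of mine. The “first-derivative” is estimated by bumping the stock price slightly.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, strike, r, sigma, t):
    """Textbook Black-Scholes price of a European call (assumed, not derived here)."""
    d1 = (math.log(s / strike) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - strike * math.exp(-r * t) * norm_cdf(d2)


def first_derivative(s, strike, r, sigma, t, bump=0.01):
    """Central-difference estimate of the option value change per $1 stock move."""
    up = bs_call(s + bump, strike, r, sigma, t)
    down = bs_call(s - bump, strike, r, sigma, t)
    return (up - down) / (2.0 * bump)


# Stock at $76, strike $100, one year to expiry; r and sigma are hypothetical.
s, strike, r, sigma, t = 76.0, 100.0, 0.02, 0.30, 1.0
shares_short = first_derivative(s, strike, r, sigma, t)


def hedged_value(price):
    """Option plus the short stock position chosen at today's price."""
    return bs_call(price, strike, r, sigma, t) - shares_short * price


change_unhedged = bs_call(s + 1.0, strike, r, sigma, t) - bs_call(s, strike, r, sigma, t)
change_hedged = hedged_value(s + 1.0) - hedged_value(s)
print(shares_short, change_unhedged, change_hedged)
```

The hedged change is not exactly zero, because the first derivative itself moves with the stock price, which is why the hedge has to be readjusted every day.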

Now, B-S-M had the ingredients for an arbitrage argument. They said, if such a portfolio definitely could make more or less than the rate offered by a risk-less bank account, there would be an arbitrage. If the portfolio definitely made more, borrow (from this risk-free bank) the money to buy the option, run the strategy, wait to maturity, return the loan and clear a risk-free profit. If it definitely made less, sell this option, invest the money received in the bank, run the hedging strategy with the opposite sign, wait to maturity, pay off the Option by withdrawing your bank funds, then pocket your risk-free difference.

This meant that they could assume that the portfolio described by the Option and the Hedge, run in that way, was forced to appreciate at the “risk-free” rate. This was hence a natural choice of the “drift” parameter to use. The price of the Option would actually not depend on the true drift of the stock, whatever anyone believes it to be.
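A toy version of this argument fits in a few lines if we pretend the stock can only end the year at one of two prices. This is the simpler one-step two-outcome (binomial) setting, not B-S-M’s continuous hedging, and I am assuming the two outcomes are the $\$110$ and $\$55$ of the earlier example and a hypothetical risk-free rate of $2\%$. Notice that the probability of the “up” outcome never enters.

```python
def one_step_call_price(s0, s_up, s_down, strike, r):
    """Price a call by building a riskless portfolio: option minus hedge_ratio shares."""
    payoff_up = max(s_up - strike, 0.0)
    payoff_down = max(s_down - strike, 0.0)
    # Number of shares to short so both outcomes give the same portfolio value.
    hedge_ratio = (payoff_up - payoff_down) / (s_up - s_down)
    riskless_value = payoff_up - hedge_ratio * s_up
    # No arbitrage: the riskless portfolio must grow at the bank rate r, so
    # option_price - hedge_ratio * s0 = riskless_value / (1 + r).
    return hedge_ratio * s0 + riskless_value / (1.0 + r)


price = one_step_call_price(s0=76.0, s_up=110.0, s_down=55.0, strike=100.0, r=0.02)
print(round(price, 2))  # prints 4.01
```

Two people who disagree completely about how likely the stock is to go up must still agree on this price, or one of them can be arbitraged.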

If you are a hard-headed options trader, though, the arguments just start here. After all, the running of the above strategy needs markets that are infinitely liquid with infinitesimal “friction” – the ability to sell arbitrarily large amounts of stock at the same price at which one can buy them. All of these are violated to varying degrees in the real stock market, which is what makes the B-S-M formula of doubtful accuracy. In addition, there are other possible processes (not a simple random-walk) that the quantity $\log S$ might follow. All this contributes to a robust Options market.

An arbitrage argument is akin to an argument by contradiction.

Arguments of the above sort abound in Physics. Here’s a cute one, due to Hermann Bondi. He was able to use it to deduce that clocks should run slower in a gravitational field. Here goes (this paraphrases a description by the incomparable T. Padmanabhan from his book on General Relativity).

Bondi considered the following sort of apparatus (I have really constructed my own example, but the concept is his).

One photon rushes from the bottom of the apparatus to the top. Let’s assume it has a frequency $\nu_{bottom}$ at the bottom of the apparatus and a frequency $\nu_{top}$ at the top. In our current unenlightened state of mind, we think these will be the same frequency. Once the photon reaches the top, it strikes a target and undergoes pair production (photon swerves close to a nucleus and spontaneously produces an electron-positron pair – the nucleus recoils, not in horror, but in order to conserve energy and momentum). Let’s assume the photon’s energy is rather close to the rest-mass energy of the electron-positron pair, so the pair are rather slow moving afterwards.

Once the electron and positron are produced (each with momentum of magnitude $p_{top}$), they experience a strong magnetic field (in the picture, it points out of the paper). The law that describes the interaction between a charge and a magnetic field is called the Lorentz Force Law. It causes the (positively charged) positron to curve to the right, the (negatively charged) electron to curve to the left. The two then separately propagate down the apparatus (acquiring a momentum $p_{bottom}$) where they are forced to recombine into a photon of exactly the right frequency, which continues the cycle. In particular, we can write the energy of the photon in each case.

$h \nu_{top} = 2 \sqrt{(m_e c^2)^2+p_{top}^2 c^2} \approx 2 m_e c^2$

$h \nu_{bottom} = 2 \sqrt{(m_e c^2)^2+p_{bottom}^2 c^2} \approx 2 m_e c^2 + 2 m_e g L$

In the above, $p_{bottom} > p_{top}$, the electrons have slightly higher speed at the bottom than at the top.

We know from the usual descriptions of potential energy and kinetic energy (from high school, hopefully), that the electron and positron pick up energy $m_e g L$ (each) on their path down to the bottom of the apparatus. Now, if the photon doesn’t experience a corresponding loss of energy as it travels from the bottom to the top of the apparatus, we have an arbitrage. We could use this apparatus to generate free energy (read “risk-less profit”) forever. This can’t be – this is nature, not a man-made market! So the change of energy of the photon will be

$h \nu_{bottom} - h \nu_{top} =2 m_e g L \approx h \nu_{top} \frac{g L}{c^2}$

indeed, the frequency of the photon is higher at the bottom of the apparatus than at the top. As photons “climb” out of the depths of the gravitational field, they get red-shifted – their wavelength lengthens/frequency reduces. This formula implies

$\nu_{bottom} \approx \nu_{top} (1 + \frac{g L}{c^2})$

writing this in terms of the gravitational potential due to the earth (mass $M$) at a distance $R$ from its center

$\Phi(R) = - \frac {G M}{R}$

$\nu_{bottom} \approx \nu_{top} (1 + \frac{\Phi(top) - \Phi(bottom)}{c^2})$

so, for a weak gravitational field,

$\nu_{bottom} (1 + \frac{ \Phi(bottom)}{c^2}) \approx \nu_{top} (1 + \frac{\Phi(top)}{c^2})$

On the other hand, time intervals are related to inverse frequencies (we consider the time between successive wave fronts)

$\frac {1}{\Delta t_{bottom} } (1 + \frac{ \Phi(bottom)}{c^2}) \approx \frac {1}{\Delta t_{top}} (1 + \frac{\Phi(top)}{c^2})$

so comparing the time intervals between successive ticks of a clock at the surface of the earth, versus at a point infinitely far away, where the gravitational potential is zero,

$\frac {1}{\Delta t_{R} } (1 + \frac{ \Phi(R)}{c^2}) \approx \frac {1}{\Delta t_{\infty}}$

which means

$\Delta t_{R} \approx \Delta t_{\infty} (1 + \frac{ \Phi(R)}{c^2})$

The conclusion is that the time between successive ticks of the clock is measured to be slightly smaller on the surface of the earth vs. far away (the fractional difference, $\frac{G M}{R c^2}$, is less than one part in a billion). Note that $\Phi(R)$ is negative, and the gravitational potential is usually assumed to be zero at infinity. This is the phenomenon of time dilation due to gravity. As an example, the GPS systems are run off clocks on satellites orbiting the earth at an altitude of roughly $20,200$ km. The clocks on the earth run slower than clocks on the satellites. In addition, as a smaller effect, the satellites are travelling at a high speed, so special relativity causes their clocks to run a little slower compared to those on the earth. The two effects act in opposite directions. This is the subject of a future post, but the net effect, which has been precisely checked, is about 38 $\mu$seconds per day. If we didn’t correct for relativity, our planes would land at incorrect airports etc. and we would experience total chaos in transportation.
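As a rough check on that 38 $\mu$seconds, here is a back-of-the-envelope sketch using the weak-field formula above. The assumptions are mine: a circular orbit of radius about $26,560$ km from the earth’s center, a ground clock treated as stationary (the earth’s rotation is ignored), and only the leading-order velocity correction $v^2 / 2 c^2$ from special relativity.

```python
GM_EARTH = 3.986004e14   # gravitational parameter G*M of the earth, m^3/s^2
C = 299_792_458.0        # speed of light, m/s
R_EARTH = 6.371e6        # mean radius of the earth, m
R_ORBIT = 2.656e7        # assumed orbital radius of a GPS satellite, m
SECONDS_PER_DAY = 86_400


def gravitational_gain_per_day():
    """Seconds per day the satellite clock gains by sitting higher in the potential."""
    phi_ground = -GM_EARTH / R_EARTH
    phi_orbit = -GM_EARTH / R_ORBIT
    return (phi_orbit - phi_ground) / C ** 2 * SECONDS_PER_DAY


def velocity_loss_per_day():
    """Seconds per day the satellite clock loses because of its orbital speed."""
    v_squared = GM_EARTH / R_ORBIT  # speed squared on a circular orbit
    return v_squared / (2.0 * C ** 2) * SECONDS_PER_DAY


net_microseconds = (gravitational_gain_per_day() - velocity_loss_per_day()) * 1e6
print(f"{gravitational_gain_per_day() * 1e6:.1f}")
print(f"{velocity_loss_per_day() * 1e6:.1f}")
print(f"{net_microseconds:.1f}")
```

Under these assumptions the gravitational term comes out near 46 $\mu$seconds per day and the velocity term near 7, leaving a net drift close to the quoted figure.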