The universe in a grain of sand

This article attempts to explain a paper I wrote that is published in Europhysics Letters.

A piece of poetry by the English engraver William Blake, the Krishn{\bar a} stories, and the colossal range of sizes in nature, from the humongous to the very small, all make us wonder whether the very large is somehow connected to the very small.

A similar theme was explored in physics by a trio of scientists about twenty years ago. They looked at a puzzling problem that has been nagging a rather successful project to use quantum mechanics to explain the physics of fundamental particles. Called “Quantum field theory”, this marriage of quantum mechanics and special relativity (and later, some aspects of general relativity) is probably the most successful theory to emerge in physics in a long while. It has been studied extensively, expanded and some of its predictions in areas where it is possible to make very precise calculations are accurate to fourteen decimal places. It completely excludes the effects of gravity and predicts the precise behavior of small numbers of fundamental particles – how they scatter past each other etc.

The basic building block of the theory – the quantum field – is supposed to represent each kind of particle. There is an electron quantum field, one for the up quark, and so on. If you want to study a theory with five electrons, that is just an excitation of the basic electron quantum field, just as a rapidly oscillating string on a violin has more energy than a slowly oscillating string. More energy in a quantum theory just corresponds to more “quanta” or particles of the field.

So far so good. Unfortunately, one inescapable conclusion of the theory is that even when the quantum field is at its lowest possible energy, there is something called “zero-point” motion. Quantum objects cannot just stay at rest, they are jittery and have some energy even in their most quiescent state. As it turns out, bosons have positive energy in this quiescent state. Fermions (like the electron) have negative energy in this quiescent state. This energy in each quantum field can be calculated.

It is, for every boson quantum field +\infty.

For every fermion quantum field, it is -\infty.

This is a conundrum. The energy in empty space in the universe can be estimated from cosmological measurements. It is roughly equivalent to a few protons to every cubic meter. It is certainly not \infty.

This conundrum (and its relatives) has affected particle physics for more than fifty years now. Variously referred to as the “cosmological constant” problem or its cousin, the “hierarchy problem”, people have tried many solutions. They need solutions, because if the energy were really +\infty for the boson field (since the universe probably started as radiation dominated with photons), the universe would collapse on itself. This infinite energy spread through space would gravitate and pull the universe in.

Solutions, solutions, solutions. Some people have proposed that every particle has a “super”-partner – a boson would have a fermion “super”-partner with the same mass. Since the infinities would be identical, but have opposite signs, they would cancel and we would hence not have an overall energy density in empty space (this would be called the cosmological constant – it would be zero). Unfortunately, we have found no signs of such a “super”-symmetry, though we have looked hard and long.

Others have proposed that one should just command the universe to not let this energy gravitate, as a law of nature. That seems arbitrary and would have to be adduced as a separate natural law. And why nature would obey such a law is tough to answer.

Can we measure the effect of this “energy of empty space”, also called “vacuum energy” in some way other than through cosmology? Yes – there is an experiment called the “Casimir effect” which essentially measures the change in this energy when two metallic plates are separated from each other starting from being extremely close. This rather precise experiment confirms that such an energy does exist and can be changed in this fashion.

One way to make the conundrum at least finite is to say that our theories certainly do not work in a regime where gravity would be important. From the natural constants G_N, \hbar, c (Newton’s gravitational constant, Planck’s constant and the speed of light), one can create a set of natural units – the Planck units. These are the Planck length l_P, Planck mass m_P and Planck time t_P, where

l_P = \sqrt{\frac{G_N \hbar}{c^3} }  \sim 10^{-35} meters,

m_P = \sqrt{ \frac{\hbar c}{G_N} } \sim 10\:  \mu \: grams\: \: \: , \: \: \:  t_P = \sqrt{\frac{G_N \hbar }{c^5}  }   \sim 10^{-44} secs
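
The three formulas above are easy to check numerically. What follows is a minimal sketch, assuming rounded SI values for G_N, \hbar and c; the function name planck_units is just a label chosen for this illustration.

```python
import math

# Rounded SI values, taken as inputs for this sketch.
G_N = 6.674e-11    # Newton's gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.0546e-34  # reduced Planck constant, J s
C = 2.998e8        # speed of light, m/s


def planck_units(g=G_N, hbar=HBAR, c=C):
    """Return (Planck length in m, Planck mass in kg, Planck time in s)."""
    l_p = math.sqrt(g * hbar / c**3)
    m_p = math.sqrt(hbar * c / g)
    t_p = math.sqrt(g * hbar / c**5)
    return l_p, m_p, t_p


l_p, m_p, t_p = planck_units()
print(f"Planck length: {l_p:.2e} m")
print(f"Planck mass:   {m_p * 1e9:.1f} micrograms")  # 1 kg = 1e9 micrograms
print(f"Planck time:   {t_p:.2e} s")
```

The outputs should land at the orders of magnitude quoted above. Note that the Planck length divided by the Planck time is just the speed of light.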

So, one can guess that gravity (represented by G_N) is relevant at Planck “scales”. One might reasonably expect pure quantum field theory to apply in regimes where gravity is irrelevant – so at length scales much larger than 10^{-35} meters. Such a “cutoff” can be applied systematically in quantum field theories and it works – the answer for the “cosmological constant” is not infinitely bigger than the actual number, it is only bigger by a factor of 10^{121}! What one does is to basically banish oscillations of the quantum field whose wavelengths are smaller than the Planck length.
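
To see roughly where a number like 10^{121} comes from, here is a hedged back-of-the-envelope sketch. It assumes a single massless bosonic degree of freedom, sums the zero-point energy \hbar c k / 2 over all wave-vectors up to k = 1/l_P, and compares the result with an observed vacuum energy of about three proton rest energies per cubic meter (my reading of the "few protons" quoted earlier). The exact exponent depends on these choices, so only the order of magnitude should be taken seriously.

```python
import math

G_N = 6.674e-11                 # m^3 kg^-1 s^-2
HBAR = 1.0546e-34               # J s
C = 2.998e8                     # m/s
PROTON_REST_ENERGY = 1.503e-10  # J, i.e. proton mass times c^2


def zero_point_energy_density(k_max):
    """Energy density (J/m^3) from summing hbar*c*k/2 over all |k| < k_max,
    for one massless bosonic degree of freedom."""
    return HBAR * C * k_max**4 / (16 * math.pi**2)


def planck_cutoff_ratio(protons_per_m3=3.0):
    """Ratio of Planck-cutoff vacuum energy to the assumed observed value."""
    l_p = math.sqrt(G_N * HBAR / C**3)
    rho_cutoff = zero_point_energy_density(1.0 / l_p)
    rho_observed = protons_per_m3 * PROTON_REST_ENERGY
    return rho_cutoff / rho_observed


ratio = planck_cutoff_ratio()
print(f"cutoff / observed ~ 10^{math.log10(ratio):.0f}")
```

Changing the assumed observed density from three protons per cubic meter to one or ten only moves the exponent by about half a unit either way.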

Most people would not be happy with this state of affairs. There are other theories of fundamental particles. The most studied ones predict a negative cosmological constant, also not in line with our unfortunate reality.

About twenty years ago, three scientists – Andrew Cohen, David Kaplan and Ann Nelson (C-K-N) – proposed that this vacuum energy actually should cut off at a much larger length scale – the size of the causally connected pieces of the universe (basically something one would consider the largest wavelength possible in our observable universe). In this way, they connected the really small cutoff to the really large size of the universe.

Why did they do this? They made the pretty obvious observation that the universe does not appear to be a black hole. Suppose we assumed that the universe were dominated by radiation. The energy inside should be (they said) the energy in the vacuum, up to this cutoff. But this energy should be confined to a size that should be bigger than, never less than, the “Schwarzschild radius” for this energy. The Schwarzschild radius for some energy is the radius of the ball that this energy should be confined to, in order that it collapses into a black hole.

C-K-N assume that there is a natural principle that requires that the size of the universe is at least equal to the Schwarzschild radius corresponding to all that energy. They then derive some consequences of this assumption.
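
The "not a black hole" requirement can be turned into a number. The sketch below is not the C-K-N calculation itself; it only assumes a uniform ball of energy of radius L and asks for the largest energy density for which the Schwarzschild radius 2 G_N E / c^4 of the total energy E does not exceed L. The value L = 1.3 x 10^26 m (roughly the Hubble radius) is an assumption made for illustration.

```python
import math

G_N = 6.674e-11                 # m^3 kg^-1 s^-2
C = 2.998e8                     # m/s
PROTON_REST_ENERGY = 1.503e-10  # J


def schwarzschild_radius(energy):
    """Schwarzschild radius (m) of a total energy given in joules."""
    return 2.0 * G_N * energy / C**4


def max_energy_density(size):
    """Largest uniform energy density (J/m^3) in a ball of radius `size`
    such that the ball is no smaller than its own Schwarzschild radius."""
    return 3.0 * C**4 / (8.0 * math.pi * G_N * size**2)


L = 1.3e26  # metres; assumed size of the causally connected universe
rho = max_energy_density(L)
total_energy = rho * (4.0 / 3.0) * math.pi * L**3

print(f"bound on energy density: {rho:.1e} J/m^3")
print(f"in proton rest energies per m^3: {rho / PROTON_REST_ENERGY:.1f}")
print(f"Schwarzschild radius / L: {schwarzschild_radius(total_energy) / L:.3f}")
```

Under these assumptions the bound comes out at a few proton rest energies per cubic meter, the same ballpark as the cosmological estimate mentioned earlier, which is what makes tying the cutoff to the size of the universe attractive.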

First, my objections. I would much rather have preferred that the universe be MUCH bigger than this radius. Next, if this is indeed the case, surely some natural law should cause this to happen, rather than a post-hoc requirement (we are here, so it must have been so). That last bit is usually referred to as the “weak” anthropic principle. Anthropic principles have always seemed to me the last resort of the damned physicist – they are what you reach for when you throw up your hands and say that if it weren’t this way, we wouldn’t be here. It’s OK to resort to such ideas when you clearly see there is a lot of randomness that drives a physical process. Just not knowing the underlying physical process doesn’t seem the right reason to put forward an anthropic-type idea.

Anyway, I cast the entire problem as one in thermodynamics of the entire universe and suggested that the universe is this way because it is simply the most advantageous way for it to arrange itself. This method also lends itself to other extensions. It turns out that if the material of the universe is not the usual type (say “radiation” or “matter”), it might be possible for us to actually find a reasonable estimate of the cutoff that is in line with current experiments (at least the vacuum energy is not off by a factor of 10^{121}, but only 10^{45} or so).

There is more to do!

Do we live in a hologram? How could we tell?

Do we live in a holographic universe? In other words, do we actually live in a universe with three spatial dimensions, or is this just an illusion, so that if you dig deeper, you could represent the universe as consisting of fewer (or even many more) dimensions? If we were really living in fewer than three dimensions, we’d call the universe holographic; if we had more than three spatial dimensions at small scales, we’d be living on a slice of a larger hyper-universe.

This sort of speculation has been fairly common within science, science fiction and philosophy. String theory is the latest and most original place where such analysis has occurred – certain types of string theory are believed to be consistent only in a certain number of spatial dimensions. In order to maintain sanity, therefore, the “other” dimensions are, for unknown reasons, supposed to have curled up into tiny rings that one cannot see unless one probes space at the Planck length scales \sim 10^{-33} cm.

For the physicist who doesn’t have a dog in the search for additional dimensions, but wishes to figure out if this hypothesis (of a different number of dimensions at small length scales) is even remotely true, the big problem is this – we are certainly not able to construct particle accelerators the size of the solar system, which we would potentially need in order to probe nature at the Planck length scale. Maybe in the next million years! In the interim, what could we do to check this possibility? Could extra, or fewer, sub-microscopic (“Planckoscopic”?) dimensions have some effects on phenomena that we could measure? I just wrote a paper on a possible proposal to check this that I will describe below.

Unfortunately, we certainly cannot look at distances of that small a scale. We can look at distances that are much larger, quite obviously. What would cause small lengths to blow up and become large?

Enter the theory of inflation. Not economic inflation, which seems to be a fact of life in the year 2022, but the cosmic theory of inflation. This was invented by a few physicists, prominent among whom are Alan Guth, Andrei Linde, Paul Steinhardt and Alexei Starobinsky.

The problems they were trying to solve were as listed below –

  1. Why is the universe so flat? Space seems to be exceedingly flat on large length scales, of the order of 100 million light years. Indeed, if it looks this flat now, it must have been exceedingly flat, even ridiculously flat, when it was very small. Indeed, if we think the universe’s evolution followed the classic model with just matter, radiation and the so-called dark energy then we cannot explain why it was so remarkably flat when it began, unless it was a special kind of universe.
  2. The simplest model of the universe with radiation, matter and dark energy matches observations, but has a problem – the expansion is slow, so if we looked at the Cosmic Microwave Background radiation from opposite directions in the universe, they should look different, since they were never in causal contact. The reason is this: the light we see in the background was produced when the universe was a couple of hundred thousand years old. If you track that back in time, anything that was a hundred thousand light years away from something else could have been in close causal contact with it. However, and this is the problem, if you expand a patch that was a couple of hundred thousand light-years across (at the epoch when the Cosmic Microwave Background was produced) by the amount that the universe has expanded since then, based on what we believe happened, it turns out to be ten thousand times smaller than the current size of the visible universe. This means that patches of the sky more than a palm width away from each other in the sky were never in touch with each other, as they cannot connect by any means faster than light as far as we know.
  3. Here is an objection to this – how about at the big bang singularity when everything was compressed down to one point? The argument really needs to be supplemented – we don’t really know if there was a singularity with infinite density and the universe probably had a very different structure exactly at that point, so we ignore the singular point.
  4. To solve this problem, we need to have a universe that was very small and then suddenly expanded so quickly that things that became ten thousand times a couple of hundred thousand light years apart had still been in causal contact before this astonishing expansion.
  5. That is what inflationary theory posits – for some unknown reason, but with a lot of plausible assumptions, one can find a mechanism that just does that. It explains most of the problems mentioned above, with some serious problems. For one thing, the explanation revolves around a special kind of particle called the inflaton, which seems to be a cousin of the Higgs boson, but we have never seen it or any other evidence for it. Next, the inflaton seems to have worked for a very short time, then given up the ghost, stopped inflating the universe, doesn’t seem to be around now and no one has a mechanism for why and how and when all this happened.
  6. Anyway, having listed the successes and failures, if you take the inflation idea seriously, the problems above can be considered solved. In fact, in the simplest versions of the theory, it appears that in about 10^{-34} seconds, the universe expanded by at least a factor e^{60} \approx 10^{26}. This made regions that should have been causally disconnected with the usual “slower” expansion actually causally connected from just before the inflationary interlude.
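
The expansion factor quoted in the last item is simple arithmetic, and it is worth seeing what it does to a Planck-sized length. The sketch below assumes 60 e-folds and a rounded Planck length of 1.6 x 10^{-35} m; it ignores the later, slower expansion of the universe, which stretches lengths much further.

```python
import math

PLANCK_LENGTH_M = 1.6e-35  # metres, rounded


def expansion_factor(n_efolds):
    """Factor by which lengths grow after n_efolds of exponential expansion."""
    return math.exp(n_efolds)


factor = expansion_factor(60)
stretched = PLANCK_LENGTH_M * factor

print(f"e^60 = {factor:.2e} (log10 = {math.log10(factor):.2f})")
print(f"a Planck length after inflation alone: {stretched:.1e} m")
```

So even before the subsequent expansion is counted, a Planck-scale wiggle ends up around atomic size, which is why physics at the smallest scales could in principle leave a mark on something observable.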

The rapid expansion caused kinks in the distribution of matter to even out and the curvature of space to flatten out. But not all – “Planckoscopic” fluctuations need to get straightened out. The big prediction of the inflationary theory was that these fluctuations would get very close to being completely straightened out (“scale-invariant fluctuations”). At long length scales, this seems to tie in to observations very well. My paper simply corrects this – if the world is actually built differently at small scales – if it has fewer degrees of freedom, fewer dimensions, this would change the way the fluctuations look when they are stretched out. However, since this difference is only seen for very small length scales, one would see this at rather small angular scales in the sky. The Planck and COBE satellite data played an important role in deducing the characteristics of these fluctuations in the Cosmic Microwave Background. However, as it turns out, it is just a little below the angular resolution to see the effect mentioned above and described in the paper – I await newer, higher-precision measurements.

A story of commutators

The conceptual step that took humans from their pre-conceived “classical” notions of the world to the “quantum” notion was the realization that measurements don’t commute. This means, as an example, that if you measure the position of a particle exactly, you cannot simultaneously ascribe to it an infinitely precise momentum.

This is not simply a statement about the ultimate accuracy of measurements. In fact, you can measure any one of these variables to as high a precision as you desire. The statement above represents a property of nature – the things that we call particles do not actually have an infinitely precise position and an infinitely precise momentum at the same instant of time. The simplest way to express that these (different observables like position and momentum) are “complementary” means of describing the results of measurements, is to represent observables as matrices. The possible results of measurements of these observables are eigenvalues of these matrices. In this language, states of the world are represented as vectors in the space on which the matrices operate.

Then, one can express the fact that one observable is not precisely determined if another one is, by requiring that the two matrices (that represent these observables) {\bf NOT} commute. If you recall linear algebra from high school, two matrices that do not commute have eigenvectors that aren’t all the same. Suppose you have a precise eigenvalue for a state that is an eigenvector of one observable (i.e., a matrix). Since it isn’t an eigenvector of another (non-commuting) observable (i.e., another matrix), that state is not one that has a precise eigenvalue for the other observable. Said in this way, the construction automatically implies that the two observables are complementary – a vector cannot generally be an eigenvector of two non-commuting matrices at the same time.
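
A small concrete case may help. The sketch below uses the two 2 x 2 matrices usually called \sigma_z and \sigma_x as stand-ins for two complementary observables (they are my choice for illustration; position and momentum themselves need infinite matrices). It checks that they do not commute and that an eigenvector of one is not an eigenvector of the other.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def commutator(A, B):
    """A B - B A."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(A))] for i in range(len(A))]


def is_eigenvector(A, v, tol=1e-12):
    """True if A v is a multiple of the nonzero vector v."""
    w = [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]
    pivot = max(range(len(v)), key=lambda i: abs(v[i]))
    eigenvalue = w[pivot] / v[pivot]
    return all(abs(w[i] - eigenvalue * v[i]) < tol for i in range(len(v)))


SIGMA_Z = [[1, 0], [0, -1]]
SIGMA_X = [[0, 1], [1, 0]]

print(commutator(SIGMA_Z, SIGMA_X))       # nonzero: the matrices do not commute
print(is_eigenvector(SIGMA_Z, [1, 0]))    # definite value for sigma_z
print(is_eigenvector(SIGMA_X, [1, 0]))    # no definite value for sigma_x
```

The state [1, 0] has a sharp value for the first observable and none for the second, which is the matrix version of complementarity.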

This is the origin of the famous commutator that people start off learning quantum mechanics with, {\it viz.} \:

[ x, p] = x p - p x = i \hbar

Here, x and p are the position and momentum of a quantum particle and the order of the operators implies that if one applies the momentum operator to a state (“measuring” its momentum), the answer one gets for the position operator is different from what one gets if one reverses the order of the measurement. In fact, Heisenberg’s famous uncertainty relation is a direct mathematical consequence of this operator equation. In the above \hbar is the redefined Planck’s constant \hbar=\frac{h}{2 \pi}.

To repeat what I just said – to make sense of his uncertainty relation, Heisenberg realized that what we think of as ordinary variables x, p are actually best represented as matrices – things people study in linear algebra. It is not common to have numbers that don’t commute. It is more common for matrices to not commute, where the order of multiplication of the matrices gives different resultant matrices. Matrices have eigenvectors and eigenvalues, so if a particle is (for instance) in an eigenstate of its position operator, then its position is specified to infinite precision. Its position is, then, the eigenvalue of the position matrix in that eigenstate.

The first non-trivial application of the (then) new quantum mechanical formalism was to the problem of the harmonic oscillator. This is the physics of the simple pendulum oscillating gently about its point of equilibrium.

{\cal E} = \frac{x^2}{2} + \frac{p^2}{2}

Here x is the position of the particle and p is its momentum. This formula usually has constants like the mass of the particle and a “spring-constant” – I have set all of them to 1 for simplicity.

A Russian physicist named Vladimir Fock realized a cute property. If you define

a = \frac{x +ip}{\sqrt{2 }} \\ b = \frac{x - ip}{\sqrt{2}}

then, using the basic commutator for x, p, we deduce that

a b - b a = \hbar

and the energy can be written as

\frac{\cal E}{\hbar} = \frac{b a}{\hbar} + \frac{1}{2}

Now, a peculiar thing emerges. One can compute the commutator of a and b with {\cal E}, the energy “operator”. One finds

\frac{\cal E}{\hbar} b = b (\frac{\cal E}{\hbar}+1) \\ \frac{\cal E}{\hbar} a = a (\frac{\cal E}{\hbar}-1)

The interpretation of the above equation is simple. If you apply the operator b to a state of the harmonic oscillator with energy {\cal E}, the new state that emerges has one more unit of energy, in units of \hbar. The operator b increments (or “raises”) the system’s energy (and hence changes the system state), while the operator a decrements (or “lowers”) its energy by one unit of \hbar (and hence changes the state).
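
The raising and lowering structure can be checked with explicit matrices. This is a sketch under two assumptions: units with \hbar = 1, and a truncation of the infinite matrices to a 6 x 6 block (DIM is an arbitrary choice). The truncation spoils the commutator in the last diagonal entry only, which is an artifact of cutting the ladder off.

```python
import math

DIM = 6  # truncation size; an arbitrary choice for this sketch


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def lowering(dim):
    """Matrix of a in the basis |0>, |1>, ...: a|n> = sqrt(n) |n-1>."""
    return [[math.sqrt(col) if row == col - 1 else 0.0
             for col in range(dim)] for row in range(dim)]


def transpose(A):
    return [list(row) for row in zip(*A)]


a = lowering(DIM)
b = transpose(a)  # raising operator: b|n> = sqrt(n+1) |n+1>
ab, ba = matmul(a, b), matmul(b, a)

commutator_diag = [ab[n][n] - ba[n][n] for n in range(DIM)]
energy_diag = [ba[n][n] + 0.5 for n in range(DIM)]  # energy in units of hbar

print("diagonal of a b - b a:", commutator_diag)
print("energy levels:", energy_diag)
```

The energy levels come out as 1/2, 3/2, 5/2 and so on, one unit apart, and each application of b moves a state one rung up this ladder.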

This would have just been an interesting re-write of the basic equation of the harmonic oscillator, until the physicist Paul Dirac used this language to analyze the electromagnetic field. He realized he could re-write the electromagnetic field (actually, any field) as a collection of oscillators. He then interpreted photons, the elementary quanta of the electromagnetic field, as the extra bit of energy created by a suitably defined b operator for every oscillating wave-frequency. The quantum theory now expresses that the electromagnetic field can be described as a collection of identical photons, whose numbers can increase or decrease by 1, by the application of an appropriate b or a operator. Interactions between the electromagnetic field and other particles basically involve an interaction term with a b operator to create a photon. The photons are, in this picture of the world, regarded as the individual units, the “quanta”, of the electromagnetic field.

This is not just a re-write, therefore, of the basic energy function for an electromagnetic field. It is an entirely new view of how the electromagnetic field operates – so novel that this procedure is (for no apparent reason) referred to as “second quantization”. It solves the puzzle of why photons are all alike. They are all created by the same b operator. And the operator b \: a is called the “Number” operator – if there are {\cal N} photons in the electromagnetic field, then it is an eigenstate of this “Number” operator, with eigenvalue {\cal N}.

This formalism has been very fruitful in research into quantum field theory, which thinks of particles as the individual “quanta” of a quantum field. In this view, there is a quantum field for photons (a kind of boson that we interact with all the time with our eyes). There is a quantum field for electrons, a kind of fermion.

In fact, since photons are a kind of “boson”, the standard commutator for bosons is written as

a b - b a = 1

where the constant \hbar is absorbed by suitably redefining the operators a and b.

Then fermions were discovered. It turns out that they are better described by the commutator

a b + b a = 1

Due to the plus sign, this is usually referred to as an anti-commutator.

The plus or minus sign might seem like a small alteration, but it represents a giant difference. For instance, a bosonic harmonic oscillator can have a countable infinity of states – corresponding to the fact that you can make a beam of monochromatic laser light as intense as you want by having extra photons in the beam. A fermionic harmonic oscillator (with the plus sign), on the other hand, only has two states – one with no fermion and one with one fermion. You cannot have two fermions in the same state, a fact about fermions that is usually referred to as the Pauli exclusion principle.
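
For the fermionic case the matrices are small enough to write out by hand. The sketch below uses the basis (|0>, |1>), meaning no fermion and one fermion, and checks the anti-commutator, the fact that applying a or b twice gives zero, and the two allowed occupation numbers.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matadd(A, B):
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))] for i in range(len(A))]


a = [[0, 1], [0, 0]]  # lowering: a|1> = |0>, a|0> = 0
b = [[0, 0], [1, 0]]  # raising:  b|0> = |1>, b|1> = 0

anticommutator = matadd(matmul(a, b), matmul(b, a))
number = matmul(b, a)
a_squared = matmul(a, a)
b_squared = matmul(b, b)

print("a b + b a =", anticommutator)
print("b a       =", number)
print("a a       =", a_squared)
print("b b       =", b_squared)
```

Because b applied twice vanishes, there is no way to put a second fermion into the same state, which is the exclusion principle in matrix form.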

Let me now proceed (after this extensive introduction) to the topic of this post. It is meant to explain a paper I published in the Journal of Mathematical Physics. In this paper, I study particles in two dimensions, which have properties intermediate between bosons and fermions in a manner akin to how anyons (particles in two dimensions) behave. I wrote a rather long blog post on anyons here, which might be a simple introduction. While people have studied particles with intermediate statistics before, I studied (with a lot of very inspiring discussions with Professor Scott Thomas at Rutgers) the algebra

a b - e^{i \theta} b a = 1

The parameter \theta is a constant and can be chosen to be some fraction of 2 \pi. I studied both rational and irrational fractions of 2 \pi. If we considered only rational fractions, then \theta = 2 \pi \frac{M}{N}. It turns out that we can safely study the case where M=1, as that covers all the possible rational cases well. In that case, notice that if \theta=0, N = \infty, the resulting commutator corresponds to the case of a bosonic oscillator (e^{i 0} =1). The case \theta = \pi, N=2 corresponds to a fermionic oscillator. Particles with intermediate statistics are represented by N=3,4,5.....

Some results emerge almost immediately. The number of states accessible to a harmonic oscillator that obeys the algebra for N=2 (the fermion) is exactly 2. The number of states accessible to a harmonic oscillator that obeys the algebra for N=\infty is \infty. So, quite as expected, the number of states accessible to a particle described by the algebra for a general N is, indeed, N.

These states are described by an “energy” function that is a set of complex numbers on the complex plane. These complex numbers lie on a circle. For N=2, there are only two points – it’s like a flattened circle. For N=\infty, the eigenvalues lie on a line, which may be thought of as the perimeter of a circle of infinite radius. For intermediate values of N, the circle has a finite radius that is bigger than 0.
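
One way to see these statements concretely is to build N x N matrices that satisfy a b - e^{i \theta} b a = 1 for \theta = 2 \pi / N. The construction below is a standard "q-number" representation chosen for illustration, and is an assumption rather than necessarily the one used in the paper: a|n> = [n]_q |n-1> and b|n> = |n+1>, with [n]_q = 1 + q + ... + q^{n-1} and q = e^{i \theta}. In this simple basis b is not the Hermitian conjugate of a.

```python
import cmath
import math


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def q_number(n, q):
    """[n]_q = 1 + q + ... + q**(n-1), with [0]_q = 0."""
    return sum(q**k for k in range(n))


def q_oscillator(N):
    """Return q = exp(2 pi i / N) and N x N matrices a, b (requires N >= 2)."""
    q = cmath.exp(2j * math.pi / N)
    a = [[q_number(col, q) if row == col - 1 else 0
          for col in range(N)] for row in range(N)]  # a|n> = [n]_q |n-1>
    b = [[1 if row == col + 1 else 0
          for col in range(N)] for row in range(N)]  # b|n> = |n+1>, b|N-1> = 0
    return q, a, b


def algebra_defect(N):
    """Largest entry of a b - q b a - 1; zero up to rounding if the algebra holds."""
    q, a, b = q_oscillator(N)
    ab, ba = matmul(a, b), matmul(b, a)
    return max(abs(ab[i][j] - q * ba[i][j] - (1 if i == j else 0))
               for i in range(N) for j in range(N))


def circle_of_levels(N):
    """Eigenvalues [n]_q of b a, with the centre and radius of their circle."""
    q = cmath.exp(2j * math.pi / N)
    levels = [q_number(n, q) for n in range(N)]
    centre = 1 / (1 - q)
    return levels, centre, abs(centre)


for N in (2, 3, 5):
    levels, centre, radius = circle_of_levels(N)
    print(N, "states, defect", algebra_defect(N), "radius", round(radius, 3))
```

In this sketch the representation closes on exactly N states because [N]_q = 0, and the eigenvalues of b a sit on a circle of radius 1 / (2 \sin(\pi / N)). For N = 2 the two points are 0 and 1, and the radius grows without bound as N becomes large, in line with the picture described above.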

The interesting part that emerges from this is an arithmetic and calculus that emerges from the algebra. Since a is a matrix (in our picture), suppose we consider its eigenvalues. Eigenvalues are found, as in elementary linear algebra, from the eigenvalue equation

a |{\xi}> = \xi |{\xi}>

Note something interesting – you are not allowed to have two fermions in the same state. This means that in any case, you cannot apply the a operator twice on the same starting state – it could not have more than one fermion anyway, so if you apply a lowering operator to a state with no fermions, you should get zero. So we must have

a^2 = 0

which immediately implies that

\xi^2 = 0

In addition, suppose we have two fermions in different states (different fermions). Let’s also suppose that a_1 and a_2 are the lowering operators for each fermion. Then, let’s state the defining equation,

a_1 a_2 |\xi_1, \xi_2> = \xi_1 \xi_2 |\xi_1, \xi_2>

However, we have an additional requirement for fermions. The wave-function for two fermions needs to be anti-symmetric with respect to the exchange of the fermions. If I switched the order of the fermions in the starting state, I should get an extra minus sign. i.e., we need, for a two-particle state with one fermion in state 1 and one fermion in state 2,

|1_{(1)}, 1_{(2)}> = - |1_{(2)}, 1_{(1)}>

Let’s apply a cute trick. Applying the lowering operator for state 1 or state 2 reduces the numbers of fermions to 0 in each state, i.e.,

a_2 a_1|1_{(1)}, 1_{(2)}> = |0>

where |0> is the vacuum state, with nothing in it. It makes sense that there is only one vacuum state – at least in this simple situation. So, for consistency, if we have the exchange rule for states 1 and 2, we must have a similar exchange rule for lowering operators, i.e.,

a_1 a_2 = - a_2 a_1

But this means that

\xi_1 \xi_2 = - \xi_2 \xi_1

These \xi’s cannot be simple numbers! Their square is 0 and they anti-commute with each other.

Such numbers are called Grassmann variables.

In my interpolation scheme, one very naturally comes up with numbers whose N^{th} power is 0 (i.e., \xi^N=0) and which commute as in

\xi_1 \xi_2 = e^{i \theta} \xi_2 \xi_1

where \theta =2 \pi \frac{M}{N}. I call these generalized Grassmanns – while such concepts have been thought of before, it is useful to see how things like integration and tools of the differential calculus can be generalized and smoothly go over from the fermionic end to the simple, intuitive, bosonic end.
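
Objects with these two properties can be mimicked by matrices, which makes the rules easy to test. The sketch below is only a stand-in for the abstract variables, built from a "clock" matrix and a nilpotent "shift" matrix; this construction and all the function names are my assumptions for illustration, with M = 1. For N = 2 it reduces to ordinary Grassmann behavior: squares vanish and the two objects anti-commute.

```python
import cmath
import math


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(A, B):
    """Kronecker (tensor) product of two matrices."""
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]


def matpow(A, n):
    """A multiplied by itself n times (identity for n = 0)."""
    result = [[1 if i == j else 0 for j in range(len(A))] for i in range(len(A))]
    for _ in range(n):
        result = matmul(result, A)
    return result


def max_abs(A):
    return max(abs(x) for row in A for x in row)


def generalized_grassmann_pair(N):
    """Matrices xi1, xi2 with xi^N = 0 and xi1 xi2 = q xi2 xi1, q = exp(2 pi i / N)."""
    q = cmath.exp(2j * math.pi / N)
    shift = [[1 if row == col - 1 else 0 for col in range(N)] for row in range(N)]
    clock = [[q**row if row == col else 0 for col in range(N)] for row in range(N)]
    identity = [[1 if row == col else 0 for col in range(N)] for row in range(N)]
    return q, kron(shift, identity), kron(clock, shift)


def exchange_defect(N):
    """Largest entry of xi1 xi2 - q xi2 xi1; zero up to rounding."""
    q, xi1, xi2 = generalized_grassmann_pair(N)
    left, right = matmul(xi1, xi2), matmul(xi2, xi1)
    size = len(left)
    return max(abs(left[i][j] - q * right[i][j])
               for i in range(size) for j in range(size))


for N in (2, 3, 4):
    q, xi1, xi2 = generalized_grassmann_pair(N)
    print(N, exchange_defect(N), max_abs(matpow(xi1, N)), max_abs(matpow(xi2, N)))
```

The N-th power of each matrix vanishes while the (N-1)-th power does not, so N plays the role that 2 plays for ordinary Grassmann variables.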

Another consequence of this work is obtained by generalizing the concept in this post to anyons, real particles in confined two-dimensional spaces. The analysis above speaks of exchange of particles. Anyons are a little more complex. You can exchange anyons by taking them “under” and also “over” as shown in the picture below.

These correspond to a change of sign of \theta, so anyons can be represented by a combination of the commutators

a b - e^{i \theta} b a = 1

as well as

a b - e^{-i \theta} b a = 1

simultaneously. We therefore cannot think of a and b for an anyon (the lowering and raising operators) as matrices in the usual sense. They are much more complex objects!

Now, let’s consider two identical anyons propagating from a starting point to an ending point. If they were classical objects, they would just go and there would be one path connecting start to finish. But these are quantum objects! We need to sum over past histories, i.e., all the paths to go from the start to the finish. And when anyons travel, they could wind around each other. As in the figure below


Now, we sum over all the possible ways.

An amazing thing happens. If \theta = 2 \pi \frac{M}{N} and N were {\bf even}, the sum over histories gives {\bf 0} for the probability of propagation. {\it Even-denominator \: anyons \: cannot \: propagate \: freely!}

And here is the application – it is not easy to see the even-denominator fractional quantum Hall effect. The connection between the math and the quantum Hall effect is not transparent (from what I discussed above), but suffice it to say that the particles that give rise to the fractional quantum Hall effect are anyons and the fractions we obtain are exactly the fractions (\frac{M}{N}) here. This geometrical observation “explains” the difficulty of observing the effect for even-denominator anyons.

Schrodinger’s Cat Lives again!

This article concerns a new paper I submitted, which has now been published. It concerns a peculiar feature of quantum mechanics (and also of classical mechanics).

The feature is this. The Laws of Physics appear indifferent to the direction of time. If you play a video of two balls colliding elastically with each other, you could play the video backwards and the event would look perfectly reasonable. Of course, if you scale the event up and show the reverse of a video of two balls made of glass colliding and shattering into a million little pieces, such a video would look mighty weird. No one has actually seen a million little pieces of glass crashing into each other and coalescing into an object shaped like a ball.

On the other hand, if the ball broke up into, say, two pieces, it just might look possible, though rare, for an event to occur where two pieces of glass collided and got glued into one ball.

The distinction is merely of size. If there are a very small number of particles involved in a physical event, you could well have both the event and its time-reversed version occur without problem. However, if there were a macroscopic number of particles involved, there would be an infinitesimal chance of recurrence, though it would be possible in principle.

Suppose there were N particles involved that could (each) be either in one unique ordered state or M other disordered (broken up in various ways) states. Then there is one way for them to be together and M^N ways of them being broken up. From pure arithmetic, if N \sim 1000, M \sim 100, this is 100^{1000} = 10^{2000}, a number with about 2000 digits to the left of the decimal point – it is absolutely humongous! So it is exceedingly unlikely that a system that starts in one of that large number of states happens, by chance, to end up in the one, unique ordered state.

How unlikely? If we sampled one final state every second, it would take us about 100^{1000} = 10^{2000} seconds, which is roughly 10^{1992} years. The universe is only 10^{10} years old or so, so this is roughly 10^{1982} universe lifetimes. If you stick around long enough, it will come back to order, but you might need a lot of pizza while you are waiting!
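
These are large numbers, but they are easy to check with exact integer arithmetic. The sketch below takes the count of disordered configurations to be 100^{1000}, samples one configuration per second, and converts the waiting time to years and to universe lifetimes. The round figures for the length of a year and the age of the universe are assumptions.

```python
import math

SECONDS_PER_YEAR = 3.156e7  # rounded
UNIVERSE_AGE_YEARS = 1e10   # round figure used in the text

n_states = 100 ** 1000      # exact integer
digits = len(str(n_states))

# Work with logarithms, since 10^2000 does not fit in a float.
log10_seconds = digits - 1
log10_years = log10_seconds - math.log10(SECONDS_PER_YEAR)
log10_lifetimes = log10_years - math.log10(UNIVERSE_AGE_YEARS)

print(f"number of digits: {digits}")
print(f"waiting time: about 10^{log10_years:.0f} years")
print(f"that is about 10^{log10_lifetimes:.0f} universe lifetimes")
```

Converting seconds to years only shaves seven or eight off an exponent of 2000, so the conclusion does not change: the ordered state does recur in principle, just not on any timescale that matters.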

Similarly, in quantum mechanics, the equations are time-reversal invariant. To be clear, the universe (and the laws that govern our universe) doesn’t happen to be time-reversal invariant. However, the basic laws of quantum mechanics, which is a framework that allows one to write down the laws that govern physics in our universe, are time-reversal invariant. We just have, in our universe, some \: \: {\bf {particular \: \: interactions}}\: \: that operate differently and explicitly break this symmetry.

It is a mathematical tour de force to obtain, starting with basic equations that are time-reversal invariant, phenomena that are clearly irreversible (in any practical sense). In a very precise sense, the Schrodinger’s cat experiment is one such. If you don’t know the experiment, read this. People have tried to explain the peculiar consequences of this experiment in many ways, including crazy ideas such as one needs to be a conscious being to know that a cat is dead. In the paper mentioned above, I analyzed the classic double slit experiment to check whether an electron has gone through one slit or another and figured out how, as the measuring apparatus gets bigger and bigger, one sees that the measuring apparatus itself gets driven to one (electron’s gone through slit 1) or the other (electron’s not gone through slit 1) state. And the key in this is that measuring apparata add energy to a system – without adding energy and amplifying weak signals, one doesn’t know if one is measuring an event or measuring noise.

And it just means that if the cat in Schrodinger’s eponymous experiment were just a few atoms big, one could look inside, see it “dead” (properly defined for a 5-10 atom-sized cat) and then look again a few minutes later and see it “not-dead”. Of course, for a real cat, that will take much too long and for all practical purposes, the cat is indeed “dead” or “alive”.

Is there something like a fundamental limit of how little energy you need to input to “collapse” a wave-function? That is something the uncertainty relationship points to, but it is worth thinking about. Clearly, one should first define what “collapse” means. For our lifetime? For the universe’s lifetime? For a large multiple of the universe’s lifetime?

Read on!

Three pieces – and some puzzles

I just finished a bit of summer reading – in particular, three pieces with very similar scope. The first is a book by a well-known physical chemist and author, called “Four Laws that drive the Universe”. The second is a book by a well-known quantum information expert, called “Decoding reality: The universe as quantum information”. The third is an article in a philosophy of science journal from July 2019, called “A hot mess”, about what the author believes is a series of fatal flaws in Landauer’s computation of the entropy change due to computation. I discuss these below – indeed I thought up some puzzles that might help expand the material.

Atkins’ book (not the nutritional expert, but an Oxford don) is a gem amongst stale books on thermodynamics. He brings the four laws of thermodynamics and their statistical mechanics equivalents to life with lucid descriptions and simple examples. He uses math very judiciously, not beyond fractions and algebra and {\bf {explains}}  the material well. It is worth a careful read, not a beach read!

These books (the first, in particular) bring to mind a bunch of puzzles that consumed me when I first learnt cosmology.

Here is {\bf {Puzzle \#1}}.

As we all are aware, the evidence of increasing red-shifts of increasingly distant galaxies points to an expanding universe. When popular books discuss the expanding universe, they immediately say that the universe had a “hot” beginning, and it “cooled, as it expanded”.

Read this peculiar series of posts on an internet site about this topic. This is (yet) another reason why you shouldn’t believe what you read on the internet, including this article, unless you understand it yourself!

In these articles, the analogy is made to a hot gas in a cylinder with a piston, that cools as the piston is withdrawn – the molecules of the gas do work on the piston wall (assume the piston moves out really slowly, so the process does not heat up the piston, just slowly pushes it out). Here’s a picture of the gas-filled container with the piston (marked in green)


and then, here’s a picture of a gas molecule bouncing (elastically, like a good tennis ball) off the piston


It is easy to see why the speeds of the molecule post-bounce off a moving piston are as shown. First, if the piston were {\bf {stationary}} and the molecule approached with velocity {\it V} (+ve to the right, -ve to the left), it would bounce off with velocity {\it -V}, i.e., a speed {\it V} in the opposite direction. The change in velocity of the particle would be -2 \times {\it V}.

Consider the situation if the piston were to move (to the right) with a velocity {\it v}. An observer sitting on the piston (using Galilean/Newtonian relativity), would see that the same particle was approaching the piston with a velocity {\it {V-v}} and leaving it with velocity {\it {-(V -v)}}. Translating this back to the frame of the container (with respect to which the piston is moving to the right, with speed {\it v}), the molecule bounces back with velocity {\it {-V+2 \times v}}. The change in velocity of the molecule is {\it {-2 \times V + 2 \times v}}. Notice the slight difference when you consider a slowly moving piston!

Consider the situation when the piston is stationary and we set {\it v}=0.

If there were {\cal N} collisions of molecules (each of mass m) with the piston per second per unit area of the piston, then, in time \Delta t, the momentum transferred to the piston would be \Delta P = {\cal A} \: m \: {\cal N} \Delta t \times 2 \times {\it {V}}. Here {\cal A} is the area of the piston face. This means the instantaneous force on the piston would be {\cal F} = \frac{\Delta P}{\Delta t} = {\cal A} \: m \: {\cal N} \times 2 \times {\it {V}}. The {\bf pressure} on the piston, which is the average force per unit area of the piston’s face, would be {\cal P} = Average \bigg( m \: {\cal N} \times 2 \times {\it {V}} \bigg) .

The upshot is that the pressure on the piston comes from molecules of gas randomly hitting it. This is the “statistical-mechanics” view of the macroscopic quantity “pressure”.

Why does an expanding gas cool? It cools because there is a force from the gas molecules on the piston and when the piston moves to the right, the forces have done work on the piston. This work is produced at the expense of the internal energy of the gas (the molecules are moving slightly slower than they were earlier). To show this, observe (based on the above diagram) that the energy of each molecule goes from \frac{1}{2} m {\it V}^2 to \frac{1}{2} m {\it {(V-2 \times v)}}^2, since each molecule slows down after hitting the piston wall. Then in time \Delta t, the total energy of the gas has gone {\bf {down}} by approximately 2 \times {\cal {NA}} \: m \: {\it {Vv}} \Delta t (which we get by expanding the above square to linear order in {\it v}). This expression can be re-written as {\cal P} \Delta {\cal V}, where \Delta {\cal V} is the change in the volume of the container.
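The bounce and the linear-order energy loss can be checked numerically. This is a minimal sketch with made-up values for the mass and the two speeds; the function name bounce is introduced here only for illustration.

```python
def bounce(V, v):
    """Velocity after an elastic bounce off a piston receding at speed v.

    Work in the piston's frame: the molecule arrives at V - v, leaves at
    -(V - v), and we then transform back to the container's frame.
    """
    return -(V - v) + v


# Illustrative (assumed) values: molecule mass, molecule speed, piston speed.
m, V, v = 1.0, 500.0, 0.01

V_after = bounce(V, v)                       # equals -V + 2v
energy_lost = 0.5 * m * V**2 - 0.5 * m * V_after**2
linear_estimate = 2 * m * V * v              # the order-v term used in the text

print(V_after, energy_lost, linear_estimate)
```

For a slowly moving piston (v much smaller than V) the exact loss and the linear estimate agree closely; the difference is the term of order v^2 that was dropped.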

So, an expanding gas cools because it performs work on the piston. Where is this “piston” that the contents of the universe performed their work on during the expansion? There is no such thing.

The universe cooled as it expanded because the expansion means something different from what it means for the closed container above. This is explained with a rubber-sheet analogy in a previous post, but to quickly summarize, think of points in the universe like points on a rubber sheet. When the rubber sheet is pulled apart, the points move apart. The {\bf {scale-factor = a(t)}} measures the “scale” of the universe and is usually written as a function of the cosmological time (since the Big-Bang). As the universe expands, if you think of a particle that was travelling at 10 \frac{m}{s}, it now travels a smaller fraction of that distance in the same time, as those meter-grid-points are now further apart! So it travels slower, has lower energy! And if you think of the particle as a wave (think of wave-particle duality), the wavelength of the wave is stretched out as it travels – {\it {ergo}}, the wavelength gets longer, the frequency gets smaller and the energy of the particle represented by the wave gets smaller, in exactly the same way. Hence cosmic expansion “cools” the hot universe.

On to {\bf {Puzzle \#2}}.

When one learns thermodynamics, one hears the Clausius definition of entropy. It is a “state-function”, which means it can be uniquely defined at every specific macroscopic state of a system. Such state-functions are valuable since they serve as light-houses for us to compute useful things like how much work can be extracted from a system, or how much heat will accompany such work.

Entropy (or the change thereof) is written as \Delta S = \int \frac{d Q}{T}, where dQ is the amount of heat added (reversibly) to the system, while T is the temperature.

I’ve always wondered, if this is a state-function, why not consider functions (dQ, dU are incremental heat added and incremental internal energy) whose change is \int \frac{d Q+dU}{T} or different powers of the denominator? Why aren’t they as useful as the entropy?

Atkins’ book is the only one I have seen that actually mentions anything about variations. The reason the other functions aren’t interesting is not that they somehow fail to be state functions. It is that the original entropy definition is the only one that can be re-interpreted as being the {\bf {logarithm}} of the number of accessible microscopic states. As it turns out, that is useful for a myriad of other reasons, but it is not the only way to think about thermodynamics.

Here’s an example. In the 1800s a fellow named Carnot proved that a heat engine’s maximum efficiency is achieved by a special kind of engine. For doing so, he used an engine that was run through a cycle of isothermal (constant-temperature) and isentropic (constant entropy) processes. These are depicted in the P-V diagram below. Briefly, one carries out an isothermal expansion, with the gas doing work and absorbing heat from a heat source, followed by isentropic expansion, cooling the gas, but with the gas still performing work. Then one performs isothermal compression, shedding heat to the heat-sink and with work being done on the gas inside the cylinder. This is followed by isentropic compression, with the gas getting warmer. This is roughly the order of operations followed by a four-stroke engine.

The work done by this Carnot engine is equal to the area (depicted by a blue splash) in the above diagram. That area is bounded by different lines that are either isothermal or isentropic processes.

However, if we invented a new quantity, call it “F_{entropy}”, whose change in a process is \Delta F_{entropy} = \int \frac{dQ+dU}{T}, then we would simply find a new set of curves (call them iso-F-entropic processes) in the above P-V graph. Would we find a more efficient F-Carnot engine by this mechanism?

There is a simple argument that we would not. For if we did, we would simply run the less efficient Carnot engine, do some work and dispose of some heat in the low-temperature heat-sink. Then run the more efficient F-Carnot engine as a refrigerator, to use less of the work to transfer the above disposed of heat from the low-temperature sink to the high-temperature source. This would be a machine that would violate the Second Law of Thermodynamics – it would take heat from a heat source and convert all of it to work. So these other versions of entropy wouldn’t really change anything – it’s enough to use the version that additionally has the connection to the number of microscopic states.

The last topic I want to discuss, possibly in a future post, has to do with the thermodynamics of computation. Briefly, Rolf Landauer (in the 60s at IBM) deduced that there is an absolute minimum of heat (and entropy) that is generated when a computation is performed. He connected this to something called Maxwell’s demon, which was a thought experiment constructed to explicitly break the Second Law of Thermodynamics. He (Landauer) then showed how ordinary entropy of disorder could be connected to the entropy of information, in a really concrete way. The article referred to above tries to make the case that this connection is weak, primarily because the “message” in information theory is too small, not macroscopic in size. I am not convinced, but think a longer post is essential to discuss it.

Tales of Karatsuba

This article is based on a brilliant essay in Quanta magazine, behind which lies a lovely story of a mathematical discovery. I take the essay a little further and describe an algorithm we posted on arxiv to increase the speed of the calculation even further (though not as fast as the fastest method there is).

You probably learned to multiply numbers in elementary school. If you multiplied two 4-digit numbers, you learnt to multiply the second number’s unit digit (3 in the below) like so


You’d then progress on to the tens digit and repeat the same procedure (multiply {\it it} by each digit of the first number). While doing so, you remember to write out the digits of the second multiplication shifted one place to the left, and so on. If you were multiplying two n digit numbers, you would need to perform n^2 single-digit multiplications.

Multiplication is one of the most well-understood mathematical operations, it was thought, when the famous mathematician and physicist Andrey Kolmogorov gave a series of lectures on complexity theory at the Moscow State University in the 1960s. One of the students listening to the famous man speak about the order of the number of operations required to perform complex arithmetic tasks was a fellow called Anatoly Karatsuba. He heard Kolmogorov describe multiplication as an {\cal O}(n^2) operation for two n-digit numbers. Later he went home and over the week of lectures managed to prove a much faster way to multiply two numbers. He wrote this up for Kolmogorov, who, suitably impressed, mentioned it in other lectures. Kolmogorov then wrote it up for publication, as the contribution of Karatsuba and another student Yuri Ofman, without their knowledge – they heard only from getting printer proofs!

The algorithm they invented is very simple to explain for four digit numbers. Suppose we wanted to multiply x and y, where

x=x_0 \times 10^0 + x_1 \times 10^2

y=y_0 \times 10^0 + y_1 \times 10^2

where x_0, y_0, x_1, y_1 are two-digit numbers. The product of these two numbers is

x y = x_0 y_0 \times 10^0 + (x_0 y_1 + x_1 y_0) \times 10^2 + x_1 y_1 \times 10^4

One would normally think that one would need to perform four multiplications and two additions to compute this, but note, as Anatoly Karatsuba did, that this could be written as

x y = x_0 y_0 \times 10^0 + \left( (x_0 + x_1)(y_0+y_1) - x_0 y_0 - x_1 y_1 \right) \times 10^2 + x_1 y_1 \times 10^4

which therefore needs only three multiplications, at the price of a few extra additions and subtractions.

Generalizing this “divide-and-conquer” strategy results in a systematic recursion relation between the complexity of the n-digit number multiplication and the \frac{n}{2}-digit multiplication, i.e.,

{\cal M}(n)=3 {\cal M}(\frac{n}{2}) + {\cal O}(n)

where the last term captures the order-n additions that have to also be performed. This function can be computed for large n and yields

{\cal M}(n) \sim n^{\log_2 3} = n^{1.58}
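The recursion can be sketched in a few lines. This is an illustrative implementation of the divide-and-conquer step described above, not a tuned one; the function name karatsuba and the choice to split at half the digit count of the longer number are assumptions of the sketch.

```python
def karatsuba(x, y):
    """Multiply non-negative integers using three recursive products."""
    # Base case: a single-digit factor is multiplied directly.
    if x < 10 or y < 10:
        return x * y

    # Split both numbers at half the digit count of the longer one.
    half = max(len(str(x)), len(str(y))) // 2
    base = 10 ** half
    x1, x0 = divmod(x, base)   # x = x1 * base + x0
    y1, y0 = divmod(y, base)   # y = y1 * base + y0

    low = karatsuba(x0, y0)                              # x0 * y0
    high = karatsuba(x1, y1)                             # x1 * y1
    # The middle term x0*y1 + x1*y0 from one extra product.
    cross = karatsuba(x0 + x1, y0 + y1) - low - high

    return low + cross * base + high * base * base


print(karatsuba(1234, 5678) == 1234 * 5678)
```

For the four-digit case in the text, half is 2 and base is 10^2, which reproduces the split into x_0, x_1 and y_0, y_1.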

The method is elegant and simple, but has since been superseded by far more analytically complex Fourier transform algorithms that are {\cal O}(n  (\log_2 n) (\log_2 \log_2 n)) and indeed {\cal O}(n \log_2 n) itself in complexity.

{\underline {But \: there \: is \: an \: even \: simpler \: way \: to \: make \: a \: faster \: algorithm}}. Note that if you {\bf precomputed} \: {\underline {all}} possible multiplications of n-digit numbers, you would need to compute 10^{2n} entries into a giant matrix. It would then take you about \log_2 10^{2n} binary-search steps to find the particular element you wanted, which is of {\cal O}(n).
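The counting in this paragraph can be illustrated with a toy table for two-digit numbers. This sketch only illustrates the table size and the binary-search lookup; it is not the algorithm from the arXiv article, and the names build_table and lookup are hypothetical.

```python
from bisect import bisect_left
from math import log2


def build_table(n_digits):
    """Precompute every product of two numbers with up to n_digits digits."""
    limit = 10 ** n_digits
    keys, products = [], []
    for x in range(limit):
        for y in range(limit):
            # Key = digits of x followed by digits of y (a shift by n digits);
            # the keys come out in increasing order by construction.
            keys.append(x * limit + y)
            products.append(x * y)
    return keys, products


def lookup(keys, products, x, y, n_digits):
    """Find x * y in the table by binary search on the combined key."""
    key = x * 10 ** n_digits + y
    return products[bisect_left(keys, key)]


n = 2
keys, products = build_table(n)
print(len(keys))                       # 10^(2n) entries
print(log2(len(keys)))                 # rough number of binary-search steps
print(lookup(keys, products, 37, 84, n))
```

The table grows as 10^{2n}, so the lookup is fast only because all the work (and memory) has been paid for in advance.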

That is the trick underlying the route I suggested along with my co-author (full disclosure, it is my brilliant brother and published author) in our article on the arxiv.

Error Correcting Codes and the Quantum version

This article was inspired by a very nice article in Quanta magazine (by Natalie Wolchover) about the connection between error-correction codes and space-time. I thought the quantum mechanics concepts were glossed over, so decided to expand on it a little.

Electrical engineers and digital-signal-processing engineers study error correction for a simple reason – the same reason why we study language (both spoken and written) carefully for many years in school.

Ist bcuz wee somtmes rceeve grbled mssgses frm othrs tht we neeed to enderstnd, andd qickyly!

When the first efforts were made with digital computers, engineers realized that the problem of understanding garbled messages was slightly easier in a computer. The “language” is pretty simple – the alphabet is 0,1 and “words” are of the ASCII variety with a fixed number of “letters”. While the number of “letters” in every word was 7 (0,1 digits) at the start of the system, later encodings built on it extend to as many as 32 letters. This has been achieved in a backward compatible way, which is why you have never had to “upgrade” your ASCII character set in your PC and can use a 20 year old Windows PC reasonably easily to read a newer Word document apart from a few newer encoded characters.

If I sent you the following 7-letter word 011 1001, you might receive it with a one-bit error, as 111 1001. How could you figure that there has been an error? The simplest solution involves appending a “parity” bit. You append a single digit (a 1 or a 0) to the word according to a simple formula. If the original word had an even number of 1‘s, you append an extra 0. If the original word had an odd number of 1‘s, you append a 1. This extra bit, called a “parity” bit, allows the system to automatically catch a single-bit error. Why is that?

In the above example, if the 8-bit word sent was 0 011 1001, but was received as 0 111 1001, you immediately know that there is an error in transmission. This would happen even if the one-bit error were to happen for the “check-bit” itself. At this point, you could ask for a new transmission of error-ridden words, or alternatively if you had the same word sent an odd number of times, perform a “majority” poll of the same word without an obvious one-bit error.
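The parity check is simple enough to sketch. The function names below are made up for illustration, and the parity bit is placed in front of the word, as in the example above.

```python
def add_parity(bits):
    """Prefix a parity bit so that the total number of 1s is even."""
    parity = bits.count("1") % 2
    return str(parity) + bits


def parity_ok(word):
    """True if the received word still has an even number of 1s."""
    return word.count("1") % 2 == 0


sent = add_parity("0111001")     # the 7-bit word 011 1001 from the text
received = "01111001"            # one bit flipped in transit

print(sent, parity_ok(sent))
print(received, parity_ok(received))
```

A single flipped bit, including a flip of the parity bit itself, changes the count of 1s from even to odd, which is what the check detects. Two flipped bits would slip through unnoticed.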

The theory of error correction is a mature area of engineering. It involves catching and correcting errors in more than one-bit, doing the same for errors that occur in bursts, so that successive 8-bit sequences have multiple errors each followed by relatively error-free intervals etc. Several interesting types of codes exist – for instance there is an entire family of error-correcting codes called Hamming codes. In general, a code whose valid words differ from each other in at least d bits can detect up to d-1 bit errors and correct up to (d-1)/2 of them (rounded down); the Hamming codes have d=3 and so correct any single-bit error. These techniques are the reason why your phone calls don’t sound totally garbled, even when you call really long-distance over multiple types of transmission media. It is the basis of the wonder of modern communication.

Quantum computers take this theory to a new level. A quantum computer operates on the principle that a quantum system can be in several orthogonal (read “distinct”) states, or propagate along different trajectories, at the same time. This is called a superposition of states. Very roughly, the “state” of a computer is specified by the contents of its registers, its memory elements and the states of its various connections. A computation consists of letting the total state evolve according to a program.

Suppose some of these registers were quantum objects, then you could allow them to be in several states at the same time, each state serving as an input to a  program. If all these states fed the program together, the quantum computer would evolve all the states, producing the final output states, all in one giant superposition. The only problem would now be to extract the proper answer (maybe the shortest path through a maze) from this collection of answers. The extraction of the answer is not a trivial problem and is sometimes as difficult as writing the algorithm itself.

As an example of this superposition, if an electron were aimed at two slits (slit \#1 and slit \#2) on a wall, to agree with subsequent observations, we have to assume that it is able to go through both the slits at the same time. The state of the electron after it passes the slits is written this way

| {\: Electron \: state \: post \:  the \:  slits} > = \frac{ |Electron \: in \: state \: post \: slit \: \#1>+|Electron \:  in \: state \: post \: slit \: \#2>}{\sqrt{2}}

Electrons possess a property called spin. They could be in an up-spin state or a down-spin state. Electrons are also completely indistinguishable from each other. If an electron is one of a pair of electrons, with total spin of zero (one is up and one is down), then each electron is in a superposition of up- and down-states. The state of each electron can be written as

| {\: Electron \: state} > = \frac{ |Electron \: in \: up-spin \: state>+|Electron \:  in \: down-spin \: state >}{\sqrt{2}}

which we can write as \frac{ |0>+|1 >}{\sqrt{2}}.

A quantum computer would have several such particles (say \#1, \#2, \#3). Let’s say they were in a correlated set of states, i.e., \frac{ |000>+|111 >}{\sqrt{2}}. The computation algorithm requires these states to stay uncorrupted, undisturbed, except acted upon by the program. Even if the program weren’t to run in the quantum computer, we might progressively notice changes in these states due to interactions with other particles from the outside, vibrations from a nearby highway etc.  These are errors – we don’t want anything but the actual program to change these states. If we had a one-bit error of the sort we saw before, we’d end up with one of the spins being flipped, so we could end up in any one of the states \frac{ |100>+|011 >}{\sqrt{2}} (first one flipped) , \frac{ |010>+|101 >}{\sqrt{2}} (second one flipped) or \frac{ |001>+|110 >}{\sqrt{2}} (third one flipped). To reliably use the quantum computer, we will need to correct the states to the starting one – and then we are faced with a problem common to quantum systems. If we “measure” the state in order to find which state is flipped, the quantum system “collapses” to one or the other state. Hence, in the first case where the first spin was flipped, we’d end up with either the state |100> or |011 > which would be good except that we just disrupted our quantum computer.

We need to find a method to figure out which spin was flipped without actually collapsing the state, so that we can then apply a “quantum-friendly” approach (maybe using an appropriately placed magnetic field to flip a spin) to then bring the state back to its original “state”.

The lore in quantum mechanics (the way Nature really is) is that if you ask a question of a quantum system that allows it to maintain its anonymity, i.e., not require it to “collapse” into one of the states in its superposition, then the state stays in superposition. The way to do this is to “entangle” the state with another particle.

Suppose there were a particle that would end up in the “up” state if the above particles \# 1 and \# 2 were in the same state and in the “down” state if not. And let’s suppose there is another particle that would end up in the “up”  state if particles \# 1 and \# 3 were in the same state and in the “down” state if not. Let’s run these two “test” particles against the quantum computer’s particles in their pristine state, i.e., \frac{ |000>+|111 >}{\sqrt{2}}. You can quickly check that both the test particles would end up in the up-state as a result of their interaction with the above computer’s particles. Additionally, if you measured the states of the test particles, you cannot figure out the exact spins of the particles in the quantum computer. All you can find out is that particles \#1, \#2 and particles \#1, \#3 are in the same states. This allows the quantum computer’s particles to maintain their quantum “anonymity” and their state is undisturbed {\underline {even \: \: \: though \: \: \: the \: \: \:  spins \: \: \: of \: \: \: the \: \: \: test \: \: \: particles \: \: \: are \: \: \: measured}}.

The interesting consequence results when you apply these test particles to the one-bit-error states \frac{ |100>+|011 >}{\sqrt{2}} (first one flipped) , \frac{ |010>+|101 >}{\sqrt{2}} (second one flipped) or \frac{ |001>+|110 >}{\sqrt{2}} (third one flipped).

In the first case, after interacting with the state \frac{ |100>+|011 >}{\sqrt{2}}, the first test particle ends up “down” with the second test particle “down” too.

In the second case, after interacting with the state \frac{ |010>+|101 >}{\sqrt{2}}, the first test particle ends up “down” with the second test particle “up”.

In the third case, after interacting with the state \frac{ |001>+|110 >}{\sqrt{2}}, the first test particle ends up “up” with the second test particle “down”.

These are distinguishable results, don’t affect the quantum state of the quantum computer and now can be used to gently bring  the one-bit error states back to their pristine condition, i.e., \frac{ |000>+|111 >}{\sqrt{2}}, so the quantum computer can continue to operate.
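The bookkeeping of the two test particles can be sketched classically. This is not a quantum simulation: it only tracks which basis strings appear in the superposition and ignores amplitudes and phases, and all names in it are hypothetical.

```python
def syndrome(bits):
    """Readings of the two test particles for one basis string like '100'."""
    first = "up" if bits[0] == bits[1] else "down"    # compares spins 1 and 2
    second = "up" if bits[0] == bits[2] else "down"   # compares spins 1 and 3
    return first, second


def read_test_particles(state):
    """Both branches must give the same reading, so reading it reveals
    nothing about which branch the computer is in."""
    outcomes = {syndrome(bits) for bits in state}
    if len(outcomes) != 1:
        raise ValueError("branches disagree; a measurement would disturb the state")
    return outcomes.pop()


# Which spin (0-based index) to flip back for each reading; None means no error.
WHICH_SPIN = {("up", "up"): None, ("down", "down"): 0,
              ("down", "up"): 1, ("up", "down"): 2}


def flip(bits, i):
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]


def correct(state):
    i = WHICH_SPIN[read_test_particles(state)]
    return set(state) if i is None else {flip(bits, i) for bits in state}


for damaged in ({"100", "011"}, {"010", "101"}, {"001", "110"}):
    print(sorted(damaged), read_test_particles(damaged), sorted(correct(damaged)))
```

Each of the three one-bit-error states gives a different pair of readings, and flipping the indicated spin restores the strings 000 and 111 in every case.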

One-bit errors can thus be detected and corrected by this simple procedure, which is however quite ponderous to implement.

These ideas have connections to current theories of black holes that are best described in a future post.


Modulo arithmetic & cards

Another week, another Manjul Bhargava delight.

Arithmetic is usually taught in base 10. We have 10 unique symbols (0,1,2,3,4,5,6,7,8,9) and a place value system with the lowest value being 10^0=1, the next being 10^1=10, the next being 10^2=100 and so on. So a number like 145=1 \times 10^2 + 4 \times 10^1 + 5 \times 10^0.

So far so good, but you could do this with a different base. Let’s say you used base 3. That means you would need 3 unique symbols (people usually use 0,1,2, but you could just as well use ⊕,ψ,◊ if you desired – only convenience and familiarity with the traditional symbols make us use 0,1,2). Then, we use a place value system, just using 3^0=1 as the lowest value, 3^1=3 as the next value place, followed by 3^2 and so on.

Next, one generalizes the notion of equality of numbers to something called congruence. For this concept,  we use the \equiv symbol, rather than =. Two numbers a and b are equal if they are, well, equal! But, they are congruent, modulo N, if they both give the same remainder when divided by N. This is written as a \equiv b \: mod \: N.

For instance, 16 \equiv 1\:  mod \: 3, 17 \equiv 2 \: mod \: 5, ... etc.

The thing most people learn in high school is that the sum and difference (or in general, any linear combination) of two numbers (say a, b) modulo N is just the sum or difference or the same linear combination of a \: mod \: N and b \: mod \: N, reduced modulo N once more. In equation form, (\alpha \: a + \beta \: b) \: mod \: N = \left( \alpha (a \: mod \: N) + \beta (b \: mod \: N) \right) \: mod \: N, where the outer reduction is needed because the combination of remainders can itself exceed N. This is called the distributive law of modulo arithmetic.

An additional property that also is true is that the {\bf product} of two numbers a, b modulo N is the product of the two numbers a and b individually modulo N, reduced modulo N once more. Again, in equations,

(a \: b) \: mod \: N = \left( (a \: mod \: N) (b \: mod \: N) \right) \: mod \: N

Proving this is easy. Write a, b as multiples of N plus remainders, as below, where Q_a, Q_b are integers, while R_a, R_b are in the range 0, 1,2,...(N-1) as good remainders should be!

a = Q_a N + R_a

b = Q_b N + R_b

Then the product of these modulo N is (Q_a Q_b N^2 + Q_a R_b N + Q_b R_a N + R_a R_b) \: mod \: N. The first three terms are all multiples of N and we use the distributive law to reduce those “mods” to 0 remainder when divided by N. The last term is R_a R_b \: mod \: N, which is exactly \left( (a \: mod \: N)(b \: mod \: N) \right) \: mod \: N.
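Both rules are easy to spot-check by brute force. The sketch below uses arbitrary small ranges and two arbitrary coefficients (2 and 3) for the linear rule; the function names are invented for the illustration.

```python
def product_rule_holds(a, b, n):
    """(a b) mod n equals the product of the remainders, reduced mod n."""
    return (a * b) % n == ((a % n) * (b % n)) % n


def linear_rule_holds(alpha, a, beta, b, n):
    """Same statement for a linear combination of a and b."""
    return (alpha * a + beta * b) % n == (alpha * (a % n) + beta * (b % n)) % n


all_ok = all(
    product_rule_holds(a, b, n) and linear_rule_holds(2, a, 3, b, n)
    for n in range(2, 12)
    for a in range(60)
    for b in range(60)
)
print(all_ok)

# The product of two remainders can exceed n, so it is reduced once more.
a, b, n = 5, 5, 3
product_of_remainders = (a % n) * (b % n)   # 2 * 2
print((a * b) % n, product_of_remainders, product_of_remainders % n)
```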

A simple consequence of this is that if you have any number, say with four digits, which can be written as 1000 a + 100 b + 10 c + d, that is equal to (999a+99b+9c) + (a+b+c+d). Now, since 3 or 9 divides 999,99 and 9, all we need is to have a+b+c+d \: mod \: 3=0 for the number to be divisible by 3. Alternatively we need a+b+c+d \: mod \: 9 =0 for the number to be divisible by 9. That’s the origin of a simple high-school mnemonic.

This is the basis of a simple trick.

Consider a number, say 576281. Ask someone to jumble up the digits and subtract the smaller number from the larger number. So, let’s say they get 156728 and then compute 576281 - 156728 = 419553. Ask the person to hide one of the digits and reveal the others. You can easily guess the hidden digit. The reason is simple – the difference of these two numbers is going to be a multiple of 9. So the sum of the digits is going to be a multiple of 9, which allows one to guess the digit. And why is the difference a multiple of 9? Suppose we start with the number 10^6 a + 10^5 b + 10^4 c + 10^3 d + 10^2 e + 10^1 f + g and jumble up the digits to get 10^6 b + 10^5 c + 10^4 e + 10^3 d + 10^2 f + 10^1 g + a. If we subtract the second from the first, we will get a (10^6-1)+b (10^5-10^6) + c (10^4-10^5)+d (0) + e(10^2-10^4)+f(10^1-10^2)+g(1-10^1). Each of the bracketed differences is a multiple of 9, so by the product rule discussed above every term, and hence the whole difference, is a multiple of 9.
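Here is a small sketch of the trick with the numbers used above. The function names are made up, and one caveat is made explicit: when the revealed digits already add up to a multiple of 9, the hidden digit could be either 0 or 9, so the sketch returns both candidates.

```python
def scramble_difference(number, scrambled):
    """Difference between a number and a rearrangement of its digits."""
    if sorted(str(number)) != sorted(str(scrambled)):
        raise ValueError("second number must use the same digits as the first")
    return abs(number - scrambled)


def hidden_digit_candidates(revealed):
    """Digits that complete the revealed ones to a multiple of 9."""
    missing = (-sum(revealed)) % 9
    return [0, 9] if missing == 0 else [missing]


diff = scramble_difference(576281, 156728)
print(diff, diff % 9)

# Hide one of the 5s in 419553 and reveal the rest.
print(hidden_digit_candidates([4, 1, 9, 5, 3]))
# Hide the 9 instead: the revealed digits cannot tell 0 from 9.
print(hidden_digit_candidates([4, 1, 5, 5, 3]))
```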

This brings us to the card trick discussed a few posts before.

Consider that a sequence of cards is placed face down in two piles, with the cards cut in two. The first pile, top to bottom, is, say Ace, King, Queen, Jack of Hearts and the other pile has exactly the same (the other half of the torn cards), but reversed in order, so it would be Jack, Queen, King, Ace of hearts. Place the first set of cards next to the second set.

Now, use the following words: GARDNER, BHARGAVA, MAGIC. Break each word wherever you want (say GARD-NER) and as you enunciate every letter (G-A-R-D), move a card on the first pile from the top to the bottom. When the break (at N-E-R as I arbitrarily chose) arrives in the word, switch to the other pile and complete the word there. After each word ends, set the top card of each pile aside together as a pair. These two will be perfectly matching cut halves, as in the video in the previous post. Then go on to the next word and do the same.

Why does it work?

The word “GARDNER” has 7 letters. That is 3 \: mod \: 4. So what?

Well, if the initial set of cards looks like

0 \: \: \: \: \: \: \: N-1 \\  1 \: \: \: \: \: \: \: N-2 \\  2 \: \: \: \: \: \: \: N-3 \\  . \\  . \\  . \\  N-1 \: \: \: \: \: \: \: 0

Then if we sequentially move m cards to the bottom on the left pile, we have

(m) mod \: N \: \: \: \: \: \: \: \: \: \: \: N-1 \\  (m+1) mod \: N \: \: \: \: \: \: \: N-2 \\  (m+2) mod \: N \: \: \: \: \: \: \: N-3 \\  . \\  . \\  . \\  (m+N-1) mod \: N \: \: \: \: \: \: \: 0 \\

Why the mod \: N? That’s to handle the case where we move more than the full stack of cards (i.e., m >N).

Next, we move k cards to the bottom of the second pile, resulting in

(m) mod \: N \: \: \: \: \: \: \: \: \: \: \: (N-(k+1)) mod \: N \\  (m+1) mod \: N \: \: \: \: \: \: \: (N-(k+2)) mod \: N \\  (m+2) mod \: N \: \: \: \: \: \: \: (N-(k+3)) mod \: N \\  . \\  . \\  . \\  (m+N-1) mod \: N \: \: \: \: \: \: \: (N-k) mod \: N \\

For the top cards to match, we must have m \: mod \: N= (N-k-1) \: mod \: N

Now, if we chose k = N-m-1, then the two sides of the equation are equal. In that case, the total number of cards moved (in both piles) must be m + (N-m-1) = N-1, modulo N. Note that the number of letters in GARDNER is 7, that there are N=4 cards in each pile, and that 7 \equiv 3 \: mod \: 4.

The same argument works for the next two words. Since we removed one card from each pile, N=3 when we get to BHARGAVA, which has 8 letters, and 8 \equiv 2 \: mod \: 3. And finally, MAGIC has 5 letters, which is 1 \: mod \: 2. And the cards magically line up.
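The argument can be tested by simulating the trick for every possible way of breaking the three words. This is a sketch under the setup described above (four cards, second pile in reverse order); the function run_trick and the card labels are made up for the illustration.

```python
from collections import deque
from itertools import product


def run_trick(cards, words, splits):
    """Return the pairs of cards set aside, one pair per word plus the last."""
    left = deque(cards)                 # index 0 is the top of the pile
    right = deque(reversed(cards))      # same cards, reversed order
    pairs = []
    for word, split in zip(words, splits):
        # rotate(-k) moves k cards, one at a time, from the top to the bottom.
        left.rotate(-split)
        right.rotate(-(len(word) - split))
        pairs.append((left.popleft(), right.popleft()))
    pairs.append((left.popleft(), right.popleft()))   # the last card in each pile
    return pairs


CARDS = ["Ace", "King", "Queen", "Jack"]
WORDS = ["GARDNER", "BHARGAVA", "MAGIC"]

# Try every break point in every word (0 letters up to the whole word).
always_matches = all(
    all(a == b for a, b in run_trick(CARDS, WORDS, splits))
    for splits in product(*(range(len(w) + 1) for w in WORDS))
)
print(always_matches)
print(run_trick(CARDS, WORDS, (4, 3, 2)))   # GARD-NER, BHA-RGAVA, MA-GIC
```

The word lengths matter: a first word with, say, 6 letters would not be congruent to 3 modulo 4 and the top cards would no longer match.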

Next up – a magical number.

Gauge invariance, Global and Local Symmetry

This post, aimed at people with some knowledge of Maxwell’s equations, tries to connect a bunch of concepts that are all central to how we understand the universe today. Nearly every word in the title has the status of being a buzz-word, but for good reason – they help organize the ideas well. Some of these ideas are being challenged, but nothing concrete has emerged as a convincing alternative.

When you study Maxwell’s equations in a sophomore course in college, you are presented with something like

\vec \nabla . \vec E = \frac{\rho}{\epsilon_0} \: \: , \: \:  \vec \nabla \times \vec E = -  \frac{\partial \vec B}{\partial t} \: \: , \: \:  \vec \nabla . \vec B = 0\: \: , \: \:  c^2 \vec \nabla \times \vec B = \frac{\vec j}{\epsilon_0} + \frac{\partial \vec E}{\partial t}

You are then told that these equations can be much simplified by introducing something called vector and scalar potentials ( \vec A, \phi), such that \vec B = \vec \nabla \times \vec A and \vec E = - \vec \nabla \phi - \frac{\partial \vec A}{\partial t} (in the same SI units as the equations above).

Then we discover this peculiar property (I say this the way I first heard of it): you can amend these potentials in the following way

\vec A \rightarrow \vec A - \vec \nabla \chi \: \: , \: \: \phi \rightarrow \phi + \frac{\partial \chi}{\partial t} for {\bf any} field \chi and the physically measurable electric (\vec E) and magnetic fields (\vec B ) are {\bf completely} unchanged, and so are Maxwell’s equations.
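A quick numerical check makes the cancellation concrete. This is a minimal one-dimensional sketch, not a proof: it uses the SI-style form E_x = -\partial \phi / \partial x - \partial A_x / \partial t (in conventions that carry a factor of 1/c the same cancellation goes through), and the test functions `phi`, `A_x` and `chi` are arbitrary choices made up for illustration.

```python
import math

def d(f, x, t, wrt, h=1e-5):
    """Central-difference partial derivative of f(x, t) with respect to 'x' or 't'."""
    if wrt == "x":
        return (f(x + h, t) - f(x - h, t)) / (2 * h)
    return (f(x, t + h) - f(x, t - h)) / (2 * h)

# Arbitrary smooth test functions (hypothetical, chosen only for illustration).
phi = lambda x, t: math.sin(2 * x) * math.exp(-t)
A_x = lambda x, t: x * x * math.cos(t)
# The gauge function is chi(x, t) = sin(x) * cos(3 t); its exact derivatives are used below.

# Gauge-transformed potentials: phi -> phi + d(chi)/dt, A -> A - d(chi)/dx.
phi_new = lambda x, t: phi(x, t) - 3 * math.sin(x) * math.sin(3 * t)
A_new = lambda x, t: A_x(x, t) - math.cos(x) * math.cos(3 * t)

def E_x(phi_f, A_f, x, t):
    """E_x = -d(phi)/dx - d(A_x)/dt, evaluated numerically."""
    return -d(phi_f, x, t, "x") - d(A_f, x, t, "t")

x0, t0 = 0.7, 0.3
before = E_x(phi, A_x, x0, t0)
after = E_x(phi_new, A_new, x0, t0)
print(before, after)
```

The two printed values should agree up to the small error of the finite-difference derivative. The magnetic field is not checked here, since in one dimension there is no curl to take; its invariance follows from the curl of a gradient being zero.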

I presume when Maxwell discovered this property (invariance of the electric and magnetic fields upon a “gauge transformation”), which was later named “gauge invariance”, it seemed like a curiosity. The vector potential seems to have a redundancy – it is only relevant up to the addition of \vec \nabla \chi for any \chi (and in conjunction with the scalar potential). But the physical properties of matter and charge only care about the electric and magnetic fields, which are insensitive to the above freedom. If you solve for the vector and scalar potential that applies to a particular problem, you can compute the electric/magnetic fields and go on without thinking about the potentials again.

Quantum mechanics was the first blip on the sedate view above. It turns out that a charged particle notices the vector potential, in a typically beautiful way (in Nature) that both preserves the above (gauge) redundancy, but is still noticeable in its physical effects. We could have a magnetic field far far away, but the vector potential that produces that magnetic field could exist here, close to an electron and there would be measurable effects due to that far away magnetic field. It is called the Aharonov-Bohm effect, but it is a standard feature of quantum mechanics. The vector potential seems to have physical significance, but there is the curious question of what gauge invariance actually means. And the correct quantum mechanical version of Schrodinger’s equation in the presence of an electromagnetic field is not

\frac{1}{2m} (-i \hbar \frac{\partial }{\partial x})(-i \hbar \frac{\partial }{\partial x}) \psi + V(x) \psi = E \psi,

but

\frac{1}{2m} (-i \hbar \frac{\partial }{\partial x} - q A_x)(-i \hbar \frac{\partial }{\partial x} - q A_x) \psi + V(x) \psi = E \psi.

This is all we have to go on.

If we require this “gauge-invariance” to be a generic property of electromagnetism, it implies that the energy must be a simple function of the physically relevant \vec E’s and \vec B’s, not some term involving \vec A’s that might not be gauge-invariant. No term like \vec A \cdot \vec A can appear in the energy, because such a term would not be the same after you perform a gauge transformation on \vec A, i.e., add a \vec \nabla \chi term as in the above. So what? This will become clear in the next paragraphs.

Let’s leave this aside for a bit.

The next big development in the theory of fundamental physics was quantum field theory. This theory has many things going for it, but the initial motivation was the discovery that particles can be created and destroyed, seemingly out of pure energy (or photons) and what’s more, every particle of a particular type is precisely identical to any other particle of the same type. All electrons look/feel the same and behave in exactly the same way. Whilst we have no idea why this is the way things are, we can model this behavior very nicely. The theory of quantum mechanics started out by considering a simple harmonic oscillator. The energy of the harmonic oscillator is E = \frac{p^2}{2m} + \frac{m \omega^2 x^2}{2}, where  \omega= the frequency of the oscillator, m= the mass of the thing that’s oscillating while x, p are the position and momentum of the thing that is oscillating.

It turns out that one can think of states of higher and higher energy of the harmonic oscillator as having more and more “quanta” of energy, since energy seems to be absorbed in packets (this is what was discovered in the quantum theory). These quanta appear identical to each other. If you have these harmonic oscillators at every point in space, then we have a “field” of oscillators. Then, with some simple construction, the “quanta” that are constructed from these several harmonic oscillators can be given a charge and a mass. The energy of this field can be written, for a field in only one dimension (the x-dimension) as E = \frac{1}{2}(\frac{\partial \phi}{\partial x})^2 + \frac{1}{2} m^2 \phi^2 :  there is a “strain energy” that’s the first term and a “mass” energy, which is related to the second term. We can treat the universe as being composed of these “fields”, though the universe seems to have scalar, vector, spinor and tensor fields too. They can each be mapped onto all known particles, so we can invent a quantum field for every single type of fundamental particle. Voila, all these particles are identical to each other and have exactly the same properties  – they are just “quanta” of these fields. Studying the properties of these particles is, basically, the business of quantum field theory.

If we had a field that wasn’t a scalar \phi, but a vector \vec A, the corresponding mass term would be a coefficient times \vec A \cdot \vec A. But we already rejected such a term as it wouldn’t preserve gauge invariance, as described above. You might then persist and ask why we should try to preserve gauge invariance. Let me come back to this point a little further on. Right now, let me just say that the Aharonov-Bohm effect already tells us that Nature appears to care about this fig-leaf. In particular, it seemed for a long time that it was not possible to write a gauge-invariant theory of a massive (i.e., not massless) particle described by a vector field.

The simplest kind of field is a scalar field, something akin to the temperature inside every point in a room. There is one number attached to every point in space.

Let’s simplify this to a simple collection of balls and springs. We could do this in any number of dimensions, but to start, let’s do this in one-dimension.


Imagine little springs connecting the balls on the lattice. At equilibrium, the balls like to be equidistant from each other, separated by the lattice constant “a”. The “field” \phi at every point is the displacement of the ball from the equilibrium point. The equilibrium value of this “field” is 0 at every point.

We can express this by saying that each ball is sitting in a potential energy well along the transverse directions (perpendicular to the one-dimensional lattice) that looks like this


which has the result that the ball doesn’t like to roll away from the lattice point it is supposed to be in.

Now, suppose there is an additional wrinkle in this lattice. Let’s assume each ball has a small charge. That doesn’t change the equilibrium situation, since like charges repel and the balls would still keep a distance “a” apart (don’t forget there are still the springs holding the balls together).

Suppose, now, that there is a small fuzzy charge donut around each ball, with radius \delta, and assume the charge in the fuzzy donut is small and opposite to that of the ball. In that case, the ball is attracted away from its equilibrium position and would rather sit at the point \delta away from its usual position on the lattice.

The potential energy of each ball went from being zero, with its minimum centered at the lattice point, to being positive at the lattice site, with minima at a small radius \delta around the lattice point. This may be called a “broken” symmetry phase: the symmetry of the smooth parabola is replaced with this “Mexican hat”.

The position of the ball can now be specified by a complex number: it is |\phi|, the displacement of the ball, times e^{i \theta}, where \theta is the angular position on the donut-shaped equilibrium surface. Why complex? It’s just a simple representation of the position of the ball with this particular geometry.

[Figure: PE-Broken Symmetry]

Another depiction of the potential is here

[Figure: Mexican hat]

and the chain is

[Figure: PE-chain with broken symmetry]

It’s not the particular mathematical function that is relevant here, just the idea that this sort of shape-shifting can happen due to natural evolution of parameters in the potential energy function, as we change external conditions.

But let’s assume the springs are pretty taut and hard to pull apart – when one ball decides to move to this new minimum, the others will all follow. This is a global shift of the entire system to the new potential energy well. But where in this new well? There is a minimum all the way around, and the chain of balls could line up, ramrod straight, with all the balls on the same corresponding spot on the Mexican hat potential, at any point on the channel around the central hat.

The energy function of the scalar field in this case is E = \frac{1}{2} \frac{\partial \phi^{*}}{\partial x}  \frac{\partial \phi}{\partial x} {\bf -}  \frac{1}{2} \mu^2 \phi^{*} \phi + \frac{\lambda}{4} (\phi^{*} \phi)^2 or some similar function. Remember that \phi is complex and we have just written this in a form that gives a “real” number for the energy. In the particular case of this function, the new equilibrium would be |\phi|=\sqrt{\frac{\mu^2}{\lambda}}; remember |\phi| represents how far the ball is from the lattice point it started out at, so to choose a concrete spot, x=0, y= \sqrt{\frac{\mu^2}{\lambda}}, z=0. And the “global” symmetry here is that we could change the angular position \theta of {\bf all} the balls and it wouldn’t affect the total energy of the system.
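To see where the quoted equilibrium comes from, one can scan the potential part of this energy numerically. The sketch below assumes the form V = -\frac{1}{2} \mu^2 |\phi|^2 + \frac{\lambda}{4} |\phi|^4 written above; the values of `mu2` and `lam` are illustrative and not taken from the text.

```python
import cmath
import math

def potential(phi, mu2=1.0, lam=0.5):
    """V(phi) = -(1/2) mu^2 |phi|^2 + (lambda/4) |phi|^4 for complex phi."""
    r2 = abs(phi) ** 2
    return -0.5 * mu2 * r2 + 0.25 * lam * r2 ** 2

mu2, lam = 1.0, 0.5   # illustrative values only

# Scan the radius |phi| on a fine grid and pick the lowest point.
radii = [i * 1e-4 for i in range(30001)]
r_min = min(radii, key=lambda r: potential(r, mu2, lam))
print(r_min, math.sqrt(mu2 / lam))

# Global symmetry: rotating theta at fixed |phi| leaves the energy unchanged.
values = [potential(r_min * cmath.exp(1j * th), mu2, lam)
          for th in (0.0, 1.0, 2.5, math.pi)]
print(values)
```

The grid minimum should land next to \sqrt{\mu^2 / \lambda}, and the energy should be the same for every angle \theta, which is the channel running around the hat.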

Now, for a more daring and remarkable idea, which undoubtedly only came about because Chen Ning Yang and Robert Mills were playing around with the idea of gauge invariance before they came to this realization.

Suppose we let each ball be in any spot it wants to be in that channel (donut shaped minimum) of the Mexican hat potential, but we require that the energy of the total system be the same. In that case, remember there are springs between the balls! They don’t want the balls to get far apart – they need to be allowed to relax a bit.

Concretely, we are saying, let \theta be a function of x. If so, we have a problem. The terms proportional to \phi^{*} \phi are unaffected, since they remain |\phi|^2 e^{i \theta} e^{-i \theta} = |\phi|^2. So are the terms proportional to (\phi^{*} \phi)^2.  Not so for the “strain” energy terms – the springs doth protest!

\frac{1}{2} \frac{\partial \phi^{*}}{\partial x}  \frac{\partial \phi}{\partial x}

which was \frac{1}{2} \frac{\partial |\phi|}{\partial x}  \frac{\partial |\phi|}{\partial x} when \theta is independent of x becomes, instead,

\frac{1}{2} (\frac{\partial |\phi|}{\partial x}- i |\phi| \frac{\partial \theta}{\partial x})  (\frac{\partial |\phi|}{\partial x}+ i |\phi| \frac{\partial \theta}{\partial x})

This seems like a disaster for this idea – except that the idea that rescues it is to say, there is a field (let’s, for lack of imagination, call it A) that does two things

  • enters into the energy expression through the derivative term, i.e., \frac{\partial \phi}{\partial x} is replaced by \frac{\partial \phi}{\partial x} - i q A \phi.
  • has a peculiar kind of freedom – we can change A to A + \frac{\partial \chi}{\partial x}.

But this is exactly the way the vector potential appears in the energy function and Schrodinger’s equation of a charged particle in an electromagnetic field, as was noted a few paragraphs above! With such a field, we can make a “gauge transformation” by \frac{\partial \theta}{\partial x} and “remove” the effect of the spatially varying \theta term. The “gauge field” A allows you to turn the global symmetry into a local symmetry.
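Here is a small numerical sketch of that rescue. It assumes the coupling \frac{\partial \phi}{\partial x} - i q A \phi from the list above, and that the compensating shift of A is \frac{1}{q} \frac{\partial \theta}{\partial x} (i.e., \chi = \theta / q); the value of `q` and the functions `phi`, `A` and `theta` are invented for illustration.

```python
import cmath
import math

q = 1.7  # illustrative coupling strength (hypothetical value)

phi = lambda x: (1.0 + 0.3 * math.sin(x)) * cmath.exp(0.2j * x)
A = lambda x: 0.5 * math.cos(2 * x)
theta = lambda x: x ** 2      # a position-dependent phase
dtheta = lambda x: 2 * x      # its exact derivative

def deriv(f, x, h=1e-5):
    """Central-difference derivative of a (possibly complex) function."""
    return (f(x + h) - f(x - h)) / (2 * h)

def strain(phi_f, A_f, x):
    """(1/2) |d(phi)/dx - i q A phi|^2, the gauge-coupled strain energy."""
    D = deriv(phi_f, x) - 1j * q * A_f(x) * phi_f(x)
    return 0.5 * abs(D) ** 2

phi_rot = lambda x: cmath.exp(1j * theta(x)) * phi(x)   # local phase rotation
A_shift = lambda x: A(x) + dtheta(x) / q                # compensating gauge shift

x0 = 0.8
original_strain = strain(phi, A, x0)
uncompensated = strain(phi_rot, A, x0)        # the springs protest
compensated = strain(phi_rot, A_shift, x0)    # the gauge field absorbs the change
print(original_strain, uncompensated, compensated)
```

Rotating the phase alone changes the strain energy, while rotating the phase and shifting A together leaves it as it was.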

That was the connection that established that turning a global symmetry into a local symmetry (in a quantum field \phi in this case) establishes that a gauge-transforming field A must exist, must be coupled to the quantum field and must transform  following the mysterious “gauge transformation” formula to be consistent. And that is the model that all fundamental theories in particle physics have followed ever since.

In addition, if you expand out the energy function for such a coupled field,

E = \frac{1}{2} (\frac{\partial \phi^{*}}{\partial x}+ i q A \phi^{*})  (\frac{\partial \phi}{\partial x}- i q A \phi) - \frac{\mu^2}{2} \phi^{*} \phi + \frac{\lambda}{4} (\phi^{*} \phi)^2

we get a very peculiar term, proportional to A^2. The coefficient of this term is proportional to \phi^{*} \phi. But this is exactly what a mass term is supposed to look like for the “gauge field” A. In particular, since \phi is stuck at a non-zero absolute magnitude (with the peculiar potential we drew above), we can get a mass term in the energy function, but in a gauge-invariant fashion! This is called the Englert-Brout-Higgs-Guralnik-Hagen-Kibble-Anderson-Nambu-Polyakov mechanism, but only Higgs and Englert were recognized for it with a Nobel Prize in 2013.
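For readers who want to see the A^2 term appear, here is the expansion of the derivative term written out. It assumes the coupling enters as \frac{\partial \phi}{\partial x} - i q A \phi, as in the two-item list earlier (setting q = 1 gives the same structure), and the symbol m_A for the mass of the gauge field is introduced here only for illustration.

```latex
\frac{1}{2}\left(\frac{\partial \phi^{*}}{\partial x} + i q A \phi^{*}\right)
           \left(\frac{\partial \phi}{\partial x} - i q A \phi\right)
= \frac{1}{2}\frac{\partial \phi^{*}}{\partial x}\frac{\partial \phi}{\partial x}
+ \frac{i q A}{2}\left(\phi^{*}\frac{\partial \phi}{\partial x} - \phi\frac{\partial \phi^{*}}{\partial x}\right)
+ \frac{1}{2} q^{2} A^{2}\, \phi^{*}\phi

% With \phi stuck at |\phi|^2 = \mu^2 / \lambda, the last term becomes
\frac{1}{2} q^{2} \frac{\mu^{2}}{\lambda} A^{2} \equiv \frac{1}{2} m_A^{2} A^{2},
\qquad m_A^{2} = \frac{q^{2} \mu^{2}}{\lambda}
```

The last term has the same shape as the \frac{1}{2} m^2 \phi^2 mass term of the scalar field, which is why it can be read as a mass for A.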

And why does a theory of a vector field like A need gauge invariance as a condition? This is a little harder work to understand, but a hint is that the theory is over specified right at the start. A vector field in space-time has four components at every point in space, while the spin-1 particle it describes only has three independent components. If something like gauge-invariance didn’t exist, we’d have to invent it, to reduce the excessive freedom in a theory of a vector field.

The next question (we never run out of questions!) one might have is why the potential takes the form we drew, with the Mexican hat shape. This is a profound realization – the concrete proof is that this has been seen in actual experimental studies of phase transitions. In addition, there is a connection to another set of realizations in physics – that of renormalization. That is such an interesting topic that it deserves a future post of its own.

p-‘s and q-‘s redux

Continuing our saga, trying to be intellectually honest, while a little prurient (Look It Up!, to adopt a recent political slogan), let’s look at the ridiculous “measured” correlation in point 3 of this public post. Let’s call it the PPP-GDP correlation! The scatter graph with data is displayed below

[Figure: PPP GDP graph]

Does it make sense? As in all questions about statistics, a lot of the seminal work traces back to Fisher and Karl Pearson. The data in the graph can be (painfully) transcribed and the correlation computed in an  Excel spreadsheet. The result is a negative correlation – around -34% – higher GDP implies lower PPP. Sorry, men in rich countries. You don’t measure up!

Some definitions first. If you take two normally distributed random variables X and Y with means \mu_X, \mu_Y and standard deviations \sigma_X, \sigma_Y, and you collect N samples of pairs (x_i, y_i), then the Pearson coefficient (with \mu_X^S, \mu_Y^S denoting the sample means)

r = \frac{\sum_{i=1}^N(x_i - \mu_X^S)(y_i - \mu_Y^S)}{ \sqrt{\sum_{i=1}^N (x_i-\mu_X^S)^2} \sqrt{ \sum_{j=1}^N (y_j-\mu_Y^S)^2} }

measures the correlation between the pairs of variables measured by considering this sample. If the sample were infinitely large, you would expect to recover the “actual” correlation \rho between the two variables. They could then be assumed to be distributed through a joint (technical term “bivariate”) distribution for two random variables.

In general, this quantity r has a rather complicated distribution. However, Fisher discovered that a certain variable (called the “Fisher” transform of r) is approximately normally distributed.

F(r) = \frac{1}{2} \ln \frac{1+r}{1-r} = arctanh(r)

with mean F(\rho)=arctanh(\rho) and standard deviation \frac{1}{\sqrt{N-3}}.

Once we know something is (at least approximately) normally distributed, we can throw all sorts of simple analytic machinery at it. For instance, we know that

  • The probability that the variable is within 1.96 standard deviations of the mean (on either side) is 95%
  • The probability that the variable is within 3 standard deviations of the mean (on either side) is 99.73%
  • The probability that the variable is within 5 standard deviations of the mean (on either side) is 99.99994%
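These three percentages can be reproduced from the error function in Python’s standard library. A minimal sketch (the helper name `coverage` is made up here):

```python
import math

def coverage(z):
    """Probability that a normal variable lies within z standard deviations of its mean."""
    return math.erf(z / math.sqrt(2))

for z in (1.96, 3, 5):
    print(z, f"{100 * coverage(z):.5f}%")
```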

The z-score is the number of standard deviations a sample is away from the mean of the distribution.

So, if we were sensible, we would start with a null hypothesis – there is no correlation between PPP and GDP.

If so, the expected correlation \rho between these sets of variables, is 0.

The Fisher transform of \rho=0 is F(0)=arctanh(0)=0.

The standard deviation of the Fisher transform is \frac{1}{\sqrt{N-3}}. In the graph above, N=75, so the standard deviation is 0.11785.

If you measured a correlation of -0.34 from the data, that corresponds to a Fisher transform of -0.354, which is a little more than 3 standard deviations away from the “expected” mean of 0!

If you went to the average Central banker with a 3 standard deviation calculation of what could go wrong (at least before the 2008 Financial crisis), he or she would have beamed at you in fond appreciation. Now, of course, we realize (as particle physicists have) that the world needs to be held to a much higher standard. In fact, if you hold to a 5 standard deviation bound, the correlation could be between -52.9% and +52.9%.
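The arithmetic of the last few paragraphs fits in a few lines. This is a sketch under the same assumptions as the text (Fisher transform approximately normal, standard deviation \frac{1}{\sqrt{N-3}}); the function names `fisher_z` and `correlation_band` are hypothetical.

```python
import math

def fisher_z(r, n, rho0=0.0):
    """z-score of a sample correlation r (n pairs) under the null correlation rho0."""
    sd = 1.0 / math.sqrt(n - 3)
    return (math.atanh(r) - math.atanh(rho0)) / sd

def correlation_band(n, n_sigma, rho0=0.0):
    """Range of sample correlations within n_sigma standard deviations of the null."""
    sd = 1.0 / math.sqrt(n - 3)
    centre = math.atanh(rho0)
    return math.tanh(centre - n_sigma * sd), math.tanh(centre + n_sigma * sd)

z = fisher_z(-0.34, 75)
low, high = correlation_band(75, 5)
print(round(z, 2), round(low, 3), round(high, 3))
```

With N = 75 and r = -0.34 this gives a z-score of about -3, and the 5 standard deviation band comes out at roughly plus or minus 0.529, matching the numbers above.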

So, if you fondly held the belief expressed in graph 3 of the post alluded to above, you might need to think again.

If you had been following the saga of the 750 GeV “bump” discovered in the data from the Large Hadron Collider a few years ago, that was roughly at the 3.5 standard deviation level. If you held off from publishing a theory early describing exactly which particle had been observed, you would have been smart. The data, at that level, was as believable as the PPP vs GDP data above. The puzzling thing, in my opinion, is why the “independent” detectors ATLAS and CMS saw the same sort of bump in the same places. Speaks to a level of cross-talk which is not supposed to happen!

The above calculation leads to a very simple extension of the p-value concept to correlation. It’s just the probability of seeing correlations more extreme than the one observed, given the null hypothesis. The null hypothesis doesn’t necessarily have to be a correlation of 0. It might be reasonable to expect, for instance in the case of the correlation between the Japanese equity index and the exchange rate between the Yen and the dollar, that there is some stable (non-zero) correlation over at least one business cycle.
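As a rough illustration of this p-value for correlations, the sketch below computes a two-sided tail probability from the Fisher transform. The function name `correlation_p_value` and the non-zero null of -0.3 are made up for the example; they are not taken from any data.

```python
import math

def correlation_p_value(r, n, rho0=0.0):
    """Two-sided p-value for a sample correlation r against the null correlation rho0."""
    z = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    # For a standard normal variable, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

p_zero_null = correlation_p_value(-0.34, 75)            # null: no correlation
p_stable_null = correlation_p_value(-0.34, 75, -0.3)    # null: a stable correlation of -0.3 (made-up)
print(p_zero_null, p_stable_null)
```

The same observed correlation looks surprising under one null hypothesis and unremarkable under the other, which is why the choice of null matters.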

Featured image courtesy Maximilian Reininghaus

Buzzfeed post by Ky Harlin (Director of Data Science, Buzz Feed)

I haven’t bothered, in this analysis, to consider how the data was collected and whether it is even believable. We probably have a lot of faith in how the GDP increase data was computed, though methods have obviously changed in the last sixty years. However, did they use cadavers, to measure “lengths”?  Did they wander around poor villages with tape measures? How many of the data collectors survived the experience? This graph has all the signs of being an undergraduate alcohol-fueled experiment with all the attendant bias risks.

If you liked this post, you might like this website.

Minding your p-‘s and q-‘s

In the practice of statistical inference, the concept of p-value (as well as something that needs to exist, but doesn’t yet, called q-value), is very useful. So is a really important concept you need to understand if you want to fool people (or prevent yourself from being fooled!) – it’s called p-hacking.

The first (p-value) concerns the following kind of question (I have borrowed this example from a public lecture at the Math Museum by Jen Rogers in September 2018) – suppose I have a deadly disease where it is known that, if you perform no treatment of any kind, 40% of the people that contract it die, while the others survive, i.e., the probability of dying is 40 \%. On the other hand, a medical salesperson shows up at your doorstep and informs you about the new miracle cure “XYZ”. They (the manufacturer) gave the drug to 10 people (that had the disease) and 7 of them survived (the probability of dying with the new medical protocol appears to be 30 \%). Would you be impressed? What if she told you that they gave the drug to 1000 people and 700 of them survived? Clearly, the second seems more likely to reflect some real effect. How do we make this quantitative?

The second (I call this a q-value) concerns a sort of problem that crops up in finance. There are many retail investors that don’t have the patience to follow the market or follow the rise and fall of companies that issue stocks and bonds. They get ready-made solutions from their favorite investment bank – these are called structured notes. Structured notes can be “structured” any which way you want.

Consider one such example. Say you buy a 7-year US-dollar note exposed to the Nikkei-225 Japanese 225-stock index. The N225 index is the Japanese equivalent of the S&P500 index in the US. Usually, you pay in $100 for the note, the bank unburdens you of $5 to feed the salesman and other intermediaries, then invests $70 in a “zero-coupon” US Treasury bond that will expire in 7 years. The Treasury bond is an IOU issued by the US Treasury – you give them $70 now (at the now prevailing interest rates) and they will return $100 in 7 years.

As far as we know right now, the US Treasury is a rock-solid investment, they are not expected to default, ever. Of course, governing philosophies change and someone might look at this article in a hundred years and wonder what I was thinking!

The bank then uses the remaining $25 to invest in a 7-year option that pays off (some percentage P) of the relative increase (written as P \times \frac{ \yen N225_{final}-\yen N225_{initial}}{\yen N225_{initial}}) in the Nikkei-225 index. This variety of payoff, that became popular in the early 1990s, was called a “quanto” option – note that \yen N225 is the Nikkei index in its native currency, so it is around 22,500 right now.

For a regular payoff (non-quanto), you would receive, not the expression above, but something similar converted into US dollars. This would make sense, since it would be natural (for an option buyer) to convert the $25 into Japanese yen, buy some units of the Nikkei index, keep only the increase (not losing money if it falls below the initial level), then convert the profits back to US dollars after 7 years. If we wrote this as a “non-quanto” option payoff, it would be P \times \frac{\$ N225_{final}-\$ N225_{initial}}{\$ N225_{initial}}, where \$ N225 is the Nikkei-225 index expressed in US dollars. If the \yen N225 index were 22,500, then the \$ N225 index is currently \frac{\yen N225}{Yen/Dollar} = \frac{22,500}{112} \approx 201. You would convert the index to US dollars after 7 years at the “then” Yen-dollar rate, to compute the “final” \$ N225 index value, which you would plug into the formula.
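The difference between the two payoffs is easiest to see with numbers. The sketch below is only illustrative: the final index level and the final exchange rates are invented scenarios, the participation P is left out, and the floor at zero reflects the reading above that the option keeps only the increase.

```python
def quanto_return(n225_initial, n225_final):
    """Relative index move measured in yen; the FX rate never enters."""
    return max(n225_final - n225_initial, 0.0) / n225_initial

def non_quanto_return(n225_initial, n225_final, fx_initial, fx_final):
    """Relative move of the index converted to dollars (fx = yen per dollar)."""
    usd_initial = n225_initial / fx_initial
    usd_final = n225_final / fx_final
    return max(usd_final - usd_initial, 0.0) / usd_initial

usd_index_now = 22500 / 112
print(round(usd_index_now, 2))

# Hypothetical scenario: the index rises 20% in yen terms over the 7 years.
quanto = quanto_return(22500, 27000)
weak_yen = non_quanto_return(22500, 27000, 112, 140)     # yen weakens to 140 per dollar
strong_yen = non_quanto_return(22500, 27000, 112, 100)   # yen strengthens to 100 per dollar
print(quanto, weak_yen, strong_yen)
```

In the weak-yen scenario the dollar value of the index ends below where it started, so the non-quanto payoff is zero even though the index rose; the quanto payoff is the same in both scenarios.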

If  you buy a “quanto” option, you bear no exposure to the vagaries of the FX rate between the US dollar and the Japanese yen, so it is easy to explain and sell to investors. Just look at the first payoff formula above.  The second payoff formula, though natural, is a more complex formula.

However, as you should know, in finance, if there is a risk in the activity that you do, but you find that you don’t bear this risk in the instrument you have bought, it is because someone else has (presumably without your knowledge) bought this risk from you and has paid (much) less than what it is worth, through the assumptions used in pricing the instrument you just bought.

It turns out that the option pricing formula invented by Fischer Black, Myron Scholes and Robert Merton can be expanded to value these sorts of “quanto” options. The formula depends on some extra parameters. One of these is the volatility (standard deviation per year) of the Yen-dollar exchange rate. The other is the correlation between two quantities – the \# Yen / Dollar and \# Yen / N225 \: index. That graph might look like this (not real data, but a common observation for these correlations).

[Figure: Correlation JPYUSD vs JPYNikkei]

You are asked to buy this correlation, in competition with others. How much would you pay? If you were in an uncompetitive environment, you might “buy” this correlation  at -100 \%. If you heard that someone paid  -30 \%, would you think it makes sense?

How seriously should one take this correlation? Consider the cases considered in this fantastic post. A correlation between Manoj “Night” Shyamalan’s movies and newspaper reading? Really? What correlations are sensible and what should we pay less heed to?

The idea of p-values answers the first question. The way to think about the miracle drug is this – suppose you did nothing and you assume (from your prior experience) that the results of doing nothing are – the probability of a patient dying of the deadly disease is p = 0.4, i.e., the probability of survival is 1- p  = 0.6. Then, if you assume that the patients live or die independent of each other, what is the probability that out of a pool of 10 patients, exactly 7, 8, 9 or 10 people would survive? Well, that would be (it’s called the p-value)

{10 \choose 7} (0.6)^7 (0.4)^3 + {10 \choose 8} (0.6)^8 (0.4)^2 +{10 \choose 9} (0.6)^9 (0.4)^1 +{10 \choose 10} (0.6)^{10} (0.4)^0 = 0.38

You might choose to add up the probability that you might get a result of 5 survivals and lower too (in case you are interested in a deviation of 1 or more from the average, rather than just a higher number).

{10 \choose 5} (0.6)^5 (0.4)^5 + {10 \choose 4} (0.6)^4 (0.4)^6 +{10 \choose 3} (0.6)^3 (0.4)^7 +{10 \choose 2} (0.6)^2 (0.4)^8 +{10 \choose 1} (0.6)^1 (0.4)^9 +{10 \choose 0} (0.6)^0 (0.4)^{10} = 0.37

The sum of these two (called the symmetrical p-value) is 0.75, i.e., there is 75% probability that such (and even more hopeful) results are explainable by the “null hypothesis”, that the miracle drug had absolutely no effect and that the disease simply took its usual course.
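The two sums above are quick to reproduce. A minimal sketch (the helper name `binom_pmf` is made up here), using the null-hypothesis survival probability of 0.6:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k survivors out of n, each surviving with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_survive = 0.6   # survival probability under the null hypothesis (no treatment effect)

upper = sum(binom_pmf(k, 10, p_survive) for k in range(7, 11))   # 7 or more survive
lower = sum(binom_pmf(k, 10, p_survive) for k in range(0, 6))    # 5 or fewer survive
print(round(upper, 2), round(lower, 2), round(upper + lower, 2))
```

Rounded to two decimals, this gives 0.38, 0.37 and 0.75, in line with the figures quoted.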

If we repeated the same test with a 1000 patients, of whom 700 survived, this has a dramatically different result. The same calculations would yield

{1000 \choose 700} (0.6)^{700} (0.4)^{300} + {1000 \choose 701} (0.6)^{701} (0.4)^{299} + \ldots +{1000 \choose 1000} (0.6)^{1000} (0.4)^0 \\  \approx 3 \times 10^{-11}

Notice how small this number is. If you also add the probability of repeating the experiment and getting 500 or fewer survivals, that would be \approx 10^{-10}.

The symmetrical p-value in this case is \approx 10^{-10}. Consider how tiny this is compared to the 0.75 number we had before. This is clearly a rather effective drug!
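With 1000 patients the individual terms are products of enormous and minuscule numbers, so a naive floating-point sum can underflow. One way around this (a sketch, not the only approach) is to write 0.6 = 3/5 and 0.4 = 2/5, add up exact integers, and divide once at the end. The function name `binomial_tail` is hypothetical.

```python
from math import comb

def binomial_tail(n, ks, num=3, den=5):
    """Sum of C(n,k) p^k (1-p)^(n-k) over k in ks, for p = num/den, using exact integers."""
    total = sum(comb(n, k) * num**k * (den - num)**(n - k) for k in ks)
    return total / den**n   # a single division at the end avoids floating-point underflow

upper = binomial_tail(1000, range(700, 1001))   # 700 or more survivors
lower = binomial_tail(1000, range(0, 501))      # 500 or fewer survivors
print(upper, lower, upper + lower)
```

The printed values can be compared with the orders of magnitude quoted above; either way, they are many orders of magnitude below any conventional threshold.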

The p-value is just the total probability that the “null hypothesis” generates the observed event or anything even more extreme than observed. Seems reasonable, doesn’t it? If this p-value is less than some lower threshold (say 0.05), you might decide this is acceptable as “evidence”. The \frac{700}{1000} test appears as if it proves that “XYZ” is an excellent “miracle” drug.

Next, we come to the underside of p-values. It’s called p-hacking. Here’s a simple way to do it. Consider the test where you obtained a \frac{7}{10} result. Let’s say you decided, post-hoc, that the last person that died actually had a fatal pre-existing condition that you didn’t detect. No autopsies were performed, so that patient might well have died of the condition. In that case, maybe we should exclude that person from the 10 people who were in the survey? And one other guy that died had a really bad attitude, didn’t cooperate with the nurses, maybe didn’t take his medication regularly! We should exclude him too? So we had 7 successful results out of 8 “real” patients. The p-value has now dropped to 0.106 for the 7 and above case and 0.17 for the 3 and below case, for a total p-value of 0.28. Much better! And we didn’t have to do any work, just some Monday morning quarter-backing. Wait, maybe that is exactly what Monday morning quarter-backing is.

Another example of p-hacking is one that I gave in this post. For convenience, I reproduce it here –

Imagine you were walking around in Manhattan and you chanced upon an interesting game going on at the side of the road. By the way, when you see these games going on, a safe strategy is to walk on, since they usually reduce to methods of separating a lot of money from you in various ways.

The protagonist, sitting at the table tells you (and you are able to confirm this by a video taken by a nearby security camera run by a disinterested police officer), that he has managed to toss the same quarter (an American coin) thirty times and managed to get “Heads” {\bf ALL} of those times. And it was a fair coin!

Next, your good friend rushes to your side and whispers to you that this guy is actually one of a really large number of people (a little more than a billion) that were asked to successively toss freshly minted, scrupulously clean and fair quarters. People that tossed tails were “tossed” out at each successive toss and only those that tossed heads were allowed to toss again. This guy (and one more like him) were the only ones that remained.

What if the number of coin tosses was 100 rather than 30, with a larger number of initial subjects?

Clearly, you would be p-hacked if you ignored your friend.
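The numbers behind the friend’s whisper are easy to work out. In this sketch the crowd size is taken to be exactly 2^{30}, which is one way of reading “a little more than a billion”; that choice is an assumption.

```python
import math

n_tosses = 30
n_people = 2 ** n_tosses        # 1,073,741,824: a little more than a billion
p_single = 0.5 ** n_tosses      # chance that one given person tosses 30 heads in a row

# Chance that at least one person in the crowd tosses 30 heads in a row:
# 1 - (1 - p)^N, computed with log1p/expm1 to keep precision for tiny p.
p_someone = -math.expm1(n_people * math.log1p(-p_single))

expected_survivors = n_people * p_single
print(n_people, p_single, p_someone, expected_survivors)
```

For any single tosser, thirty heads is about a one-in-a-billion event, but with this many tossers it is more likely than not that somebody manages it.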

p-values are used throughout science, but it is desperately easy to p-hack. It still takes a lot of intellectual honesty and, yes, seat of the pants reasoning and experience to know when you are p-hacking and when you are simply being rational in ignoring certain classes of data.

The q-value is a quantity that describes when a correlation is outside the bounds of the “null hypothesis” – for instance, one might have an economic reason why the fx/equity index correlation is a certain number. Maybe it is linked to the size of trade in/out-flows, tariff structure, the growth in the economy and other aspects. But then, it moves around a lot and clearly follows some kind of random process – just not the one described by the binomial model. It would clarify a lot of the nonsense that goes into pricing and estimating economic value in products such as quanto options.

More on this in a future post.

Front image : courtesy Hilda Bastian, from this article

Math, Rhythmic patterns & A Card Trick

Another Wednesday, another session of Manjul Bhargava’s entertaining and instructive class at the National Museum of Mathematics, in New York City.

This time, the topic was that of rhythmic combinations and their connection to mathematics. As the sentence itself suggests, combinations of rhythms lead to combinatorial arithmetic – the notions of Fibonacci numbers and Pascal’s triangle immediately suggest themselves. What follows is a précis of his class, with some additions of my own concerning patterns in South Indian classical music (also known as CarnAtic music). In this paragraph, as well as all that follows, I will be using capitalized vowels to indicate a stretched vowel sound. So it is “All” rather than “all”: the vowels aren’t equivalent! Additionally, all the material in this talk is well-known to scholars of Sanskrit. One example of a study that describes this is an article from 1985 by Paramanand Singh.

Acharya VIrahankA, a poet and mathematician, considered the number of ways to form an N-syllable line of poetry from a combination of 1- and 2-syllable words. While this is a run of the mill question for all poets, it is also of importance in the improvisational aspect of Indian classical music. In Indian music, the music follows a pattern of beats called a “tAlam”. There is the simple 8-beat “Adi” tAlam, which repeats after 8 beats. There is the 3-beat “rUpakam” tAlam, which repeats after 3. There are more complex tAlams, with more beats. In order to figure out where you are in the tAlam, performers use a combination of slaps of the palm, finger counting as well as waves of the hand to count the number of beats of the tAlam. TAlams are classified with the concept of jAtI. So the simplest Adi tAlam, with 8 beats is actually called chatushra-jati-triputa tAlam, for it has one slap, three finger counts, then two slap-wave combinations. There is also a khanda-jAtI-triputa tAlam, which has one slap, four finger counts, then two slap-wave combinations. As you can see, this is a pretty organized system with a practically infinite number of tAlams, though after the first few, one is basically showing off one’s muscle memory and coordination. A demonstration is here

As far as improvisation goes, you have to start at the first beat (in the middle of a poetry composition also set to the same rAgA) and end your improvisation at the last beat – of course, you could try variations where you interrupt the poem at the middle of the beat cycle and then end at the last beat of the cycle, many cycles later. Clearly, you might improvise with notes of length 1 beat, 2 beats or even \frac{1}{3} of a beat, since you can play or sing faster than the speed of the beats, so the idea of fitting notes of varying length into a cycle of beats is an ongoing challenge in improvisation. You have to stay aware of the rhythmic cycle while making the improvisatory notes sound, well, musical! More about this after the next section on the mathematical concepts here.

To put the problem that VIrahankA (and later HemachandrA) considered into a visually appealing format, the number of ways to arrange 1 and 2 length tiles to make a tile-chain of length N can be written down as follows.

Tile-chain of length 1   :  One (1) way. We write this as 1.

Tile-chain of length 2   :   Two (2) ways (one with two 1-length tiles, another with a single 2-length tile). We write this as 11, 2.

Tile-chain of length 3   :   Three(3) ways (in our shorthand notation from above, this is 111, 12, 21).

Tile-chain of length 4   :   Five (5) ways (in our shorthand notation this is 1111, 211, 121, 112, 22).

In fact, if you have a tile-chain of length N and ask for the number of ways to form such tile-chains (call this quantity V_N), you would reason (as VIrahankA did) as follows. Such a tile-chain is formed either by taking one of the V_{N-1} tile-chains of length N-1 and appending a 1-length tile, or by taking one of the V_{N-2} tile-chains of length N-2 and appending a 2-length tile. Ergo, V_{N} = V_{N-1}+V_{N-2}. This is exactly the constructive method to build the series of numbers 1,1,2,3,5,8,13,21,34…. – the VIrahankA numbers that you have probably heard of under a different name.
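The reasoning above can be checked with a short Python sketch. The function names `tilings` and `virahanka` are my own labels for illustration, and the choice V_0 = 1 (one way to build the empty chain) is an assumption that makes the recurrence start cleanly.

```python
def tilings(n):
    """List every chain of total length n built from tiles of length 1 and 2.

    Each chain is written in the shorthand used above, e.g. "121".
    """
    if n == 0:
        return [""]  # exactly one way to build the empty chain
    # Either extend a chain of length n-1 with a 1-length tile ...
    chains = [chain + "1" for chain in tilings(n - 1)]
    # ... or extend a chain of length n-2 with a 2-length tile.
    if n >= 2:
        chains += [chain + "2" for chain in tilings(n - 2)]
    return chains


def virahanka(n):
    """Return V_n from the recurrence V_n = V_{n-1} + V_{n-2}, with V_0 = V_1 = 1."""
    previous, current = 1, 1
    for _ in range(n):
        previous, current = current, previous + current
    return previous


for length in range(1, 7):
    # The brute-force list and the recurrence should agree on the count.
    print(length, virahanka(length), tilings(length))
```

The brute-force enumeration and the recurrence give the same counts, which is the content of VIrahankA's argument.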

Next, if you asked a different question, as PingalA did, you start with n syllables, k of length 1 and n-k of length 2; you can then construct lines of different lengths. There are {{n} \choose {k}} ways of organizing these, which is the binomial coefficient. He then considered how to deduce the relationships between these different length poetic sentences and realized that they could be arranged in a triangular form that he (more likely, a commentator named HalAyudha) referred to as a MeruprastArA.


The reason for the organization is as follows – the top row (1) represents the number of ways to organize 0 1-length and 0 2-length syllables. That’s just 1 way of doing nothing. The number on the extreme right is just the sum of the elements in that row.

The second row represents the number of ways to organize 1 1-length or 1 2-length syllables. There is one way to do so with one 1-length syllable, as well as one way to do so with one 2-length syllable.  The sum of these is 2, which is on the extreme right.

The third row represents the number of ways to organize 2 1-length syllables, or 1 1-length and 1 2-length syllables, or 2 2-length syllables. The sum of the number of ways is on the extreme right. It is, again, a power of 2.

And so on.

The connection to the series of VIrahankA is easy to see. If you count the number of ways to construct syllable-chains of length 1, 2, 3… from the above, they are sums of certain terms in the MeruprastArA above. That is depicted below.


They are the different ways to organize 1- and 2-length syllables to yield syllable-chains of length 1, 2, 3…. The sums of the numbers of ways to do this are exactly VIrahankA’s numbers, as you can see above along the lines.
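Since the pictures may not be visible to every reader, here is a minimal Python sketch of the same bookkeeping. It assumes the convention used above: row n of the MeruprastArA holds {{n} \choose {k}} for k short syllables and n-k long ones, so the chain has total length k + 2(n-k) = 2n - k. The function names are mine.

```python
from math import comb


def meru_row(n):
    """Row n of the MeruprastArA: ways to arrange k short and n-k long syllables."""
    return [comb(n, k) for k in range(n + 1)]


def chains_of_length(total):
    """Count syllable-chains of a given total length by summing along a slanted line.

    A chain with n syllables, k of them short, has length 2n - k,
    so for each n the number of short syllables is forced to be k = 2n - total.
    """
    count = 0
    for n in range(total + 1):
        k = 2 * n - total
        if 0 <= k <= n:
            count += comb(n, k)
    return count


for n in range(5):
    print(meru_row(n), "row sum:", sum(meru_row(n)))

print([chains_of_length(total) for total in range(1, 9)])
```

The row sums come out as powers of 2, and the slanted sums reproduce 1, 2, 3, 5, 8, …, the VIrahankA numbers.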

One can repeat this exercise with 1-, 2-, 3- length syllables. As Manjul jokingly suggests, we get the “TriVIrahankA” numbers as well as Pingala’s 3-D MeruprastArA.
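As a small illustration of that extension, the counting argument carries over directly: a chain of length N ends in a 1-, 2- or 3-length syllable. The sketch below is my own generalization to an arbitrary set of syllable lengths; the name `count_chains` is hypothetical.

```python
def count_chains(n, syllable_lengths=(1, 2, 3)):
    """Count chains of total length n built from the allowed syllable lengths."""
    counts = [1] + [0] * n  # counts[0] = 1 for the empty chain
    for length in range(1, n + 1):
        # The last syllable has some allowed length t; the rest fills length - t.
        counts[length] = sum(
            counts[length - t] for t in syllable_lengths if t <= length
        )
    return counts[n]


print([count_chains(n) for n in range(1, 8)])          # 1-, 2-, 3-length syllables
print([count_chains(n, (1, 2)) for n in range(1, 8)])  # recovers the 1-, 2- case
```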

This pattern of mixing syllables is systematically developed in the style of music sung and practiced in South India, called Carnatic music. There are many kinds of improvisation in this style, including one where solfa syllables are sung, and it sounds like scat singing. The organization is as follows. One first sets the speed of the basic beat – this speed is called “KAlam”. People choose what they consider their “slow” speed, this is called the 1st kAlam. There is a 2nd kAlam, which is double this speed, while 4 times this basic speed is called 3rd kAlam (this is a logarithmic scale). Each cycle of beats, which is called a tAlam, runs at the speed of the kAlam.

The next concept is that of “GatI” (Sanskrit for “speed”). This is the number of notes that fit into each beat. This is also called “Nadai” in the Tamil language. Hence in Tisra GatI, one would sing or play 3 notes per beat, in the 1st kAlam. However in 2nd kAlam, this would be 6 notes per beat and so on. One can vary the GatI in the basic speed of the beat, leading to rhythmic variations.

In addition, due to the influence of percussion performers, with their interest in pure rhythm, some non-standard gatIs have become popular – these are, for instance, chatushra-tisra GatI, which is a \frac{4}{3} speed – the notes are held for a longer time, so in 3rd kAlam, where one sings 4 notes per beat, one only sings 1 note per \frac{3}{4} beat.

It is almost magical to hear the results of improvisation with all the myriad ways to intersperse notes into beats, as in the following examples by some performers.

Continuing from this digression, the next item Manjul discussed was a set of methods to communicate the meter of a poem to a reader, in a manner that is incorruptible by the ravages of translators, copiers or other recording devices. These concepts are similar to error-detection methods used in modern-day coding.

While the method he discusses (which I will detail shortly) is interesting for short phrases and poems, there is an elephant in the room. One of the largest texts in the world is the Vedas, which were composed before 1500 B.C. They were transmitted orally, and of course one has the problem of how to make sure that there isn’t a Chinese whispers problem. The method is meticulous and involved: it involves singing the poetry literally backwards and forwards in such a rigid manner that multiple syllable errors can be caught. The result, however, is not full of nonsense words but remains meaningful. There are many discussions of the technique used and I would be foolish to repeat it; look here. Similar techniques were also used by Buddhist and Jain scholars to transmit their texts, though less often.

The poetic technique Manjul discussed is one of those used for shorter poems. It is based on the length of the syllables in the nonsense word “ya mA tA rA ja bhA ga sa la gA”. If you count the length of each syllable in this word, use 0 to represent 1-length syllables (small “a”) and 1 to represent 2-length syllables (capital “A”), it is 0111010001. If you look at all three digit combinations, they are, in sequence,

011  which is decimal number 3

111 which is decimal number 7

110 which is decimal number 6

101 which is decimal number 5

010 which is decimal number 2

100 which is decimal number 4

000 which is decimal number 0

001 which is decimal number 1

Notice that all the decimal numbers from 0-7 make an appearance. This nonsense word originates from PingalA (of MeruprastArA fame!) and is a way to communicate the precise pronunciation of the words in a simple code, which would be shorter than the phrase you were trying to exactly represent.
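The window-counting above is easy to reproduce. The sketch below assumes the transliteration convention of this post (a syllable ending in capital “A” is long and coded 1, otherwise short and coded 0); the helper names are mine.

```python
SYLLABLES = "ya mA tA rA ja bhA ga sa la gA".split()


def to_bits(syllables):
    """Code each syllable as 1 if it ends in a long 'A', else 0."""
    return "".join("1" if s.endswith("A") else "0" for s in syllables)


def windows(bits, width=3):
    """Return every run of `width` consecutive digits, left to right."""
    return [bits[i:i + width] for i in range(len(bits) - width + 1)]


bits = to_bits(SYLLABLES)
values = [int(w, 2) for w in windows(bits)]
print(bits)                           # the 10-digit code of the nonsense word
print(list(zip(windows(bits), values)))
print(sorted(values) == list(range(8)))  # every number 0-7 appears exactly once
```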

As it turns out there are two ways to construct a nonsense phrase with all the numbers 0-7 represented just once; they can be computed based on the exhaustive tree search depicted below. They are 01253764, as well as 01376524. If the second one is cyclically permuted to 37652401, we get 0111010001, while the first one is 25376401, which codes to 0101110001. If the second one can be written as {\bf ya mA tA rA ja bhA ga sa la gA}, the first one is {\bf ya mA ta rA jA bhA ga sa la gA}. The neat reason why someone would pick the second variant is that the last two syllables of the word are la and gA, which are also the starting syllables of the Sanskrit words laghu (for “short”) and guru (for “long”). The syllables are self-referential in this respect in the word.
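For readers who want to redo the exhaustive search, here is a minimal Python sketch. It is my own reconstruction, not necessarily the tree drawn in the class: each three-digit number can only be followed by a number whose first two binary digits match its last two, and a valid phrase must also wrap around from the last number back to the first.

```python
def successors(x):
    """Three-bit numbers that can follow x: drop the leading bit, append 0 or 1."""
    shifted = (x << 1) & 0b111
    return [shifted, shifted | 1]


def find_cycles(start=0):
    """Depth-first search for orderings of 0-7 that chain correctly and close up."""
    found = []

    def extend(path):
        if len(path) == 8:
            if path[0] in successors(path[-1]):  # the cycle must wrap around
                found.append("".join(str(v) for v in path))
            return
        for nxt in successors(path[-1]):
            if nxt not in path:
                extend(path + [nxt])

    extend([start])
    return found


def unroll(cycle, first):
    """Rotate the cycle to begin at `first` and write out the 10-digit binary code."""
    i = cycle.index(str(first))
    values = [int(c) for c in cycle[i:] + cycle[:i]]
    bits = format(values[0], "03b")
    for v in values[1:]:
        bits += str(v & 1)  # each later window contributes only its last digit
    return bits


print(find_cycles())
print(unroll("01376524", 3), unroll("01253764", 2))
```

The search finds exactly the two orderings quoted above, and unrolling them from 3 and from 2 respectively gives the two 10-digit codes.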

Now, the Sanskrit poets used this technique to make sure future generations would never forget the meter and how to shorten or lengthen syllables properly. Suppose they wanted to communicate that the cadence was 001 001 1010 000 (to be really clear, listen to the audio clip).

You could break it up into threes, then code it gA \: gA \:  rA \: la \: la, where the syllables represent three-digit binary numbers, while the last “la” is a single bit 0. Then you would include the nonsense phrase gAgArAlala in your poem (as a labelled cadence phrase) and be secure that you have communicated the cadence to your reader.

This is the basis of a card trick, which he demonstrated with five cards (a five bit version of the above code), but since he also asked that we don’t publish that version of the code, I will show the three-bit version, with only 10 cards.

Audio courtesy Rajeswari Satish.

Of Baby Hummers and clock arithmetic with Aryabhata and Archimedes

I spent a pleasant evening at the National Museum of Mathematics  this week – the first session of a semester long program of lecture demonstrations about mathematics and magic. The instructor is Manjul Bhargava, the famous Princeton mathematician. I thought the ideas were worth discussing in a more public forum, so I resolved to demonstrate my own version of the cool tricks he showed. All the tricks have some cute mathematical nugget hidden inside and he spent the last ten minutes basically revealing the secret behind the trick.

I have changed the tricks slightly to prevent a complete copy of what he presented, but the essence of the idea is exactly the same.

The first is called a Baby Hummer card trick, a creation of Bob Hummer. I demonstrate it as well as a ten-card Hummer trick in the following videos.

Here’s the third trick (this is a variation of what I saw at the class) : I invoke my favorite mathematicians, Aryabhata and Archimedes and the magic they wove.

Why do these three tricks work?

Let’s look at the first two.

The Hummer “arrangement” and its preservation under the Hummer “maneuver” is key. It is described in the following pictures.




Now add one “error” card in the arrangement




In my tricks, I basically put your selected card in as an “error” in the Hummer arrangement.

The Hummer “maneuver” also protects two or more “error” cards, which explains some of the variants of the puzzle you could find on YouTube.

The third trick is easy to think about and uses the arithmetic of a clock – also called modulo arithmetic.  I will leave it at that.

Thanks to my videographer and collaborator Rajeswari Satish. And thanks to the National Museum of Mathematics for organizing interesting events nearly every week.

Gedankenexperiments #1

Albert Einstein is well known to be one of the most creative scientists of the last couple of centuries. He produced fascinating theories that really burnished this reputation. But he also had several ideas (trying to undermine, for instance, ideas about quantum mechanics) that didn’t work – often the exact way in which they did not work led to even more insights about the theory he was trying to undermine.

Much has been written about the famous debates about the fundamental correctness of quantum mechanics and the “reality” of classical methods of describing nature. One was a debate that he carried on with Niels Bohr over several sessions, including during a famous sit-down at the Fifth Solvay Conference of 1927 (the one with the famous photo, with so many scientific movers and shakers, in the picture above).

The reason I wanted to write about this particular puzzle was that it is described in two different ways that fundamentally contradict each other in two popular physics books – Carlo Rovelli’s “Reality is not what it seems” and Crease/Goldhaber’s “The quantum moment”. I frankly understood neither at the first (or even second) reading. And since I am curious, here goes with my explanation.

Einstein’s thought-experiment (Gedankenexperiment in German) is very simple. He was trying to address the uncertainty relation between \Delta t and \Delta E. Expressed simply, this “energy-time” uncertainty relationship says that you can violate the conservation of energy (by an amount \Delta E) for a short time \Delta t, as long as \Delta E \Delta t \ge \frac{\hbar}{2}, where \hbar is the famous Planck’s constant (divided by the number 2 \pi, for that is what appears a lot in physics equations). Said another way, if the universe decides to create a particle-antiparticle pair out of nothing, thus violating the principle of energy-momentum conservation, it can do so – with a caveat. The larger the total energy of the particles created out of nothing, the shorter the time the particles can stick around until they recombine and sink back into nothingness. We appear to see these phenomena indirectly in Nature and the notion of “vacuum fluctuations” is well accepted in the scientific world. As an aside, our calculations say that we should have many more of these fluctuations than actually seem to happen – but more about that in a future post.

I don’t actually like this way of phrasing it; it seems rather mysterious. I find it easier to think in terms of Fourier components. I think of Energy as actually Frequency, using the relation \omega =\frac{E}{\hbar}, that Einstein himself wrote down in his analysis of the photoelectric effect and de Broglie later used in his thesis on wave-particle duality. In that case, if I think of a function of time f(t), I could also compute its Fourier representation, which expresses the same function in terms of its frequency (\omega) components. The above condition is then the statement of a well-known mathematical theorem (it follows from the Schwarz inequality) that a function of time that is very short-lived has a large number of frequency components. Conversely, if it is very long-lived in time, it has very few frequency components.

As an example of this, suppose I played a pure note on a violin. Remember, in order for the frequency to be {\bf EXACTLY} a single number, I’d have to play the note for an infinite time (a single frequency sine-wave doesn’t begin or end!). If, on the other hand, I want to describe a quick pluck of a string, I would have to include {\bf ALL} the frequencies the string is capable of producing – that’s why a single plucked string immediately produces all the possible harmonics.

So, if something is short-lived in time, it has a lot of frequency components (huge “spread” in frequency space), while if it is long-lived in time, it has very few frequency components (little “spread” in frequency space).
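To see this trade-off in numbers, here is a rough Python sketch. The choice of a Gaussian pulse exp(-t^2/(2 s^2)), the grid size and the brute-force Fourier sum are all my own assumptions for illustration; the spreads are computed as root-mean-square widths of the squared magnitude in time and in frequency.

```python
import cmath
import math


def spreads(s, n=256, total=40.0):
    """Return (time spread, frequency spread) of the pulse exp(-t^2 / (2 s^2))."""
    dt = total / n
    times = [-total / 2 + j * dt for j in range(n)]
    pulse = [math.exp(-t * t / (2 * s * s)) for t in times]
    # Angular frequencies resolved by a window of duration `total`.
    omegas = [2 * math.pi * k / total for k in range(-n // 2, n // 2)]
    # Brute-force Fourier sum: slow but dependency-free.
    spectrum = [
        sum(p * cmath.exp(-1j * w * t) for p, t in zip(pulse, times))
        for w in omegas
    ]

    def rms_width(xs, amplitudes):
        weights = [abs(a) ** 2 for a in amplitudes]
        norm = sum(weights)
        mean = sum(x * w for x, w in zip(xs, weights)) / norm
        variance = sum((x - mean) ** 2 * w for x, w in zip(xs, weights)) / norm
        return math.sqrt(variance)

    return rms_width(times, pulse), rms_width(omegas, spectrum)


for s in (0.5, 1.0, 2.0):
    spread_t, spread_w = spreads(s)
    print(f"s={s}: time spread {spread_t:.3f}, frequency spread {spread_w:.3f}, "
          f"product {spread_t * spread_w:.3f}")
```

Shorter pulses come out with wider frequency spreads, and for a Gaussian the product of the two spreads sits at the minimum value of 1/2. Multiplying the frequency spread by \hbar turns this into \Delta E \Delta t = \frac{\hbar}{2}.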

This energy-time uncertainty relationship has been expressed in other ways. Sometimes, it is expressed as “If I want to measure an energy difference of \Delta E between two states of a system, then whatever experiment I do needs to take a time period \Delta t \ge \frac{\hbar/2}{\Delta E} – I cannot beat this”.

Einstein wanted to show that this version of the energy-time relation was incorrect. In particular, he wanted to show that the connection of Energy to Frequency, otherwise expressed as “Particle-Wave duality” was incorrect. Was he being inconsistent? After all, in his famous work on the photoelectric effect, he had deduced a relationship between Energy and Frequency for photons that was as written above. However, his oft-expressed thought was that this particle interpretation he had supplied for light was simply a consequence of a deficiency of theory. He believed that since he wasn’t able to construct a better theory, he had to invent a “statistical” description of light as made up of photon particles and he had to make up the above relation in such a statistical description. The short answer is that he was wrong. But the experiment he thought up is still rather interesting.

He thinks of a little box that has a shutter controlled by an on-board clock, as in the picture below.

Einstein experiment

The on-board clock is supposed to open, then close the shutter for a short prescribed time \Delta t. Trapped inside the box is one photon – maybe injected much earlier by an extremely weak source of light. The energy of the photon can be arbitrarily set to \Delta E. Note that we can select the numbers \Delta t, \Delta E in any way we choose. In particular, we can arrange things so that \Delta t \Delta E < \frac{\hbar}{2}. And to do the experiment, I simply place this contraption on a balance and wait for the weight on the balance to drop by \frac{\Delta E}{c^2} g, which is the weight of the mass-equivalent of the photon. Once the photon leaks out, I’ll know immediately (though I could wait an infinitely long time to really be sure) and so I have an instance where the change \Delta E was measured in time \Delta t – aha!, the energy-time uncertainty relation has been violated.

Einstein supposedly presented this to Bohr one afternoon and it led to a sleepless night for the poor Dane (I should know, it led me to a few sleepless nights too and I am no Bohr, though it wasn’t at all bo(h)ring!). Clearly, if you believe quantum mechanics, something about the world should prevent you from measuring things so accurately that you know these quantities when the photon has departed its cage! And before I engage in an analysis, let me acknowledge that I benefited from a rather fruitful discussion with Scott Thomas at the physics department at Rutgers University, who might disagree with some conclusions I reached or even my approach. I take the blame for any errors.

The fiendish aspect of this experiment is that \Delta E and \Delta t appear to be set in stone to breach the inequality, so how could anything be amiss? I would like to take a view of this problem that a person in the 1930-50s would, so ignore any of the quantum aspects entirely. I will treat the photon as a particle, albeit one that can travel at the speed of light.

The key is that once the shutter is opened, the photon, treated as a particle, escapes. The shutter is open for a short time \Delta t. If you know that a horse travelling at speed V left the barn when the barn door was opened for \Delta T seconds, you know that the horse could be in a region V \Delta T away. Let’s ignore the possibility that the horse decided to smell the roses and was peacefully grazing outside the door, giving up the chance to escape! Similarly, the uncertainty in the position of the photon is \delta x = c \Delta t where c is the speed of light.

But if the photon is a particle, it is subject to the usual uncertainty principle for particles. In particular, the uncertainty in its momentum is \delta p and indeed, \delta x \delta p \ge \frac{\hbar}{2}. This implies that \delta p \ge \frac{\hbar}{2 c \Delta t}. The energy of a photon is related to its momentum (for it is massless) by the relation E = c p. It follows that the uncertainty in the photon’s energy is \delta E \ge  \frac{\hbar}{2  \Delta t}. Aha!, this implies that \delta E \Delta t \ge \frac{\hbar}{2}.
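To put rough numbers on this chain of inequalities, here is a small Python sketch. The shutter time of one nanosecond is a made-up value for illustration, and the constants are the standard SI values of \hbar and c.

```python
HBAR = 1.054571817e-34  # reduced Planck constant, in joule-seconds
C = 299792458.0         # speed of light, in metres per second


def photon_uncertainties(shutter_time):
    """Follow the argument above: delta x = c * dt, then delta p, then delta E.

    Returns the position spread and the minimum momentum and energy spreads.
    """
    delta_x = C * shutter_time       # how far the photon could have travelled
    delta_p = HBAR / (2 * delta_x)   # from delta x * delta p >= hbar / 2
    delta_e = C * delta_p            # from E = c p for a massless particle
    return delta_x, delta_p, delta_e


shutter_time = 1e-9  # hypothetical shutter opening of one nanosecond
delta_x, delta_p, delta_e = photon_uncertainties(shutter_time)
print(f"delta x >= {delta_x:.3e} m")
print(f"delta p >= {delta_p:.3e} kg m/s")
print(f"delta E >= {delta_e:.3e} J")
print(f"delta E * delta t = {delta_e * shutter_time:.3e} J s, hbar/2 = {HBAR / 2:.3e} J s")
```

The speed of light cancels out of the final product, which is why the bound on \delta E \Delta t is just \frac{\hbar}{2}.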

Why is \delta E/c^2 also the uncertainty in the mass of the box? Because the extra energy for the photon has to come from somewhere, and the photon is not coupled to anything else! It could only have come from the box. What this implies is that the position of the photon is tied to the energy of the box, which is cryptically referred to in Carlo Rovelli’s book.

Bohr’s refutation of Einstein’s experiment, according to the Crease/Goldhaber book, boiled down to saying that \Delta t would depend on the gravitational slowing of clocks due to the differing position of the box on the balance in a gravitational field. However (credit to Scott Thomas for this point), you could remove the gravitational field from this problem by simply measuring the mass of the box using a method involving a horizontal spring. The spring’s oscillation would have nothing to do with the gravitational field and there would be no time “dilation” in this case. But the oscillation frequency (\sqrt{\frac{k}{M}}) would indeed depend on the mass of the box and would serve to measure the mass. So, I am not sure why Einstein accepted Bohr’s explanation, which by the way, I wasn’t able to make sense of either.
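As a rough illustration of the horizontal-spring idea, the sketch below estimates how much the oscillation frequency \sqrt{\frac{k}{M}} shifts when the box loses the mass-equivalent of one photon. The box mass and photon energy are made-up numbers, and the first-order formula \frac{\delta \omega}{\omega} \approx \frac{\delta M}{2M} is my own expansion of the square root, not something taken from either book.

```python
import math

C = 299792458.0  # speed of light, in metres per second


def angular_frequency(spring_constant, mass):
    """Oscillation frequency sqrt(k / M) of a mass on a horizontal spring."""
    return math.sqrt(spring_constant / mass)


def mass_equivalent(energy):
    """Mass carried away by a photon of the given energy, m = E / c^2."""
    return energy / C ** 2


def fractional_frequency_shift(mass, mass_lost):
    """First-order rise in sqrt(k / M) when M drops by mass_lost (mass_lost << M)."""
    return mass_lost / (2 * mass)


box_mass = 1e-3          # hypothetical 1-gram box, in kilograms
photon_energy = 3.2e-19  # hypothetical visible photon, roughly 2 eV, in joules

lost = mass_equivalent(photon_energy)
print(f"mass lost: {lost:.3e} kg")
print(f"fractional frequency shift: {fractional_frequency_shift(box_mass, lost):.3e}")
```

The shift is fantastically small for any realistic box, which is fine for a thought experiment: the point is only that the frequency depends on the mass and not on gravity.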

To do this calculation in quantum mechanics, you’d have to treat the shutter as a small antenna, which emits electromagnetic fields into space. Then you would have the combined field inside and outside the box, which would fall into some state that would not have a definite number of photons inside the box. Then you would project out the outgoing states that involve a single photon and look at the spread in their energy. That would also be the spread in the energy of the box itself, since energy-momentum is conserved. But in doing so, you would use the uncertainty principle for the electromagnetic field to derive this result. This is very similar to what I have done in the back-of-the-envelope argument sketched above.


“The Quantum Moment” : Robert P. Crease & Alfred S. Goldhaber, W. W. Norton  & Co. page 196-197.

“Reality is not what it seems”: Carlo Rovelli, Penguin Random House.

Acknowledgments: Scott Thomas @ Rutgers