Fermi Gases and Stellar Collapse – Cosmology Post #6

The most refined Standard Candle there is today is a particular kind of stellar collapse, called a Type 1a supernova. To understand this, you will need to read the previous posts (#1-#5), in particular the Fermi-Dirac statistics argument in Post #5. While this is the most mathematical post in the sequence, it might be useful to at least skim the argument to understand the reason for the amazing regularity of these explosions.

Type 1a supernovas happen to white dwarf stars. A white dwarf is a star that has reached the end of its starry career. It has burnt through its hydrogen fuel, producing all sorts of heavier elements, through to carbon and oxygen, and it is no longer hot enough to burn carbon and oxygen in fusion reactions. Since these two elements fuse much less readily than hydrogen or helium, the star is dense: there is less pressure from the light radiated by fusion reactions in the interior to counteract the gravitational pressure of its matter, so it compresses itself. The interior is composed of ionized carbon and oxygen – the negatively charged electrons are stripped from every atom, the remaining ions are positively charged, and the electrons roam freely through the star. Just as in a crystalline lattice (as in a typical metal), the light electrons are good at keeping the positively charged ions screened from other ions, and they use the ions in turn to screen themselves from other electrons; the upshot is that the electrons behave like free particles.

At this point, the star is being pulled in by its own mass and is being held up by the pressure exerted by the gas of free electrons in its midst. The “lattice” of positive ions also exerts pressure, but that pressure is much smaller, as we will see. The temperature of the surface of the white dwarf is known from observations to be quite high, \sim 10,000-100,000 \: Kelvin. More importantly, the free electrons in a white dwarf of mass comparable to the Sun’s mass (written as M_{\odot}) are ultra-relativistic, with energies much higher than their rest-mass energy. Remember, too, that electrons are a species of “fermion”, and obey Fermi-Dirac statistics.

The Fermi-Dirac formula is written as

P(\vec k) = 2 \frac {1}{e^{\frac{\hbar c k - \hbar c k_F}{k_B T}}+1}

What does this formula mean? The energy of an ultra-relativistic electron, one with energy far in excess of its rest-mass energy, is

E = \hbar c k

where c k is the “frequency” corresponding to an electron of wave-number k, while \hbar is the “reduced” Planck’s constant (=\frac {h}{2 \pi}, where h is the regular Planck’s constant) and c is the speed of light. The quantity k_F is called the Fermi wave-vector. The function P(\vec k) is the (density of) probability of finding an electron in the momentum state specified by \hbar \vec k . In the study of particles whose wave nature is apparent, it is useful to use the concept of the de Broglie “frequency” (\nu = \frac{E}{h}), the de Broglie “wavelength” (\lambda=\frac {V}{\nu}, where V is the particle velocity) and the “wave-number” k=\frac{2 \pi}{\lambda} corresponding to the particle. It is customary for lazy people to omit c and \hbar from formulas; hence we speak of “momentum k” for a hyper-relativistic particle travelling at a speed close to the speed of light, when it should really be h \frac {\nu}{c} = h \frac{V}{\lambda c} \approx \frac{h}{\lambda} = \frac {h}{2 \pi} \frac{2 \pi}{\lambda} = {\bf {\hbar k}}.

Why a factor of 2? It wasn’t there in the previous post!

From the previous post, you know that fermions don’t like to be in the same state together. We also know that electrons have a property called spin and they can be spin-up or spin-down. Spin is a property akin to angular momentum, which is a property that we understand classically, for instance, as describing the rotation of a bicycle wheel. You might remember that angular momentum is conserved unless someone applies a torque to the wheel. This is the reason why free-standing gyroscopes can be used for airplane navigation – they “remember” which direction they are pointing in. Similarly, spin is usually conserved, unless you apply a magnetic field to “twist” a spin-up electron into a spin-down configuration. So, you can actually have two kinds of electrons – spin-up and spin-down, in each momentum state \vec k . This is the reason for the factor of 2 in the formula above – there are two “spin” states per \vec k state.
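Here is a minimal numerical sketch of the occupation formula above, in units where \hbar c = k_B = 1 and with an arbitrary illustrative k_F = 1: at low temperature, each momentum state below k_F holds close to 2 electrons (one per spin), and each state above holds close to none.

```python
import math

def occupation(k, k_F=1.0, T=0.01):
    """Fermi-Dirac occupation (2 spin states per k), with hbar*c = k_B = 1."""
    return 2.0 / (math.exp((k - k_F) / T) + 1.0)

# Deep inside the Fermi sphere the levels are fully occupied (P -> 2),
# far outside they are empty (P -> 0); the step sharpens as T -> 0.
deep_inside = occupation(0.5)   # ~2.0
far_outside = occupation(1.5)   # ~0.0
at_surface  = occupation(1.0)   # exactly 1.0 right at k = k_F
```

Raising T in this sketch softens the step, which is the "leaking out" of electrons from the Fermi sphere discussed below.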

Let’s understand the Fermi wave-vector k_F. Since fermions can occupy momentum states at most two-at-a-time (one per spin), if they are confined to a cube of side L, you can ask how many levels they fill. They will, like all sensible particles, start occupying levels from the lowest energy level upwards, until all the available fermions are exhausted. The fermions are described by waves and, in turn, waves are described by wavelengths, so you need to classify all the possible ways to fit waves into a cube. Let’s look at a one-dimensional case to start

Fermion Modes 1D

The fermions need to bounce off the ends of the one-dimensional lattice of length L, so we need the waves to be pinned to 0 at the ends. If you look at the above pictures, the wavelengths of the waves are 2L (for n=1), L (for n=2), \frac{2 L}{3} (for n=3), \frac {L}{2} (for n=4). In that case, the wave-numbers, which are basically \frac {2 \pi}{\lambda}, need to be of the sort \frac {n \pi}{L}, where n is a positive integer (1, 2, 3, ...).
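The mode table can be generated directly from \lambda_n = \frac{2L}{n} and k_n = \frac{n \pi}{L} – a minimal sketch, with the box length set arbitrarily to L = 1:

```python
import math

L = 1.0  # length of the one-dimensional "box" (arbitrary units)

# Standing waves pinned to zero at both ends: lambda_n = 2L/n, k_n = n*pi/L
modes = [(n, 2 * L / n, n * math.pi / L) for n in range(1, 5)]
for n, wavelength, k in modes:
    print(f"n={n}: wavelength={wavelength:.4f}, k={k:.4f}")
```

This reproduces the wavelengths 2L, L, 2L/3, L/2 quoted above.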

Fermion Modes 1D Table

For a cube of side L, the corresponding wave-numbers are described by \vec k = (n_x, n_y, n_z) \frac {\pi}{L} since a vector will have three components in three dimensions. These wave-numbers correspond to the momenta of the fermions (this is basically what’s referred to as wave-particle duality), so the momentum is \vec p = \hbar \vec k. The energy of each level is \hbar c k. It is therefore convenient to think of the electrons as filling spots in the space of k_x, k_y, k_z.

What do we have so far? These “free” electrons occupy energy levels starting from the lowest values of \vec k and upwards in a neat, symmetric fashion. In k space, which is “momentum space”, since we have many, many electrons, we can think of them as filling up a sphere of radius k_F. This radius is called the Fermi wave-vector. It represents the most energetic of the electrons when they are all arranged as economically as possible – with the lowest possible total energy for the gas of electrons. This happens at zero temperature (which is the approximation we are going to work with at this time), and it follows from the probability distribution formula above: at T = 0 the occupation is a step function – all the electrons are inside the Fermi sphere – and they leak out of it as the temperature is raised.

It is remarkable – and you should pause to realize this – that a gas of fermions in its lowest energy configuration has a huge amount of energy. The Pauli principle requires it. If they were bosons, all of them would sit in the lowest possible energy level, which couldn’t be zero (because we live in a quantum world), but just above it.

What’s the energy of this gas? It’s an exercise in arithmetic if we work at zero temperature. Is that good enough? No, but it gets us pretty close to the correct answer and it is instructive.

The total energy of the gas of electrons, in a spherical white dwarf of volume V = \frac {4}{3} \pi R^3 and with 2 spins per state, is

E_{Total} =  2 V  \int_{k \le k_F} \frac {d^3 \vec k}{(2 \pi)^3} \hbar c k = V \frac {\hbar c k_F^4}{4 \pi^2}

The total number of electrons is obtained by just adding up all the available states in momentum space, up to k_F

N = 2 V \int_{k \le k_F} \frac {d^3 \vec k}{(2 \pi)^3} \rightarrow k_F^3 = 3 \pi^2 \frac {N}{V}
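Both closed forms can be checked with a crude numerical integration over the Fermi sphere (the angular integration contributes a factor 4 \pi k^2 \, dk); a sketch in units where \hbar c = 1, V = 1 and k_F = 1:

```python
import math

hbar_c, V, k_F = 1.0, 1.0, 1.0
steps = 100_000
dk = k_F / steps

N_sum = E_sum = 0.0
for i in range(steps):
    k = (i + 0.5) * dk                  # midpoint rule
    shell = 4 * math.pi * k**2 * dk     # spherical shell volume in k-space
    N_sum += 2 * V * shell / (2 * math.pi)**3           # 2 spins per state
    E_sum += 2 * V * shell / (2 * math.pi)**3 * hbar_c * k

N_closed = V * k_F**3 / (3 * math.pi**2)      # from k_F^3 = 3 pi^2 N / V
E_closed = V * hbar_c * k_F**4 / (4 * math.pi**2)
```

The Riemann sums agree with the closed forms to many decimal places.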

We need to estimate the number of electrons in the white dwarf to start this calculation off. That’s what sets the value of k_F, the radius of the “sphere” in momentum space of filled energy states at zero temperature.

The mass of the star is M. That corresponds to \frac {M}{m_p} nucleons, where m_p is the mass of the proton (protons and neutrons make up nearly all of the mass). If \mu_e is the ratio of atomic weight to atomic number for a typical constituent atom – for a star composed of carbon and oxygen, \mu_e = 2 – then there is one electron for every \mu_e nucleons. So, N = \frac {M}{\mu_e m_p} = \frac {M}{2 m_p}.

Using all the above

E_{Total} = \frac {4\pi}{3} R^3  \frac {\hbar c}{4 \pi^2} \left(  3 \pi^2 \frac {M}{\mu_e m_p \frac{4\pi}{3} R^3 }\right)^{4/3}

Next, the white dwarf has some gravitational potential energy just because of its existence. This is calculated in high school classes by integration over successive spherical shells from 0 to the radius R, as shown below

Spherical Shell White Dwarf

The gravitational potential energy is

\int_{0}^{R} (-) G \frac {\frac{4 \pi}{3} \rho_m r^3  \cdot   4 \pi \rho_m r^2}{r} dr = - \frac{3}{5} \frac {G M^2}{R}
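The shell integral can be verified numerically for a uniform-density sphere; the mass and radius below are placeholder values for illustration, not realistic white-dwarf numbers:

```python
import math

G = 6.674e-11      # m^3 kg^-1 s^-2
M = 2.0e30         # kg (placeholder stellar mass)
R = 7.0e6          # m  (placeholder stellar radius)
rho = M / (4 / 3 * math.pi * R**3)   # uniform density

# Integrate -G * (enclosed mass) * (shell mass) / r over spherical shells
steps = 100_000
dr = R / steps
U = 0.0
for i in range(steps):
    r = (i + 0.5) * dr
    m_enclosed = 4 / 3 * math.pi * rho * r**3
    dm_shell = 4 * math.pi * rho * r**2 * dr
    U -= G * m_enclosed * dm_shell / r

U_closed = -3 / 5 * G * M**2 / R   # the closed form above
```

The numerical sum matches -\frac{3}{5}\frac{GM^2}{R} to high precision.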

A strange thing happens if the energy of the electrons (which is called, by the way, the “degeneracy energy”) plus the gravitational energy goes negative. At that point, the total energy becomes even more negative as the white dwarf’s radius gets smaller – this can continue {\it ad \: infinitum} – the star collapses. This starts to happen when the gravitational potential energy equals the Fermi gas energy, which leads to

 \frac{3}{5} \frac{G M^2}{R} = \frac {4\pi}{3} R^3  \frac {\hbar c}{4 \pi^2} \left(  3 \pi^2 \frac {M}{\mu_e m_p \frac{4\pi}{3} R^3 }\right)^{4/3}

the R (radius) of the star drops out and we are left with a unique mass M at which this happens – the calculation above gives an answer of 1.7 M_{\odot}. A more careful calculation, which accounts for the star’s actual density profile rather than assuming uniform density, gives 1.44 M_{\odot}.
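Solving the balance condition for M (after R drops out) takes only a few lines with standard physical constants; this uniform-density estimate indeed lands near 1.7 solar masses:

```python
import math

hbar = 1.0546e-34   # J s
c = 2.998e8         # m/s
G = 6.674e-11       # m^3 kg^-1 s^-2
m_p = 1.6726e-27    # kg
mu_e = 2.0          # nucleons per electron for a carbon/oxygen star
M_sun = 1.989e30    # kg

# Setting (3/5) G M^2 / R equal to the Fermi-gas energy, R cancels, leaving
# M^(2/3) = (5/(3G)) * (hbar c / (4 pi^2)) * (3 pi^2)^(4/3)
#           * (4 pi / 3)^(-1/3) / (mu_e * m_p)^(4/3)
M_23 = (5 / (3 * G)) * (hbar * c / (4 * math.pi**2)) \
       * (3 * math.pi**2)**(4 / 3) * (4 * math.pi / 3)**(-1 / 3) \
       / (mu_e * m_p)**(4 / 3)
M_limit = M_23**1.5
print(M_limit / M_sun)   # roughly 1.7
```

Note how the limit scales as 1/(\mu_e m_p)^2 – the composition of the star enters only through \mu_e.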

The famous physicist S. Chandrasekhar, after whom the Chandra X-ray space observatory is named, discovered this (Chandrasekhar limit) while ruminating about the effect of hyper-relativistic fermions in a white dwarf.


He was on a cruise from India to Great Britain at the time and had the time for unrestricted rumination of this sort!

Therefore, as is often the case, if a white dwarf is surrounded by a swirling cloud of gas of various sorts, or has a companion star from which it accretes matter, it will explode cataclysmically precisely when this limit is reached – in a Type 1a supernova the star is destroyed in a runaway thermonuclear explosion (in other accretion scenarios the collapse can instead produce a denser neutron star). Because the explosion always occurs at the same mass, once one understands the type of light emitted from such a supernova at some nearer location, one has a Standard Candle – it is like having a hand grenade that is of {\bf exactly} the same quality at various distances. By looking at how bright the explosion appears, you can tell how far away it is.

After this longish post, I will describe the wonderful results from this analysis in the next post – it has changed our views of the Universe and our place in it, in the last several years.

Coincidences and the stealthiness of the Calculus of Probabilities

You know this story (or something similar) from your own life. I was walking from my parked car to the convenience store to purchase a couple of bottles of sparkling water. As I walked there, I noticed a car with the number 1966 – that’s the year I was born! This must be a coincidence – today must be a lucky day!

There are other coincidences, numerical or otherwise. Carl Sagan, in one of his books, mentions a person who thought of his mother the very day she passed away in a different city. He (this person) was convinced this was proof of life after/before/during death.

There are others in the Natural World around us (I will be writing about the “Naturalness” idea in the future) – for eclipse aficionados, there is going to be a total solar eclipse over a third of the United States on the 21st of August 2017. It is a coincidence that the moon is exactly the right size to completely cover the sun (precisely – see eclipse photos from NASA below)


Isn’t it peculiar that the moon is exactly the right size? The moon has other interesting properties – for instance, the face of the moon that we see is always the same face. Mercury does something similar with the Sun – it always presents the same face to the Sun. This is well understood as a tidal effect of a small object in the gravitational field of a large neighbor. There’s an excellent Wikipedia article about this effect and I will explain it further in the future. But there is no simple explanation for why the moon is exactly the right size for total eclipses. It is not believed to be anything but an astonishing coincidence. After all, we have 6000-odd other visible objects in the sky that aren’t exactly eclipsed by any other satellite, so why should this particular pair matter, except that they provide us much-needed heat and light?

The famous physicist Paul Dirac discovered an interesting numerical coincidence based on some other numerology that another scientist called Eddington was obsessed with. It turns out that a number of (somewhat carefully constructed) ratios are of the same order of magnitude – basically remember the number 10^{40}!

  • The ratio of the electrical and gravitational forces between the electron and the proton (\frac{1}{4 \pi \epsilon_0} e^2 vs. G m_p m_e) is approximately 10^{40}
  • The ratio of the size of the universe to the electron’s Compton wavelength, which is the de Broglie wavelength of a photon of the same energy as the electron – 10^{27}m \: vs \: 10^{-12} m \: \approx 10^{39}

On the basis of this astonishing coincidence, Dirac made the startling observation that this could indicate (since the size of the universe is related to its age) that the value of G falls with time (though why not e rising with time, or something else?). While precision measurements of G are only now becoming possible, there would have been observable cosmological consequences if G had indeed behaved as 1/t in the past. For this reason, people discount this “theory” these days.

I heard of another coincidence recently – the value of the quantity \frac {c^2}{g}, where c is the speed of light and g is the acceleration due to gravity at the surface of the earth (9.81 \frac{m}{s^2}), is very close to 1 light year. This implies (if you put together the formulas for g and for 1 light year), a relationship between the masses of the earth and the sun, the earth’s radius and the earth-sun distance!
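This one is easy to check in a couple of lines:

```python
c = 2.998e8        # speed of light, m/s
g = 9.81           # m/s^2, surface gravity
light_year = c * 365.25 * 24 * 3600   # metres in one light year

ratio = (c**2 / g) / light_year
print(ratio)   # about 0.97 -- remarkably close to 1
```

c^2/g comes out to about 9.2 \times 10^{15} metres against 9.46 \times 10^{15} metres in a light year – within a few percent of each other.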

With coincidences of any sort, it is imperative to separate signal from noise. This is why some simple examples from probability are useful to consider. Let’s understand this.

If you have two specific people in mind, the probability of both having the same birthday is \frac{1}{365} – there are 365 possibilities for the second person’s birthday and only 1 of them matches the first person’s birthday.

If, however, you have N people and you ask for any match of birthdays, there are \frac {N(N-1)}{2} pairs of people to consider and you have a substantially higher probability of a match. In fact, the easy way to calculate this is to ask for the probability of NO matches – that is \frac {364}{365} \times \frac {363}{365} \times ... \times \frac {365 - (N-1)}{365}: the factor \frac {364}{365} is the probability that the second person doesn’t match the first, \frac {363}{365} that the third person matches neither of the first two, and so on. Subtracting this product from 1 gives the probability of at least one match. Among other things, this implies that the chance of at least one match is over 50% (1 in 2) for a group of 23 unconnected people (no twins etc.). And if you have 60 people, the probability is extremely close to 1.
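The birthday calculation is a few lines of code:

```python
def prob_shared_birthday(n):
    """Probability that at least two of n people share a birthday."""
    p_no_match = 1.0
    for i in range(n):
        p_no_match *= (365 - i) / 365   # person i+1 avoids all earlier birthdays
    return 1.0 - p_no_match

p23 = prob_shared_birthday(23)   # just over 0.5
p60 = prob_shared_birthday(60)   # about 0.994
```

For 23 people the probability is about 0.507, and for 60 people about 0.994 – close to certainty.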

Probability Graph for Birthdays

The key takeaway from this is that a less probable event has a chance to become much more probable when you have the luxury of adding more possibilities for the event to occur. As an example, if you went around your life declaring in advance that you will worry about coincidences ONLY if you find a number matching the specific birth year for your second cousin – the chances are low that you will observe such a number – unless you happen to hang about near your second cousin’s home! On the other hand, if you are willing to accept most numbers close to your heart – the possibilities stealthily abound and the probability of a match increases! Your birthday, your age, room number or address while in college, your current or previous addresses, the license plates for your cars, your current and previous passport numbers – the possibilities are literally, endless. And this means that the probability of a “coincidence” is that much higher.

I have a suggestion if you notice an unexplained coincidence in your life. Figure out if that same coincidence repeats itself in a bit – a week say. You have much stronger grounds for an argument with someone like me if you do! And then you still have to have a coherent theory why it was a real coincidence in the first place!


Addendum: Just to clarify, I write above “It is a coincidence that the moon is exactly the right size to completely cover the sun …” – this is from our point of view, of course. These objects would have radically different sizes when viewed from Jupiter, for instance.

Arbitrage arguments in Finance and Physics

Arbitrage refers to a somewhat peculiar and rare situation in the financial world. It is succinctly described as follows. Suppose you start with an initial situation – let’s say you have some money in an ultra-safe bank that earns interest at a certain basic rate r. Assume, also, that there is an infinitely liquid market in the world, where you can choose to invest the money in any way you choose. If you can end up with {\bf {definite}} financial outcomes that are quite different, then you have an arbitrage between the two strategies. If so, the way to profit from the situation is to “short” one strategy (the one that makes less) and go “long” the other strategy (the one that makes more). An example of such a method would be to buy a cheaper class of shares and sell “short” an equivalent amount of an expensive class of shares for the same Company that has definitely committed to merge the two classes in a year.

An argument using arbitrage is hard to challenge except when basic assumptions about the market or the initial conditions are violated. Hence, in the above example, if there were uncertainty about whether the merger of the two classes of shares would actually happen in a year, the “arbitrage” wouldn’t really be one.

One of the best known arbitrage arguments was invented by Fischer Black, Myron Scholes and Robert Merton to deduce a price for Call and Put Options. Their argument is explained as follows. Suppose you have one interest rate for risk-free investments (the rate r two paragraphs above). Additionally, suppose you, Dear Reader, own a Call Option, with strike price \$X, on a stock. This is an instrument where, at the end of (say) one year, you look at the market price \$S of the stock and compute \$S - \$X; if this is positive, that’s what you are paid. Let’s say X = \$100, while the stock price was initially \$76. At the end of the year, suppose the stock price became \$110; then the difference \$110 - \$100 = \$10, so you, Dear Reader and Fortunate-Call-Option-Owner, would make \$10. On the other hand, if the stock price unfortunately sank to \$55, then the difference \$55 - \$100 = - \$45 is negative. In this case, you, unfortunate Reader, would make nothing. A Call Option, therefore, is a way to speculate on the ascent of a stock price above the strike price.

Black-Scholes-Merton wanted to find a formula for the price that you should logically expect to pay for the option. The simplest assumption for the uncertainty in the stock price is to state that \log S follows a random walk. A random walk is the walk of a drunkard that walks on a one-dimensional street and can take each successive step to the front or the back with equal probability. Why \log S and not S? That’s because a random walker could end up walking backwards for a long time. If her walk was akin to a stock price, clearly the stock price couldn’t go below 0 – a more natural choice is \log S which goes to - \infty as S \rightarrow 0. A random walker is characterized by her step size. The larger the step size, the further she would be expected to be found relative to her starting point after N steps. The step size is called the “volatility” of the stock price.
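A minimal simulation of the random walk in \log S makes the point about positivity concrete (the starting price, step size and number of steps below are arbitrary illustrative choices):

```python
import math
import random

random.seed(42)

S0 = 76.0          # illustrative starting price
sigma = 0.02       # step size ("volatility") per step, an arbitrary choice
steps = 252        # one "year" of daily steps

log_S = math.log(S0)
path = []
for _ in range(steps):
    log_S += sigma if random.random() < 0.5 else -sigma  # unbiased walk
    path.append(math.exp(log_S))

# However far the walk drifts down, S = exp(log S) never reaches zero.
assert min(path) > 0.0
```

No matter how unluckily the walker wanders, exponentiating keeps the simulated stock price strictly positive.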

In addition to an assumption about volatility, B-S-M needed to figure out the “drift” of the stock price. The “drift”, in our example, is akin to a drunkard starting on a slope. In that case, there is an unconscious tendency to drift down-slope. One can model drift by assuming that there isn’t the same probability to move to the right, as to the left.

The problem is, while it is possible to deduce, from uncertainty measures in the market, the “volatility” of the stock, there is no natural reason to prefer one “drift” over the other. Roughly speaking, if you ask people in the market whether IBM will achieve a higher stock price after one year, half will say “Yes”, the other half will say “No”. In addition, the ones that say “Yes” will not agree on exactly by how much it will be up. The same for the “No”-sayers! What to do?

B-S-M came up with a phenomenal argument. It goes as follows. We know, intuitively, that a Call Option (for a stock in one year) should be worth more today if the stock price were higher today (for the same Strike Price) by, say, \$1. Can we find a portfolio that would decline by exactly the same amount if the stock price went up by \$1? Yes, we can. We could simply “short” that amount of shares in the market. A “short” position is like a position in a negative number of shares: such a position loses money if the market goes up. And I could do the same thing every day till the Option expires. I will need to know, every day, from the Option Formula that I have yet to find, a “first derivative” – how much the Option Value would change for a \$1 increase in the stock price. But once I do this, I have a portfolio (Option plus this “short” position) that is {\bf {insensitive}} to stock price changes (for small changes).

Now, B-S-M had the ingredients for an arbitrage argument. They said, if such a portfolio definitely could make more than the rate offered by a risk-less bank account, there would be an arbitrage. If the portfolio definitely made more, borrow (from this risk-free bank) the money to buy the option, run the strategy, wait to maturity, return the loan and clear a risk-free profit. If it definitely made less, sell this option, invest the money received in the bank, run the hedging strategy with the opposite sign, wait to maturity, pay off  the Option by withdrawing your bank funds, then pocket your risk-free difference.

This meant that they could assume that the portfolio described by the Option and the Hedge, run in that way, was forced to appreciate at the “risk-free” rate. This made the risk-free rate the natural choice of “drift” parameter to use – the price of the Option would not depend on the real-world drift at all.
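The fact that the real-world drift drops out can be seen in the simplest possible setting – a one-step binomial model (all the numbers below are illustrative, not market data). The replication argument prices the option without ever referring to the probability of the up-move:

```python
# One-period binomial model: stock at 100 moves to 110 (up) or 90 (down);
# risk-free rate 5%; call option with strike 100. Illustrative numbers only.
S0, S_up, S_down, K, r = 100.0, 110.0, 90.0, 100.0, 0.05

payoff_up = max(S_up - K, 0.0)      # 10
payoff_down = max(S_down - K, 0.0)  # 0

# Replicate the option with 'delta' shares plus a bank balance 'b', so that
#   delta * S_up   + b * (1 + r) = payoff_up
#   delta * S_down + b * (1 + r) = payoff_down
delta = (payoff_up - payoff_down) / (S_up - S_down)   # the "first derivative"
b = (payoff_up - delta * S_up) / (1 + r)              # negative: borrowed money

# No-arbitrage: the option must cost the same as the replicating portfolio.
option_price = delta * S0 + b
# Note: the real-world probability of the up-move never entered.
```

Here delta comes out to 0.5 shares and the option price to about \$7.14, whether you believe the stock will probably rise or probably fall.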

If you are a hard-headed options trader, though, the arguments just start here. After all, the running of the above strategy needs markets that are infinitely liquid with infinitesimal “friction” – ability to sell infinite amounts of stock at the same price as at which to buy them. All of these are violated to varying degrees in the real stock market, which is what makes the B-S-M formula of doubtful accuracy. In addition, there are other possible processes (not a simple random-walk) that the quantity \log S might follow. All this contributes to a robust Options market.

An arbitrage argument is akin to an argument by contradiction.

Arguments of the above sort, abound in Physics. Here’s a cute one, due to Hermann Bondi. He was able to use it to deduce that clocks should run slower in a gravitational field. Here goes (this paraphrases a description by the incomparable T. Padmanabhan from his book on General Relativity).

Bondi considered the following sort of apparatus (I have really constructed my own example, but the concept is his).

Bondi Apparatus.JPG

One photon rushes from the bottom of the apparatus to the top. Let’s assume it has a frequency \nu_{bottom} at the bottom of the apparatus and a frequency \nu_{top} at the top. In our current unenlightened state of mind, we think these will be the same frequency. Once the photon reaches the top, it strikes a target and undergoes pair production (the photon swerves close to a nucleus and spontaneously produces an electron-positron pair – the nucleus recoils, not in horror, but in order to conserve energy and momentum). Let’s assume the photon’s energy is rather close to the combined rest-mass energy of the electron-positron pair, so the pair is rather slow-moving afterwards.

Once the electron and positron are produced (each with momentum of magnitude p_{top}), they experience a strong magnetic field (in the picture, it points out of the paper). The law that describes the interaction between a charge and a magnetic field is called the Lorentz Force Law. It causes the (positively charged) positron to curve to the right and the (negatively charged) electron to curve to the left. The two then separately propagate down the apparatus (acquiring a momentum p_{bottom}), where they are forced to recombine into a photon of exactly the right frequency, which continues the cycle. In particular, writing the energy of the photon in each case:

h \nu_{top} = 2 \sqrt{(m_e c^2)^2+p_{top}^2 c^2} \approx 2 m_e c^2

h \nu_{bottom} = 2 \sqrt{(m_e c^2)^2+p_{bottom}^2 c^2} \approx 2 m_e c^2 + 2 m_e g L

In the above, p_{bottom} > p_{top} – the electron and positron move slightly faster at the bottom than at the top.

We know from the usual descriptions of potential energy and kinetic energy (from high school, hopefully), that the electron and positron pick up energy  m_e g L (each) on their path down to the bottom of the apparatus. Now, if the photon doesn’t experience a corresponding loss of energy as it travels from the bottom to the top of the apparatus, we have an arbitrage. We could use this apparatus to generate free energy (read “risk-less profit”) forever. This can’t be – this is nature, not a man-made market! So the change of energy of the photon will be

h \nu_{bottom} - h \nu_{top} =2 m_e g L \approx h \nu_{top} \frac{g L}{c^2}

indeed, the frequency of the photon is higher at the bottom of the apparatus than at the top. As photons “climb” out of the depths of the gravitational field, they get red-shifted – their wavelength lengthens/frequency reduces. This formula implies

\nu_{bottom} \approx \nu_{top} (1 + \frac{g L}{c^2})

writing this in terms of the gravitational potential due to the earth (mass M) at a distance R from its center

\Phi(R) = - \frac {G M}{R}

\nu_{bottom} \approx \nu_{top} (1 + \frac{\Phi(top) - \Phi(bottom)}{c^2})

so, for a weak gravitational field,

\nu_{bottom} (1 + \frac{ \Phi(bottom)}{c^2}) \approx \nu_{top} (1 + \frac{\Phi(top)}{c^2})

On the other hand, time intervals are related to inverse frequencies (we consider the time between successive wave fronts)

\frac {1}{\Delta t_{bottom} } (1 + \frac{ \Phi(bottom)}{c^2}) \approx \frac {1}{\Delta t_{top}} (1 + \frac{\Phi(top)}{c^2})

so comparing the time intervals between successive ticks of a clock at the surface of the earth, versus at a point infinitely far away, where the gravitational potential is zero,

\frac {1}{\Delta t_{R} } (1 + \frac{ \Phi(R)}{c^2}) \approx \frac {1}{\Delta t_{\infty}}

which means

\Delta t_{R} =  \Delta t_{\infty} (1 + \frac{ \Phi(R)}{c^2})  

The conclusion is that the time between successive ticks of a clock is measured to be slightly smaller on the surface of the earth than far away. Note that \Phi(R) is negative, and the gravitational potential is usually taken to be zero at infinity. This is the phenomenon of time dilation due to gravity. As an example, the GPS system is run off clocks on satellites orbiting the earth at an altitude of roughly 20,200 km. The clocks on the earth run slower than the clocks on the satellites. In addition, as a smaller effect, the satellites are travelling at high speed, so special relativity causes their clocks to run a little slower compared to those on the earth. The two effects act in opposite directions. This is the subject of a future post, but the net effect, which has been precisely checked, is about 38 \: \mu s (microseconds) per day. If we didn’t correct for relativity, GPS positions would drift by roughly ten kilometers a day and we would experience total chaos in transportation.
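The 38-microsecond figure can be reproduced with a back-of-the-envelope sketch, using standard values for the earth’s mass and the GPS orbital radius (a full treatment would also include the earth’s rotation and the orbit’s eccentricity):

```python
import math

G = 6.674e-11          # m^3 kg^-1 s^-2
M_earth = 5.972e24     # kg
R_earth = 6.371e6      # m
r_orbit = 2.656e7      # m, GPS orbital radius (~20,200 km altitude)
c = 2.998e8            # m/s
day = 86400.0          # s

# Gravitational effect: the satellite clock runs fast relative to the ground
grav = G * M_earth * (1 / R_earth - 1 / r_orbit) / c**2 * day

# Special-relativistic effect: orbital speed makes the satellite clock slow
v = math.sqrt(G * M_earth / r_orbit)   # circular-orbit speed, ~3.9 km/s
kinematic = v**2 / (2 * c**2) * day

net_microseconds = (grav - kinematic) * 1e6
print(net_microseconds)   # about 38 microseconds per day
```

The gravitational term contributes about +46 μs/day and the speed term about -7 μs/day, netting out near +38 μs/day.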

The earth is flat – in Cleveland


I stopped following basketball after Michael Jordan stopped playing for the Bulls – believe it or not, the sport appears to have become the place to believe and practice outlandish theories that might be described (in comparison to the Bulls) as bull****.

There’s a basketball star who plays for the Cleveland Cavaliers. His name is Kyrie Irving. He believes that the earth is flat. He wishes to leave the Cleveland Cavaliers – but not go away too far, since he might fall off the side of the earth. However, he has convinced a large number of middle-schoolers (none of whom I have had the pleasure of meeting, but apparently they exist) that the earth is flat and that the “round-earthers” are government-conspiracy-inspired, pointy-headed, Russian spies – read this article if you want background. In fact, there is a club called the Flat Earth Society, with members around the globe, who all believe the earth is flat as a pancake.

It would be really interesting, I thought, if, like my favorite detective – Sherlock Holmes – I decided to write the “Intelligent Person’s Guide to Why the Earth is Round”. I would ask you, dear Skeptical Reader, to use no more than readily available tools, plus some believable friends who possess phones with cameras and the ability to send and receive pictures by mail or text, and who are not in the pay of the FSB (or the North Koreans, who decidedly are trying very hard to check the flat earth theory by sending out ICBMs at increasing distances).

I live in south New Jersey. At my location, the sun rose today at 5:57 am (you could figure this out by typing the question into Google search, or just wake up in time to look for the sun). I have two friends who live in Denver (Colorado) and Cheyenne (Wyoming). Their sunrises occurred at 6:00 am and 5:53 am (their time) – which averages to roughly 5:56:30 am. I realize that Denver is a mile high, which is also roughly Cheyenne’s elevation, but hey, you don’t pick your friends. I also live at an elevation of roughly 98′, which isn’t much, so I ignore it. They sent me pictures of when the sun rose, so I was able to prove they weren’t lying to me or part of a government conspiracy.

The distance from my town to these places is 1766 miles (to Denver) and 1613 miles (to Cheyenne). I used Google to calculate these, but you could schlep yourself there too. Based on just these facts, I conclude that the earth curves between New Jersey and those places; to my mind, this should clinch the question of whether the earth is round. Since the roughly 1700-mile separation corresponds to 2 hours of time difference in sunrises, a 24-hour difference corresponds to about 20,400 miles. This is the circumference of the circle of latitude at 40 degrees North – the latitude of New York City, Denver and Cheyenne alike – which is a circle of radius r = R \cos 40^{\circ}, where R is the earth’s radius. Dividing by \cos 40^{\circ} \approx 0.77, the equatorial circumference 2 \pi R comes out to roughly 26,600 miles, within about 7% of the correct figure of roughly 24,900 miles. The extreme height of Denver and Cheyenne has something to do with the discrepancy! The sun {\bf should} have risen later in Denver and Cheyenne had they been at lower elevations, so 1700 miles would {\bf really} have corresponded to a few minutes more than 2 hours, which would have meant a lower estimate for the earth’s equatorial circumference.
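Using the rounded figures from the text (1700 miles, a 2-hour sunrise difference, latitude 40°N), the estimate can be reproduced in a few lines:

```python
import math

# Sunrise-time method, using the rounded numbers from the text
distance_miles = 1700.0      # rough NJ-to-Denver/Cheyenne separation
time_diff_hours = 2.0        # difference in local sunrise times
latitude_deg = 40.0

# Circumference of the 40-degree latitude circle: miles per 24 hours of sunrise
lat_circumference = distance_miles * 24.0 / time_diff_hours   # 20,400 miles

# The latitude circle has radius R*cos(latitude), so divide by the cosine
equatorial = lat_circumference / math.cos(math.radians(latitude_deg))
error = abs(equatorial - 24901.0) / 24901.0   # vs. the true equatorial value
print(equatorial, error)
```

The estimate lands around 26,600 miles, within about 7% of the true equatorial circumference – not bad for two friends with camera phones.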

Earth Is Round

By the way, I picked Cheyenne because of its auditory resemblance to Syene (present-day Aswan, in Egypt), the town Eratosthenes used for his measurement of the Earth’s circumference. Eratosthenes himself hailed from Cyrene, in present-day Libya – yes, the first person to measure the Earth’s size was Libyan!

A common objection to these entirely reasonable calculations: if the earth is actually rotating, why doesn’t it move under you when you go up in a balloon? Sorry, this has been thought of already! When I was young, I was consumed by Yakov Perelman’s “Astronomy for Entertainment” – a book written by the tragically short-lived Soviet popularizer of science, who died during the siege of Leningrad (St. Petersburg) in 1942. Perelman wrote about a young, enterprising French advertising executive/scammer at the turn of the 19th century who dreamed up a new scheme to separate people from their money. He advertised balloon flights that would take you to different parts of the world without moving – just go up in a balloon and stay aloft till your favorite country comes up beneath you. It doesn’t happen, because all the stuff around you is moving with you. Why? It’s the same reason the raindrops don’t fly off your side windows even when you are driving at high speed in the rain (forgetting for a second about the gravitational force that pulls things towards the earth’s center). There is a boundary layer of material that rotates or moves along with a moving object – it’s a consequence of the mechanics of fluids, and we live with it in various places. For instance, it is one reason why icing occurs on airplane wings – if the wing surface felt the full blast of the oncoming air at all times, ice wouldn’t get a chance to form.

So, if you are willing to listen to reason, there’s no reason to restrict yourself to Cleveland. The world is invitingly round.

Addendum : a rather insightful friend of mine just told me that Kyrie Irving was actually born in Australia on the other side of the Flat Earth. If so, I doubt that even my robust arguments would convince him to globalize his views.

Special Relativity; Or how I learned to relax and love the Anti-Particle

The Special Theory of Relativity, which is the name for the set of ideas that Einstein proposed in 1905 in a paper titled “On the Electrodynamics of Moving Bodies”, starts with the premise that the Laws of Physics are the same for all observers traveling at uniform velocities relative to each other. Those laws include a special velocity: Maxwell’s equations for electromagnetism contain a special speed c, which must therefore be the same for all such observers. This leads to some spectacular consequences. One of them is called the “Relativity of Simultaneity”. Let’s discuss this with the help of the picture below.


Babu is sitting in a railway carriage, manufactured by the famous C-Kansen company, that travels at speeds close to that of light. Babu is sitting exactly in the middle of the carriage and, for reasons best known to himself (I guess the pantry car was closed and he was bored), decides to shoot laser beams simultaneously at both ends of the carriage from his position. There are detectors/mirrors that detect the light at the two ends of the carriage. As far as he is concerned, light travels at 3 \times 10^5 \frac {km}{sec} and he will challenge anyone who says otherwise to a gunfight – note that he is wearing a cowboy hat and probably practices open carry.

Since the detectors at the end of the carriage are equidistant from him, he is going to be sure to find the laser beams hit the detectors simultaneously, from his point of view.

Now, consider the situation from the point of view of Alisha, standing outside the train, near the tracks, but safely away from Babu and his openly carried munitions. She sees that the train is speeding away to the left, so clearly since {\bf she} thinks light also travels at 3 \times 10^5 \frac {km}{sec}, she would say that the light hit the {\bf right} detector first before the {\bf left} detector. She doesn’t {\underline {at \: all \: think}} that the light hit the two detectors simultaneously. If you asked her to explain, she’d say that the right detector is speeding towards the light, while the left detector is speeding away from the light, which is why the light strikes them at different times.
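Alisha's account can be made concrete with a small numeric sketch. The carriage length (300 m) and train speed (0.6c) below are made-up numbers, chosen only for illustration:

```python
# A numeric check of the simultaneity argument, computed in Alisha's frame.
# Assumed setup: carriage of proper length L0 = 300 m, train speed v = 0.6c.
c = 3.0e8                      # speed of light, m/s
v = 0.6 * c
L0 = 300.0                     # carriage length in Babu's frame (m)
gamma = 1.0 / (1.0 - (v / c) ** 2) ** 0.5
L = L0 / gamma                 # length-contracted carriage in Alisha's frame

# Light leaves the midpoint; one detector rushes toward the flash while the
# other runs away from it (Alisha also measures the light's speed as c).
t_rear = (L / 2) / (c + v)     # detector moving toward the emission point
t_front = (L / 2) / (c - v)    # detector moving away from it
print(t_rear < t_front)        # True: the two hits are NOT simultaneous for Alisha
```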

Wait – it is worse. If you had a third observer, Emmy, who is skiing to the {\bf {left}} at an even higher speed than the C-Kansen (some of these skiers are crazy), she thinks the C-Kansen train is going off to the right (think about it), not able to keep up with her. As far as {\underline {\bf {she}}} is concerned, the laser beam hit the {\bf {left}} detector before the other beam hit the {\bf {right}} detector.

What are we finding? The Events in question are – “Light hits Left Detector” and “Light hits Right Detector”. Babu claims the two events are simultaneous. Alisha claims the second happened earlier. Emmy is insistent that the first happened earlier. Who is right?

They are ALL correct, in their own reference frames. Events that appear simultaneous in one reference frame can appear to occur in one order in a second frame, and in the opposite order in a third. This is called the Relativity of Simultaneity. Basically, this means that you cannot claim that one of these events {\bf {caused}} the other, since their order can be changed. Events that are separated in this fashion are called “space-like separated”.

Now, on to the topic of this post. In the physics of quantum field theory, particles interact with each other by exchanging other particles, called gauge bosons. This interaction is depicted, in very simplified fashion so we can calculate things like the effective force between the particles, in a sequence of diagrams called Feynman diagrams. Here’s a diagram that depicts the simplest possible interaction between two electrons


Time goes from the bottom to the top, the electrons approach each other, exchange a photon, then scoot off in different directions.

This is only the simplest diagram, though; to get exact numerical results for such scattering, you have to add higher-order versions of this diagram, as shown below


When you study such processes, you have to perform mathematical integrals – all you know is that you sent in some particles from far away into your experimental set-up, something happened and some particles emerged from inside. Since you don’t know where and when the interaction occurred (where a particle was emitted or picked up, as at the vertexes in the above diagrams), you have to sum over all possible places and times that the interaction {\bf {could}} have occurred.

Now comes the strange bit. Look at what might happen when you sum over all possible paths for a collision between an electron and a photon.


In the above diagram, the exchange was simultaneous.

In the next one, the electron emitted a photon, then went on to absorb a photon.


and then comes the strange bit –



Here the electron emitted a photon, then went backwards in time, absorbed a photon, then went its way.

When we sum over all possible event times and locations, this is really what the integrals in quantum field theory instruct us to do!

Really, should we allow ourselves to count processes where the two events occur simultaneously? By the Relativity of Simultaneity, that means we also have to allow them to happen in reverse order, as in the third diagram. What’s going on? This has to be wrong! And what’s an electron going backwards in time anyway? Have we ever seen such a thing?

Could we simply ban such processes? That is, we would only sum over positions and times where the intermediate particles had enough time to travel from one place to the other without exceeding the speed of light.

There’s a problem with this. Notice the individual vertexes where an electron comes in, emits (or absorbs) a photon, then moves on. If this were a “real” process, it wouldn’t be allowed: it violates the principle of energy-momentum conservation. A simple way to understand this is to ask: could a stationary electron suddenly emit a photon and shoot off in the direction opposite to the photon? It looks deceptively possible! The photon would have, say, energy E and momentum p = E/c. Momentum conservation means the electron would also have momentum E/c, in the opposite direction, but then its energy would have to be \sqrt{E^2+m^2 c^4} by the relativistic formula. The total final energy E + \sqrt{E^2+m^2 c^4} is strictly bigger than the initial electron’s energy m c^2 for any E > 0. Not allowed!
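The impossibility argument above is easy to check numerically. The sketch below works in units where c = 1 (so the electron mass is in MeV) and simply confirms that no photon energy makes the books balance:

```python
# Numeric check that a free, stationary electron can't just emit a photon:
# in units with c = 1, the final energy E + sqrt(E^2 + m^2) always exceeds
# the initial rest energy m, for any photon energy E > 0.
m = 0.511          # electron mass in MeV (units where c = 1)
for E in (0.001, 0.1, 1.0, 100.0):
    final_energy = E + (E ** 2 + m ** 2) ** 0.5
    assert final_energy > m     # energy conservation is violated every time
print("no allowed photon energy found")
```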

We are stuck. We have to assume that energy-momentum conservation is violated in the intermediate state – in all possible ways. But then all hell breaks loose: in relativity, the speed of a particle v is related to its momentum p and its energy E by v = \frac {p c^2}{E} – since p and E can now be {\underline {anything}}, the intermediate electron could, for instance, travel faster than light. If so, in an appropriate reference frame, it would be absorbed before it was created. If you can travel faster than light, you can travel backwards in time (read this post in Matt Buckley’s blog for a neat explanation).

If the electron were uncharged, we would probably be hard-pressed to notice. But the electron is charged. This means if we had the following sequence of events,

– the world has -1 net charge

– electron emits a photon and travels forward in time

– electron absorbs a photon and goes on.

This sequence doesn’t appear to change the net charge in the universe.

But consider the following sequence of events

– the world has -1 net charge

– the electron emits a photon and travels backwards in time

– the electron absorbs a photon in the past and then starts going forwards in time

Now, at some intermediate time, the universe seems to have developed two extra negative charges.

This can’t happen – we’d notice! Extra charges tend to cause trouble, as you’d realize if you ever received an electric shock.

The only way to solve this is to postulate that an electron moving backward in time has a positive charge. Then the net charge added for all time slices is always -1.

Ergo, we have antiparticles. We are forced to introduce the concept in order to marry quantum mechanics with special relativity.

There would be a way out of this morass if we insisted that all interactions occur at the same point in space and that we never have to deal with “virtual” particles that violate energy-momentum conservation at intermediate times. This doesn’t work, because of something that Ken Wilson worked out in the early 1970s, called the renormalization group – the result of our insistence would be predictions that disagree with experiment; the computed effects would be too weak.

For quantum-field-theory students, this is basically saying that the expansion of the electron’s field operator into its components can’t simply be

\Psi(\vec x, t) = \sum\limits_{spin \: s} \int \frac {d \vec k}{(2 \pi)^3} \frac{1}{\sqrt{2 E_k}} b_s(\vec k) e^{- i E_k t + i {\vec k} \cdot {\vec x}}

but has to be

\Psi(\vec x, t) = \sum\limits_{spin \: s} \int \frac {d \vec k}{(2 \pi)^3} \frac{1}{\sqrt{2 E_k}} \left( b_s(\vec k) e^{- i E_k t + i {\vec k} \cdot {\vec x}}  + d^{\dagger}_s(\vec k) e^{+ i E_k t - i {\vec k} \cdot {\vec x}} \right)

including particles being destroyed on par with anti-particles being created and vice versa.

The next post in this sequence will discuss another interesting principle that governs particle interactions – C-P-T.

Quantum field theory is an over-arching theory of fundamental interactions. One bedrock of the theory is something called C-P-T invariance.  This means that if you take any physical situation involving any bunch of particles, then do the following

  • make time go backwards
  • parity-reverse space (so in three-dimensions, go into a mirror world, where you and everything else is opposite-handed)
  • change all particles into anti-particles (with the opposite charge)

then you will get a process which could (and should) happen in our own world. As far as we know, this is always true – it has been proven as a theorem under a variety of reasonable assumptions. A violation of the C-P-T theorem in the universe would create quite a stir. I’ll discuss that in a future post.

Addendum: After this article was published, I got a message from someone I respect a huge amount, pointing out an interesting issue: when we take the non-relativistic limit of a relativistic field theory, where do the anti-particles vanish off to? This is a question I am going to try and write about in a bit!


Can a quantum particle come to a fork in the road and take it?

I have always been fascinated by the weirdness of the Universe. One aspect of the weirdness is the quantum nature of things – others relate to the mysteries of Lorentz invariance, Special Relativity, the General Theory of Relativity, the extreme size and age of the Universe, the vast amount of stuff we don't seem to be able to see and so on.

This post is about an experiment that directly points to the fundamental weirdness of small (and these days, not so small) particles. While quantum effects matter at the sub-atomic particle level, these effects can coalesce into macroscopic phenomena like superconductivity, the quantum Hall effect and so on, so they can't be ignored. This experiment, usually referred to as the "Double-Slit" experiment, is described and explained in detail in Vol. 3 of the Feynman Lectures on Physics. While it would be silly of me to try to outdo Feynman's explanation (which, by the way, was one of the reasons I was enthused to study physics in the first place), I want to go beyond the Double-Slit experiment to discuss the Delayed-Choice experiment. This extra wrinkle on the Double-Slit experiment was invented by the famous scientist John Wheeler (who was Feynman's Ph.D. advisor) and displays, for all to see, even more aspects of quantum weirdness.

Let's get started.

The Double-Slit experiment is carried out by shooting electrons at a pair of closely spaced slits – the electron flux is sufficiently small that one is able to count the number of times electrons hit various points on a television screen placed past the slits. If no measures are taken to identify which of the two paths the electrons actually took to reach the screen, then the probability density of arrival at various points on the television screen displays an “interference'' pattern. If, however, the experiment is set up so as to identify which slit the electron went through, for example by shining an intense beam of photons at the slits that scatter off the electrons, then the pattern for those “which-path''-identified electrons switches to a “clump'' pattern, centered behind the slit each electron was seen to pass through. The standard experiment is displayed, schematically, below: since both slits are open and we don't bother to check which slit the electron goes through, we see an "interference" pattern. If we used photons (light) instead of electrons, we'd see alternate light and dark fringes.


If only one slit were open, we'd get a "clump" pattern as below


Note – no interference "bumps" at places far away from the peak.

This behavior is also what we'd get for light – photons.

Quantum mechanics is the theory that was constructed to "explain" this behavior. We construct a quantity called the "amplitude". The "amplitude" is a complex number that has a value at every point in space and time. Complex numbers have two properties – a magnitude and a phase. The magnitude squared of the amplitude at a point, at some time t, multiplied by a small volume v around that point, is the probability of finding the particle (if it's the amplitude for the electron, then the probability of finding the electron, etc.) in that volume v at time t. Since you need to multiply the squared magnitude by a little volume element, the squared magnitude of the amplitude is referred to as the "Probability Density".

Schrodinger's equation writes down how this amplitude evolves in time – from the electron gun to the screen. To this equation, you need to add the "Born" prescription – that you have to square the magnitude of the amplitude to get the probability density.

Feynman found a neat, equivalent interpretation of Schrodinger's equation – his method basically said – if you want to find the amplitude for the electron (say) at some point in the screen, just write down path-amplitudes for all the different ways the electron could get from the electron gun to the screen. Add these path-amplitudes and then call the net sum the "total" amplitude for the electron to be found at the particular spot on the screen. Square the magnitude of this "total" amplitude and you will get the probability density for the electron to be found at that spot (times the little volume around that spot will give you the probability to find the electron at that spot).

All this discussion of path-amplitudes would be academic if the amplitudes were real numbers. The phase is a critical piece of the amplitude. Though the magnitude (squared) is the physical quantity (related to a probability), the magnitude of a sum of complex numbers depends delicately on the phase differences between the summands. As an example, z_1=1+2i, z_2=-1-2i, z_3=1-2i, z_4=-1+2i all have the same magnitude \sqrt{5}. However, z_1+z_2=0, z_1+z_3=2, z_1+z_4=4i, all of which have magnitudes very different from the sum of the individual magnitudes. That's the reason we get alternate light and dark fringes on the television screen – the phases of the amplitudes for the electron to get to a given spot from the two slits sometimes cause the sum amplitude to vanish (which is called destructive interference), sometimes cause it to have the maximum possible magnitude (which is called constructive interference), and produce every magnitude between these two extremes. While this behavior is extremely counter-intuitive for particles, it resembles behavior we are used to with waves, as this YouTube video shows (so does this one). This is usually referred to as wave-particle duality.
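The numeric example above is easy to verify, since Python handles complex numbers natively:

```python
# The four amplitudes from the text: equal magnitude sqrt(5), yet their
# sums range from total cancellation to maximal reinforcement, depending
# only on the relative phases.
z1, z2, z3, z4 = 1 + 2j, -1 - 2j, 1 - 2j, -1 + 2j
assert all(abs(abs(z) ** 2 - 5) < 1e-12 for z in (z1, z2, z3, z4))

print(abs(z1 + z2))   # 0.0 -> destructive interference
print(abs(z1 + z3))   # 2.0 -> partial
print(abs(z1 + z4))   # 4.0 -> constructive interference
```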

The thing you need to take away from this experiment is that if you don't force the electron to get to the screen through only one slit, by, say, closing off the other slit, it appears to behave like it goes through both slits.

Wait, it gets even more interesting.

The Delayed-Choice Quantum-Eraser experiment was proposed by Marlan Scully (an inspiring scientist in his own right) and Kai Drühl, and is portrayed in the accompanying figure.


An SPDC (spontaneous parametric down-conversion) setup is used – basically, each of the red regions in the picture above produces two photons when one photon hits it. Since the laser's photon can go through either of the two slits, this is just like the Double-Slit experiment, except that things are arranged so that after going through a slit, the photon produces two other photons. The photons that travel towards the interferometer/detector D0 are referred to as “signal'' photons. The photons that travel towards the prism are the “idler'' photons. After passing through the prism, each “idler'' photon passes through a beam-splitter that has a 50\% probability of deflecting it to detector D3 or D4 (depending on which slit it came from) and a 50\% probability of letting it pass on to the fully silvered (totally reflecting) mirrors at the bottom of the picture. Another beam-splitter is placed between the detectors D1 and D2, so photons that are detected at D1 and D2 have their “which-path'' information obliterated – for instance, an “idler'' photon arriving at D1 could have come along either of two paths. The actual experiment was performed by Kim et al.

The detector D0 accumulates “signal'' photons – a coincidence counter correlates them with “idler'' photons detected at the detectors D3, D4, D1 and D2 (the “idler'' photons arrive at those detectors a few nanoseconds after the “signal'' photons are received). Among the accumulated “signal'' photons received at D0, if we separate out the ones received in coincidence with the detectors D3 or D4, the pattern observed (in the spatial density of the “signal'' photons) is the “clump'' pattern, since the “which-path'' information is clear in those cases. However, the accumulated “signal'' photons received at D0 that are coincident with the ones received at D1 display an interference pattern, since one cannot infer the path taken by the “idler'' photons that show up at detector D1, which means one cannot tell which slit the “signal'' photon came through before arriving at detector D0. Similarly, the accumulated “signal'' photons received at D0 that are coincident with the ones received at D2 display an interference pattern in their spatial distribution – but this pattern is spatially offset by half a wavelength from the one due to D1. This is just enough to put peaks where the other pattern has dips and vice versa. So, if you aren't careful to note which detector, D1 or D2, registered the photon coincident with the "signal" photon at D0, you get a clump pattern. The reason for this offset is a little tricky to explain – it's called "unitarity", or conservation of probability at the beam-splitter, which forces some delicate phase assignments for the amplitudes we spoke about earlier.
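One way to see how two half-wavelength-offset fringe patterns can hide inside a clump is with a toy calculation. The envelope shape and fringe spacing below are invented for illustration, not taken from the Kim et al. data:

```python
import math

# Toy illustration (not the real optics): D1's fringe pattern and D2's
# anti-fringe, offset by half a fringe period, sum to a featureless "clump".
positions = [i * 0.05 for i in range(200)]
envelope = [math.exp(-((x - 5.0) ** 2)) for x in positions]   # made-up clump shape

d1 = [0.5 * e * (1 + math.cos(4 * x)) for x, e in zip(positions, envelope)]
d2 = [0.5 * e * (1 - math.cos(4 * x)) for x, e in zip(positions, envelope)]

total = [a + b for a, b in zip(d1, d2)]
# Peaks of d1 sit exactly on dips of d2, so the fringes wash out entirely:
assert all(abs(t - e) < 1e-9 for t, e in zip(total, envelope))
print("fringe + anti-fringe = smooth clump")
```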

Note, however, that someone could hide the specific arrival times of the photons at D1 and D2 from you for months and then tell you, say, a year later. All this time, you wouldn't have known there was an interference pattern "hiding" under the massive "clump" pattern you see. When you selectively look at the coincident detections separately for D1 and D2, it is then and only then, that you see the interference pattern.

Curious! This experiment, as I said, has been done and the results are as described above.

With a friend, I put together another setup, in an interesting locale – a black hole – that we tried to make work. Quantum mechanics defeated us in our attempts too, but it's an interesting problem to work through.

A digression on statistics and a party with Ms. Fermi-Dirac and Mr. Bose (Post #5)

To explain the next standard candle, I need to digress a little into the math of statistics of lots of particles. The most basic kind is the statistics of distinguishable particles.

Consider the following scenario. You’ve organized a birthday party for a lot of different looking kids (no twins, triplets, quadruplets, quintuplets …). Each kid has equal access to a large pot of M&Ms in the center of the room. Each kid can grab M&Ms from the pot and additionally when they bounce off each other while playing, can exchange a few M&Ms with each other. After a long while, you notice that all the M&Ms in the central pot are gone.  Let’s suppose there are truly a {\underline {large}} number of kids (K) and a truly {\underline {humongous}} number of M&Ms (N).

Interesting question – how many M&Ms is each kid likely to have? A simpler question might be – how many kids likely have 1 M&M? How many likely have 2? … How many likely have 55? … How many likely have 5,656,005? …

How do we answer this question?

If you use the notation n_i for the number of kids that have i M&Ms, then we can easily write down

\sum\limits_{i} n_i  = K

is the total number of kids.

\sum\limits_i i n_i = N

is the total number of M&Ms.

But that isn’t enough to pin down the n_i! We need some additional principle to find the most likely distribution of M&Ms (clearly this wouldn’t work if I were there; I would have all of them and the kids would be looking at the mean dad that took the pot home, but that’s for a different post). The answer, which Ludwig Boltzmann discovered at the end of the 19th century, was {\bf not} simply the one where everybody has an equal number of M&Ms. The most likely distribution is the one that can be realized in the largest number of ways by exchanging the roles of the kids. In other words, maximize the combinatoric number of ways

{\it \Omega} = \frac {K!} {n_1! n_2! n_3! ...n_{5,005,677}! ...}

which is the way of distributing these kids so that n_1 have 1 M&M, n_2 have 2 M&Ms, n_3 have 3 M&Ms…, n_{5,005,677} have 5,005,677 M&Ms and so on.

Boltzmann had a nervous breakdown a little after he invented statistical mechanics – which is this method and its consequences – so don’t worry if you feel a little ringing in your ears. It will shortly grow in loudness!

How do we maximize this {\it \Omega}?

The simplest thing to do is to maximize the logarithm of {\it \Omega}, which means we maximize

\log \Omega = \log K! - \sum\limits_{i} \log n_i!

but we have to satisfy the constraints

\sum\limits_{i} n_i  = K, \hspace{5 mm} \sum\limits_i i n_i = N

The solution (a little algebra is required here) is that n_i \propto e^{-\beta i} where \beta is some constant for this ‘ere party. For historical reasons and since these techniques were initially used to describe the behavior of gases, it is called the inverse temperature. I much prefer “inverse gluttony” – the lower \beta is, the larger the number of kids with a lot of M&Ms.

Instead of the quantity i, which is the number of M&Ms the children have, if we considered \epsilon_i, which is (say) the dollar value of the i M&Ms, then the corresponding number of kids with "value" \: \epsilon_i is n_i \propto e^{-\beta \epsilon_i}

Few kids have a lot of M&Ms, many have very few – so there you go, Socialists, doesn’t look like Nature prefers the equal distribution of M&Ms either.

If you thought of these kids as particles in a gas and \epsilon_i as one of the possible energy levels (“number of M&Ms”) the particles could have, then the fraction of particles that have energy \epsilon_i would be

n(\epsilon_i) \propto e^{- \beta \epsilon_i}

This distribution of particles into energy levels is called the Boltzmann distribution (or the Boltzmann rule). The essential insight is that for several {\bf distinguishable} particles the probability that a particular particle is in a state of energy \epsilon is proportional to e^{-\beta \epsilon}.
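A toy simulation makes the Boltzmann rule plausible: start all the kids with equal shares and let random pairwise exchanges run for a while. The numbers of kids and M&Ms below are made up; the point is only the shape of the final distribution:

```python
import random

# Toy party: kids randomly hand M&Ms to each other; the distribution of
# holdings drifts away from the equal split toward Boltzmann's exponential.
random.seed(0)
K, N = 2000, 10000                 # kids and M&Ms (assumed toy sizes)
kids = [N // K] * K                # start with an exactly equal split

for _ in range(500000):            # many random pairwise exchanges
    a, b = random.randrange(K), random.randrange(K)
    if kids[a] > 0:
        kids[a] -= 1
        kids[b] += 1

# n_i = number of kids holding i M&Ms; the counts should fall off roughly
# geometrically, i.e. n_i proportional to exp(-beta * i), not peak at the mean.
counts = [kids.count(i) for i in range(4)]
print(counts)   # each successive count is a roughly constant fraction of the last
```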

After Boltzmann discovered this, the situation was static till the early 1920s when people started discovering particles in nature that were {\bf indistinguishable}. It is a fascinating fact of nature that every photon or electron or muon or tau particle is exactly identical to every other photon or electron or muon or tau particle (respectively and for all other sub-atomic particles too). While this fact isn’t “explained” by quantum field theory, it is used in the construction of our theories of nature.

Back to our party analogy.

Suppose, instead of a wide variety of kids, you invited the largest K-tuplet the world has ever seen: K kids that {\bf ALL} look identical. They all have the same parents (pity them {\bf please}), but hopefully were born in some physically possible way, like test-tubes. You cannot tell the kids apart, so if one of them has 10 M&Ms, the situation is indistinguishable from any other kid having the 10 M&Ms.

Now what’s the distribution of the number of kids n_i with \epsilon_i value in M&Ms? The argument I am going to present is one I first read on Lubos Motl’s blog (given the age of the field, I wouldn’t be surprised if it’s more widely available) and it is a really cute one.

There are a couple of possibilities.

Suppose there was a funny rule (made up by Ms. Fermi-Dirac, a well known and strict party host) that said there could be at most 1 kid with M&Ms of value \epsilon_i (for every i). Suppose P_0(\epsilon_i) were the probability that {\underline {no}} kid had M&Ms of value \epsilon_i. Then the probability that exactly 1 kid has M&Ms of value \epsilon_i is P_0(\epsilon_i) e^{-\beta \epsilon_i} – remember the Boltzmann rule! Since no other possibility is allowed (and if one kid has that many M&Ms, it is indistinguishable from any of the other kids having them, so you can’t ask which one),

P_0(\epsilon_i) + P_0(\epsilon_i) e^{-\beta \epsilon_i} = 1

since there are only two possibilities, the sum of the probabilities has to be 1.

This implies

P_0(\epsilon_i) = \frac {1}{1 + e^{-\beta \epsilon_i}}

And we can find the probability of there being 1 kid with value \epsilon_i in M&Ms. It would be

P_1({\epsilon_i}) = 1 - P_0({\epsilon_i}) =  \frac {e^{-\beta \epsilon_i}}{1 + e^{-\beta \epsilon_i}}

The expected number of kids with value \epsilon_i in M&Ms would be

{\bar{\bf n}}(\epsilon_i) = 0 P_0(\epsilon_i) + 1 P_1({\epsilon_i}) = {\bf \frac {1}{e^{\beta \epsilon_i}+1} }

But we could also invite the fun-loving Mr. Bose to run the party. He has no rules! Take as much as you want!

Now, with the same notation as before, again keeping in mind that we cannot distinguish between the particles,

P_0(\epsilon_i) + P_0(\epsilon_i) e^{-\beta \epsilon_i} + P_0(\epsilon_i) e^{-2 \beta \epsilon_i} + .... = 1

which is an infinite (geometric) series. The sum is

\frac {P_0(\epsilon_i) }{1 - e^{-\beta \epsilon_i} } = 1

which is solved by

P_0(\epsilon_i) = 1 - e^{-\beta \epsilon_i}

The expected number of kids with value \epsilon_i in M&Ms is

{\bar{\bf n}}(\epsilon_i) = 0 P_0(\epsilon_i) + 1 P_0(\epsilon_i) e^{-\beta \epsilon_i} + 2 P_0(\epsilon_i) e^{-2 \beta \epsilon_i} + ...

which is

{\bar{n}}(\epsilon_i) = P_0(\epsilon_i) \frac {e^{-\beta \epsilon_i} } {(1 - e^{-\beta \epsilon_i})^2} = {\bf \frac {1}{e^{\beta \epsilon_i} -1}}
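Both closed forms are easy to check against their defining sums. Here is a quick numeric sanity check (the value \beta \epsilon_i = 0.7 is an arbitrary example):

```python
import math

# Check the Fermi-Dirac and Bose-Einstein closed forms against their
# defining sums, for an arbitrary choice of x = beta * epsilon.
x = 0.7
q = math.exp(-x)                         # the Boltzmann factor e^{-beta*epsilon}

# Fermi-Dirac: occupancies 0 or 1 only, with P0 + P0*q = 1.
p0_fd = 1 / (1 + q)
n_fd = 0 * p0_fd + 1 * (1 - p0_fd)
assert abs(n_fd - 1 / (math.exp(x) + 1)) < 1e-12

# Bose-Einstein: any occupancy n with weight P0 * q**n, where P0 = 1 - q.
p0_be = 1 - q
n_be = sum(n * p0_be * q ** n for n in range(200))   # the series converges fast
assert abs(n_be - 1 / (math.exp(x) - 1)) < 1e-9

print(n_fd, n_be)
```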

Now, here’s a logical question. If you followed the argument above, you could ask this: could we perhaps have a slightly less strict host, say Ms. Fermi-Dirac-Bose-2, who allows up to 2 kids to possess M&Ms of value \epsilon_i? How about a general number L of kids allowed to possess M&Ms of value \epsilon_i (the host being the even more generous Ms. Fermi-Dirac-Bose-L)? More about this in the Addendum below. But the above kinds of statistics are the only ones Nature seems to allow in our 4-dimensional world (three space and one time). Far more are allowed in 3-dimensional worlds (two space and one time) and that will also be in a different post (the sheer number of connections one can come up with is fantastic!).

The thing to understand is that particles that obey Fermi-Dirac statistics (a maximum of one particle in every energy state) have a “repulsion” for each other – they don’t want to be in the same state as another Fermi-Dirac particle, because Nature forces them to obey Fermi-Dirac statistics. If the states were characterized by position in a box, they would want to stay apart. This leads to a kind of outwards pressure. This pressure (described in the next post) is called Fermi-degeneracy pressure – it’s what keeps a peculiar kind of dense star called a white dwarf from collapsing onto itself. However, beyond a certain limit of mass (called the Chandrasekhar limit after the scientist that discovered it), the pressure isn’t enough and the star collapses on itself – leading to a colossal explosion.

These explosions are the next kind of “standard candle”.


{\bf {Addendum}}:

I feel the need to address the question I asked above, since I have been asked informally: can one get new statistics by choosing different values of L in the above party? The answer is “No”. The reason is this – suppose you have KL kids at the party, with a maximum of L kids that can carry M&Ms of value \epsilon_i. Then we should be able to divide all our numbers by L (making a scale model of our party that is L times smaller): K kids, with a maximum of 1 kid allowed to hold M&Ms of value \epsilon_i. You’d expect the expected number of kids with M&Ms of value \epsilon_i to be, correspondingly, L times smaller! So the expected number of particles in a state (with a limit of L particles per state) is just L times the expected number with a limit of 1 particle per state – nothing new.

So all we have are the basic Fermi-Dirac and Bose statistics (1 or many), in our three-space-dimensional party!

Cosmology: Cepheid Variables – or why Henrietta couldn’t Leavitt alone …(Post #4)

Having exhausted the measurement capabilities for small angles, to proceed further, scientists really needed to use the one thing galaxies and stars put out in plenty – light. The trouble is, to do so, we either need detailed, correct theories of galaxy and star life-cycles (so we know when they are dim or bright) or we need a “standard candle”. That term needs explanation.

If I told you to estimate how far away a bulb was, you could probably make an estimate based on how bright the bulb seemed. For this you need two things. You need to know how bright the bulb is {\bf intrinsically} – this is the absolute luminosity and it’s measured in watts, i.e., Joules \: per \: second. Remember, however, that a 100 watt bulb right next to you appears brighter (and hotter) than the same 100 watt bulb ten miles away! To account for that, you could use the fact that the bulb distributes its light almost uniformly into a sphere around itself, to compute what fraction of the light energy you are actually able to intercept – we might have a patch of CCD (like the little sensor inside your video camera), of area A, capturing the light emitted by the bulb. Putting these together, as in the figure below, the amount of light captured is I_{Apparent} watts while the bulb puts out I_{Intrinsic} watts.

Luminosity Falls Off

I_{Apparent} = I_{Intrinsic} \frac{CCD \: Area}{Sphere \: Surface \: Area}

I_{Apparent} = A \frac {I_{Intrinsic}}{4 \pi R^2}

where if you dig into your memory, you should recall that the area of a sphere of radius R is 4 \pi R^2!

Rearranging, you can compute R

R = \sqrt{A \frac {I_{Intrinsic}}{4 \pi I_{Apparent}}}

You know how big your video camera’s sensor area is (it is in that manual that you almost threw away!) You know how much energy you are picking up every second (the apparent luminosity) – you’d need to buy a multimeter from Radio Shack for that (if you can find one now). But to actually compute the distance, you need to know the {\bf Intrinsic} or {\bf actual} luminosity of the light source!
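This inversion is easy to check in code. Here is a minimal Python sketch (the 100 W bulb, the 10 m distance and the 1 cm² sensor are made-up illustration numbers, not from the post):

```python
import math

def apparent_luminosity(intrinsic_watts, distance_m, sensor_area_m2):
    # Fraction of the sphere of radius R that the sensor intercepts
    return intrinsic_watts * sensor_area_m2 / (4 * math.pi * distance_m ** 2)

def distance_from_luminosity(intrinsic_watts, apparent_watts, sensor_area_m2):
    # Invert I_apparent = A * I_intrinsic / (4 pi R^2) for R
    return math.sqrt(sensor_area_m2 * intrinsic_watts / (4 * math.pi * apparent_watts))

# A 100 W bulb, 10 m away, seen by a 1 cm^2 sensor:
captured = apparent_luminosity(100.0, 10.0, 1e-4)
print(distance_from_luminosity(100.0, captured, 1e-4))  # recovers 10.0 m
```

The round trip recovers the distance exactly because the two functions are algebraic inverses of each other.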

That’s the problem! To do this, we need a set of “standard candles” (a light source of known actual luminosity in watts!) distributed around the universe. In fact the story of cosmology really revolves around the story of standard candles.

The first “standard candles” could well be the stars. If you assume you know how far away the Sun is, and if you assume other stars are just like our Sun, then you could make the first estimates of the size of the Universe.

We already know that the method of parallax could be used with the naked eye to calculate the distance to the moon. Hipparchus calculated that distance to be 59 earth radii. Aristarchus measured the distance to the sun (the method is a tour de force of elementary trigonometry and I will point to a picture here as an exercise!)

Aristarchus Figures out the Distance to the Sun

His calculation of the Earth-Sun distance came out to only 5 million miles, a fine example of a large experimental error – the one angle he had to measure, \alpha, he got wrong by a factor of 20. Of course, he was wise – he would have been blinded if he had tried to be very accurate and look at the sun’s geometric center!

Then, if you blindly used this estimate and ventured bravely on to calculate distances to other stars based on their apparent brightness relative to the sun, the results were startlingly large (and of course, still too small!) and people knew this as early as 200 B.C. The history of the world might well have been different if people had taken these observers seriously. It was not until the Renaissance in Europe that quantitative techniques for distance measurements to the stars were re-discovered.

The problem with the technique of using the Sun as a “standard candle” is that stars differ quite a bit in their luminosity based on their composition, their size, their age and so on. The classification of stars and the description of their life-cycle was completed with the Hertzsprung-Russell diagram in 1910. In addition, the newly discovered nebulae had been resolved into millions of stars, so it wasn’t clear there was a simple way to think of stellar “standard candles” unless someone had a better idea of the size of these stellar clusters. However, some of the nearby galaxy companions of the Milky Way could have their distances estimated approximately (the Magellanic Cloud, for instance).

Enter Henrietta Leavitt. Her story is moving and representative of her time, from her Radcliffe college education to her $0.30 / hour salary for her work studying variable stars (she was a human computer for her academic boss), as well as the parsimonious recognition for her work while she was alive. She independently discovered that a class of variable stars called Cepheids in the Magellanic clouds appeared to have a universal connection between their intrinsic luminosity and the time period of their brightness oscillation. Here’s a typical graph (Cepheids are much brighter than the Sun and can be observed separately in many galaxies)


If you inverted the graph, you simply had to observe a Cepheid variable’s period to determine the absolute luminosity. Voila! You had a standard candle.

A little blip occurred in the 1940s and early 1950s, when Walter Baade discovered that there are two populations of Cepheids: the younger, brighter Population I Cepheids seen in the arms of the Andromeda galaxy, and the older, dimmer Population II Cepheids that had crept into the original calibration. When the Luminosity vs. Period graph was redrawn with the populations separated, it implied the galaxies the Population I Cepheids were in were actually even further away! The size of the universe doubled (as it turned out) overnight!

Henrietta Leavitt invented the first reliable light-based distance measurement method for galaxies. Edwin Hubble and Milton Humason used data collected mainly from an analysis of Cepheids to derive the equation now known as Hubble’s law.

Next post will be about something called Olbers’ paradox before we start studying the expansion of the Universe, the Cosmic Microwave background and the current belief that we constitute just 4% of the universe  – the rest being invisible to us and not (as far as we can tell) interacting with us.

Cosmology: Distance Measurements – Parallax (Post #3)

This post describes the cool methods people use to figure out how far away stars and galaxies are. Figuring out how far away your friend lives is easy – you walk or drive at a constant speed in a straight line from your home to their house – then once you know how much time this took, you multiply speed times the time of travel to get the distance to your friend’s house.

This might seem like an excessively detailed description of a simple task, but don’t forget that the ancients would have had difficulty with several things here – how do you travel at a constant speed and how do you measure the time of travel? The first seems like a feasible task, but how do you measure time? Humans have invented many ways to measure time – water clocks (reported in Greece, China and India), sand clocks, burning knotted ropes. The Antikythera mechanism, if confirmed to be an astronomical device, would be similar to a true mechanical clock, but it took the ability to work metal and the Industrial Revolution to reliably mass-produce clocks.

This was the most effective way to measure distances for many years; just travel there and keep notes!

The heavenly object closest to us appears to be the moon. Very early, to some extent by Aristarchus, but really by Edmund Halley (whose comet is more famous than he is), it was realized that parallax could be used to figure out the distance to faraway objects, without actually traveling there. Parallax is illustrated below – it’s the perceived angular shift in an object’s position relative to far-away things when you shift your viewing position. You experience this all the time when you see nearby things shift as you look first with one eye, then the other.


The diagram above is a little busy, so let me explain it. L is the distance that we are trying to measure, between the Earth (where the fellow with the telescope is) and the bright blue star. R is the distance to the starry background, that is {\bf really} far away. Since R is {\bf much} bigger than L,  you should be able to convince yourself that the angles \alpha and \beta are very close to each other. From basic geometry, to a good approximation

D = \alpha L

which means L = \frac {D}{\alpha}. We just need to compute \alpha, but it is roughly equal to \beta. \beta is just the angular separation of the stars P and Q, which you could measure with, for instance, a sextant.

We know D, which is the baseline of the measurement. If you use your two eyes, it is a few inches. You could get ambitious and make measurements in summer and winter, when the baseline would be the diameter of the Earth’s orbit (OK, the orbit is very nearly a circle). The result is that you can figure out how far away the bright blue star is by computing the perceived angular shift.
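The L = \frac{D}{\alpha} computation can be sketched in a few lines of Python. The AU and light-year conversion factors below are standard textbook constants that I am supplying, not numbers from the post:

```python
import math

AU_KM = 1.496e8   # kilometers in one astronomical unit (Earth-Sun distance)
LY_KM = 9.461e12  # kilometers in one light year

def parallax_distance_km(baseline_km, shift_arcsec):
    # L = D / alpha, with alpha converted from arcseconds to radians
    alpha = shift_arcsec * math.pi / (180 * 3600)
    return baseline_km / alpha

# A star showing a 1 arcsecond shift over a 1 AU baseline sits at 1 parsec:
d = parallax_distance_km(AU_KM, 1.0)
print(d / LY_KM)  # about 3.26 light years
```

This is, in fact, the definition of the parsec mentioned in the next post: the distance at which a 1 AU baseline subtends one arcsecond of parallax.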

The farther away something is, the smaller the perceived angular shift. For a long time, people could not measure angular shifts for really distant objects and made the assumption that the method was wrong for some reason, for they couldn’t believe stars could be that far away.

The state of the art in parallax measurement was the Hipparcos satellite and is currently the Gaia satellite (as well as Hubble). Distances up to 30,000 light years can be measured this way. For reference, we think the Andromeda galaxy is 2.5 million light years away and the Milky Way’s dark matter halo extends out to 180,000 light years. So to measure out to these distances needs different techniques, which will be discussed in the next post.

Cosmology and the Expanding Universe ..(Post #2)

The previous post discussed what Cosmological Red Shift is (and we defined z, the red-shift parameter). The saga of cosmology begins with general speculations for thousands of years about what those points of light in the sky really were. The construction of the first telescope around 1608, followed by visual explorations (by people like Galileo) of the Moon, Venus, Jupiter, Saturn and their moons led to the increasing certainty that the heavens were made of the same materials as those found on the earth. By the way, it is indeed surprising (as you will see) that to some extent, cosmology has come full circle – it appears that the heavens might be composed of different “stuff” than us on Earth.

Anyway, as I alluded to in the first post, the first mystery of modern cosmology was discovered in the light from distant galaxies. If we make the entirely reasonable assumption that those galaxies were composed of stars like our sun, the light from those stars should be similar in composition (the mix of colors, etc.) to the light from our sun. Of course, it was entirely reasonable to expect that some of those stars might be smaller/bigger/younger/older than our sun, so if you had a good idea of how stars produced their light, you could figure out what the light should look like. Now in the 1910s, 1920s and 1930s, which is the era we are talking about, people didn’t really understand nuclear fusion, so there was some speculation going on about what made the stars shine. However, one thing was clear – stars contain lots of hydrogen, so we should be able to see the colors (the wavelengths) typical of emission from hot hydrogen atoms. Vesto Slipher was the first to note that the light emitted from the hydrogen (and some other light elements) in the stars in distant galaxies appeared to be red-shifted, i.e., to be redder than expected. This was puzzling, if you expected that hydrogen and other elements had the same properties as on the Earth. The most sensible explanation was that this was an indication that the galaxies were receding away from the earth. Edwin Hubble did some more work and discovered the famous correlation, now known as Hubble’s Law – the more distant a galaxy, the faster it seemed to be receding away from us. If {\bf V_{recession}} is the recession speed of a far-away galaxy, D is how far away it is and {\it H_0} is Hubble’s constant,

{\bf V_{recession}} = {\it H_0} D

Hubble’s constant is currently believed to be around 70 \frac {km/sec}{MegaParsec}. A MegaParsec is a million parsecs – a parsec is a convenient distance unit in cosmology and is roughly 3.26 light years. To interpret the formula, if a galaxy were 1 MegaParsec away, it would be rushing away from us at 70 km/sec . In terms of miles, 1 MegaParsec is 19 million trillion miles.
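To play with these numbers, here is a small Python sketch (the kilometers-per-megaparsec and seconds-per-year conversion factors are standard values I am supplying, not from the post):

```python
MPC_KM = 3.0857e19      # kilometers in one MegaParsec
SEC_PER_YEAR = 3.156e7  # seconds in one year
H0 = 70.0               # Hubble's constant in (km/sec)/MegaParsec

def recession_speed_km_s(distance_mpc):
    # Hubble's law: V_recession = H0 * D
    return H0 * distance_mpc

print(recession_speed_km_s(1.0))    # 70.0 km/sec for a galaxy 1 MegaParsec away
print(recession_speed_km_s(100.0))  # 7000.0 km/sec at 100 MegaParsec

# As a bonus, 1/H0 has units of time -- roughly the age scale of the Universe
hubble_time_years = MPC_KM / H0 / SEC_PER_YEAR
print(hubble_time_years)            # about 1.4e10 years
```

The last number, roughly 14 billion years, is a hint of things to come when we discuss the expansion history of the Universe.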

The story of how Edwin Hubble and others discovered how {\bf far} away the galaxies are (the right side of this equation) is interesting in its own right and features people such as Henrietta Leavitt. This will be the subject of my next post. Probably the best discussion of this is by Isaac Asimov, in a book called “The Guide to  Science – Physical Sciences”.

Getting back to our discussion, we don’t think we are somehow specially located in the Universe. This, by the way, was a philosophical principle that really traces back to the Copernican idea that the Earth wasn’t the Center of the Solar System. If we aren’t in some special place in the Universe, and if we see the galaxies receding away from us, it must be that ALL galaxies are receding from each other with a relative speed proportional to their mutual distance.

Thus was born the theory of the Expanding Universe.

One way to think of the Expanding Universe is to think of a rubber sheet, that is being stretched from all sides. Think of a coordinate system drawn on this rubber sheet, with the coordinates actually marked 1,2,3 .... The actual distance between points on the sheet is then, not just the coordinate difference, but a “scale factor” times the coordinate difference. This “scale factor”, which is usually referred to as a in cosmological discussions, is usually assumed to be the same number for all points in space at the same point of time in the Universe’s life.


In this picture, the grid spacing stays 1 as the Universe expands. However, the distance between the grid points is a times the grid spacing of 1. In the picture, a is initially 1, but it increases to 4 as the expansion continues.
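In code, the bookkeeping is a single multiplication. A tiny Python sketch using the numbers from the picture (the grid coordinates are illustrative):

```python
def proper_distance(a, coord_1, coord_2):
    # Actual distance = scale factor a times the coordinate (grid) separation
    return a * abs(coord_2 - coord_1)

# Two galaxies sitting at grid coordinates 1 and 3:
print(proper_distance(1.0, 1, 3))  # 2.0 units when a = 1
print(proper_distance(4.0, 1, 3))  # 8.0 units after the expansion takes a to 4
```

Note that the galaxies' grid (comoving) coordinates never change; only the scale factor does.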

Next post, after I talk about distance measurements in the Universe, I’ll discuss the ideas of homogeneity and isotropy – two important concepts  that we use when studying the Universe.



A simple sum

This calculation was inspired, a few years ago, by trying to find a simple way to explain the sum of the first N natural numbers to my (then) twelve-year-old daughter, without the use of calculus. As many people know, the sum of the first N natural numbers is found very easily, using the method that Gauss (apparently) re-discovered as a young schoolboy, i.e.,

S_N^1 = 1 + 2 + 3 + ... + N

Writing the same sum in reverse order,

S_N^1 = N + (N-1) + (N-2) + ... + 1

Adding the above two equations

2 (S^1_N) = N(N+1)


S^1_N = \frac{N(N+1)}{2}

Of course, this method cannot be used to sum up the series with squares or higher powers.

Let’s however, continue to use the convenient notation for the sum of squares

S_N^2 = 1^2+2^2+3^2+...+N^2

There’s a useful recursion relation between S_N^2 and S^2_{N-1}

S_N^2= S_{N-1}^2+N^2

Let’s imagine the following – say you have 1 \times 1 (one-unit by one-unit) squares of fixed height cut out of paper. Suppose you arrange them first as below

{\bf Layer 0}


there are 1^2 pieces of paper here

{\bf Layer 1}


there are 2^2 pieces of paper here

And another layer, {\bf Layer 2}


there are 3^2 pieces of paper here

Let’s place the pieces of paper so that Layer 0 is at the bottom, Layer 1 is next on top of it, Layer 2 is on top of that and so on.

Let’s compute the heights of the “skyscraper” that results!

The highest tower is the one on top of the square with vertices (x=0, y=0), (x=1, y=0), (x=0,y=1) and (x=1,y=1). It has height N and the total number of pieces of paper in it is

N \times 1 = (N - 0) \times (1^2 - 0^2) square pieces of paper.

I’ve just written this in a suggestive way.

The next towers are the ones just surrounding this one on two sides; there are actually three towers, of height (N-1), and the total number of pieces of paper in them is

(N-1) \times 3 = (N-1) \times (2^2 - 1^2) square pieces of paper

Again, this is written in a suggestive way

The next towers surround these last ones on two sides; there are five of them, of height (N-2), and the total number of pieces of paper in them is

(N-2) \times 5 = (N-2) \times (3^2 - 2^2) square pieces of paper.

Yet again!

In general, the k^{th} layer of towers has height (N-k) and there are ((k+1)^2-k^2) of them, so the total number of square pieces of paper in it is

(N-k) \times ((k+1)^2 - k^2)

Note, for later use, that the term ((k+1)^2-k^2) derives from the difference in the total number of pieces of 1 \times 1 square paper that form a k \times k square vs. a  (k+1) \times (k+1) square.

Adding this up over all the towers, we are left with the total number of 1 \times 1 square pieces of paper, which is, indeed, S_N^2 = 1^2+2^2+3^2 +4^2 ...+N^2.

Writing this in summation notation

S_N^2 = \sum\limits_{k=0}^{N-1} (N-k) \times (2 k+1) 

which can be expanded into

S_N^2 = \sum\limits_{k=0}^{N-1} (2 N k + N - 2 k^2 - k)


S_N^2 = 2 N \frac {N (N-1)}{2} + N^2 - 2 S_{N-1}^2 - \frac{N(N-1)}{2}

Using our useful result from above

S_{N-1}^2 = S_N^2 - N^2

We find

S_N^2 = \frac {N(N+1)(2 N+1)}{6}
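The tower-counting sum and this closed form are easy to check against each other; a few lines of Python (my own verification sketch, not part of the original derivation):

```python
def sum_of_squares_by_towers(n):
    # Tower k has height (n - k) over ((k+1)^2 - k^2) = 2k + 1 unit squares
    return sum((n - k) * (2 * k + 1) for k in range(n))

for n in (1, 2, 5, 10, 100):
    assert sum_of_squares_by_towers(n) == n * (n + 1) * (2 * n + 1) // 6

print(sum_of_squares_by_towers(10))  # 385 = 1^2 + 2^2 + ... + 10^2
```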

Aha – now we can generalize this!

We can continue this for the sum of the cubes of integers and so forth. Let’s start with 1 \times 1 \times 1 cubes that are placed, starting at the origin. Again, using the notation

S_N^3 = 1^3+2^3+3^3+...+N^3

Again, we note

S_N^3 = S_{N-1}^3 + N^3

Continuing in the same vein, alas, we cannot layer the cubes on top of each other in three dimensions! Let’s assume there is indeed a fourth dimension and count the heights in this dimension – see, physicists and mathematicians are naturally led to higher dimensions! The number of cubical pieces used is found by counting the numbers in the expanding “perimeter” of cubes, just as in the two-dimensional example.

N \times 1 = (N-0) \times ( 1^3 - 0^3)

(N-1) \times 7 = (N-1) \times (2^3 -1^3)

(N -2) \times 19 = (N-2) \times (3^3 - 2^3)

(N-3) \times 37 = (N-3) \times (4^3 - 3^3)

(N-k) \times ( (k+1)^3 - k^3)

So we are left with

S_N^3 = \sum\limits_{k=0}^{N-1} (N-k) \times (3 k^2+3 k+1)

which results in the usual formula (using the auxiliary relation S_{N-1}^3=S_N^3-N^3),

S_N^3 = \left( \frac{N(N+1)}{2} \right)^2

In general, using this approach, and the auxiliary relation

S_N^L= S_{N-1}^L+ N^L

We find

S_N^L = \frac{1}{L+1} \left( N^2 + L N^L + \sum\limits_{k=1}^{L-1} \left[ N \, C(L,k) - C(L,k-1) \right] S_{N-1}^k \right)

where  C(n,m) = \frac{n!}{m! (n-m)!} is the combinatorics formula for the number of ways to select m items out of n.
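The layer-counting construction behind all of this works for any power: the k^{th} layer has height (N-k) spread over ((k+1)^L - k^L) unit cells. A quick brute-force check of that identity in Python (my own sketch, independent of the closed formula above):

```python
def power_sum_by_layers(n, L):
    # Layer k contributes height (n - k) over ((k+1)^L - k^L) newly added cells
    return sum((n - k) * ((k + 1) ** L - k ** L) for k in range(n))

for L in range(1, 6):
    for n in range(1, 25):
        assert power_sum_by_layers(n, L) == sum(i ** L for i in range(1, n + 1))

print("layer construction matches brute-force power sums")
```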

This formula, while original (as far as I can tell) in its derivation, is consistent with Faulhaber’s formula from the 17th century.

A course correction – and let’s get started!

I have received some feedback from people that felt the posts were too technical. I am going to address this by constructing a simpler thread of posts on one topic that will start simpler and stay conceptual rather than  become technical.

I want to discuss the current state of Cosmology, given that it is possibly the field in the most flux these days. And the basic concept to understand in Cosmology is that of Cosmological Red Shift. So here goes…

The Cosmological Red Shift means this – when we look at far away galaxies, the light they emit is redder than we would expect. When you look at light of various colors, the redder the light, the longer its wavelength. Why would that be? Why would we perceive the light emitted by a galaxy to be redder than it should be?

To understand Cosmological Red Shift, you need to understand two things – the Doppler Shift and Time Dilation.

Let’s start with Doppler Shift.

If you listen to an ambulance approaching you on a road, then (hopefully, if it hasn’t come for you) speeding away from you on the road, you will hear the pitch, i.e., frequency, of the siren go up, then go down. Listen to this here.

Why does this happen?

Sound is a pressure wave. When the instrument producing a sound vibrates at a certain rate (frequency), it pushes and pulls on the air surrounding it. Those pushes and pulls are felt far away, because the fluctuations in density of the air propagate (see the video). The air isn’t actually going anywhere as a whole – this is why when you have waves in the ocean, the ocean isn’t actually sending all the water towards you, it’s just the disturbance coming towards you. So these pressure variations hit your ears and that’s how you hear something – the eardrum vibrates the little bones in the ear, which set up little waves in the cochlear fluid that then create electrical signals that go to your auditory cortex and voila, you hear!

Now, waves are characterized by wavelength (\lambda), frequency (\nu) and their speed (\bf{v}). There’s a relation between these three quantities

\bf{v} = \lambda \nu

Sal Khan has a nice video describing this formula in some detail. Let’s try and understand this – wavelength (\lambda) is the distance between the successive positive crests of the wave, frequency (\nu) is the number of crests shooting out of the emitter per second, then (\lambda \nu) is the length of wave coming out of the emitter per second as measured by the emitter. That’s how far the first crest traveled in one second, i.e., the speed of the wave.

Now what happens if the emitter is moving away from you – think of the pressure waves like compressions of a spring, as in the video link above. If the emitter is moving away, that’s like the spring being extended while it is vibrating – ergo, the wavelength is increased in proportion to how fast the emitter is running away from you (call the emitter’s speed v_{emitter}). The formula is

\lambda_{observed} - \lambda_{emitted} = \frac {v_{emitter}} {\bf{v}} \lambda_{emitted}

Aha – this makes sense, so the sound that I hear when an ambulance is driving away from me has a longer wavelength – so it has a lower frequency – it has a lower pitch. If the ambulance is driving towards me, so v_{emitter} is negative in the above formula, then we hear shorter wavelength sound, which has a higher frequency, i.e., a higher pitch.

As an example, if the emitter flies away as fast as the speed of sound in the air, then the observed wavelength should be \bf {double} the emitted wavelength. In the simple picture of the emitter shooting out wave crests at a rate \nu per second, the emitter shoots out one crest, then shoots out another crest after a time interval \frac {1}{\nu}, by which time it has moved a distance \frac {\bf {v}} {\nu} which is indeed one wavelength! So the distance between the crests in the eyes of the observer is twice the emitted wavelength.
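The formula above is simple enough to play with directly. A minimal Python sketch (the 343 m/s speed of sound is a standard room-temperature value that I am supplying, not from the post):

```python
def doppler_wavelength(emitted_wavelength, emitter_speed, wave_speed):
    # lambda_observed = lambda_emitted * (1 + v_emitter / v);
    # emitter_speed > 0 means the source is moving away from the observer
    return emitted_wavelength * (1 + emitter_speed / wave_speed)

V_SOUND = 343.0  # speed of sound in air, m/s, at room temperature

# A siren emitting 1 m waves, receding at the speed of sound:
print(doppler_wavelength(1.0, V_SOUND, V_SOUND))  # 2.0 -- doubled, as in the text
# The same siren approaching at 30 m/s (speed is negative):
print(doppler_wavelength(1.0, -30.0, V_SOUND))    # about 0.91 -- higher pitch
```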

Whew! So that is the Doppler effect. If something is moving away from me, the sound it emits will seem to be of lower pitch. Since light is also a wave, if a galaxy were moving away from me, I should expect its light to look like light of lower frequency – i.e., redder.

When we specialize this to the case of light, we replace {\bf{v}} by c, the speed of light. There is an additional effect that we need to think of, for light.

Onwards – let’s think about Time Dilation.

This needs some knowledge of Einstein’s ideas about Special Relativity. I am going to give you a lightning introduction, but not much detail. I might write a post with some details later, but there are excellent popular books on the subject. Several years before Einstein, the Scottish physicist James Clerk Maxwell discovered the equations of electromagnetism. People had discovered universal laws of nature before – for instance Isaac Newton discovered the Law of Gravitation, but Maxwell’s equations had a puzzling feature. They included a constant which wasn’t a mass, length or time, but was a speed! Think of that. If there was a law of nature that included the speed of your favorite runner (say how quickly that dude in “Temple Run” runs from the apes), how strange that would be. How fast does someone run, you ask. Well, it depends on how fast the observer is going! You must have seen this on the highway. When your car has a flat and you are standing, ruing your luck, on the side of the highway, you think the cars are zipping past you at 50, 60,…80 miles per hour. When you are in one of the cars, traveling at, say 50 miles per hour, the other cars are moving \bf {relative \hspace {2 mm} to \hspace{2 mm} you} at 0, 10,…30 miles per hour only. That’s natural. How can a physical law, a universal law of nature, depend on a speed! The world is bizarre indeed!

Einstein discovered exactly how bizarre. It turns out if you want the idea of a universal constant that is a speed (for light) to make sense, ALL observers need to agree on the actual speed of light, regardless of how fast they are traveling, along or against or perpendicular to the ray of light. For that to happen, their clocks and rulers need to get screwed up, in just the right way to allow for this. Suppose you have two observers that are moving relative to each other, at a constant speed in some direction. Einstein derived the exact equations that relate the coordinates (x,y,z,t) that the first observer assigns to a moving object to the coordinates (x',y',z',t') that the other observer ascribes to the same object. It’s high school algebra, as it turns out, but the relation implies, among other things, that a moving clock ticks slower than a stationary clock, {\bf when \hspace{2 mm} the \hspace{2 mm} clocks \hspace{2 mm} are \hspace{2 mm} compared \hspace{2 mm} at \hspace{2 mm} the \hspace{2 mm} same \hspace{2 mm} point \hspace{2 mm} in \hspace{2 mm} space}. That, by the way, is how the twin paradox sorts itself out – the twins have to meet at some point in order to compare their ages, so one has to turn his or her rocket around.

When you use the formulas of relativity, if the emitter is flying away at speed v_{emitter} relative to the observer, the emitter’s clock will seem to run slower than the observer’s clock (from the observer’s point of view). Since the frequency of the emitted wave essentially is a “clock” for both, we will obtain (and this needs a little algebra and some persistence!)

\nu_{observed} = \nu_{emitted} \sqrt{1 - (\frac{v_{emitter}}{c})^2}

Using our previous relation connecting frequency and wavelength, this means the wavelengths are related as below

\lambda_{observed} = \lambda_{emitted} \frac{1}{\sqrt{1 - (\frac{v_{emitter}}{c})^2}}

When we combine the two effects – Doppler and Relativity – which operate on the same emitted light successively, we multiply them, and we get the final observed wavelength

\lambda_{observed} = \lambda_{emitted} \sqrt{\frac{1 + \frac{v_{emitter}}{c}}{1 - \frac{v_{emitter}}{c}}}

We see that if something is moving away from us, i.e., v_{emitter} is positive, the observed wavelength is longer than the emitted wavelength, i.e., it is red-shifted. If the moving object emits light of a certain color, the stationary observer of this light sees it to be redder than the emitted color. So here’s the upshot – if you observe light from some object that is redder than you’d expect from that object, one strong possibility is that it is receding away from you. That’s how modern cosmology got started!

A note about terminology: astronomers define a quantity called “red-shift”, denoted by the letter z, to describe this wavelength difference. It is defined as the relative change in wavelength

z = \frac{\lambda_{observed} - \lambda_{emitted}}{\lambda_{emitted}}

z is a “dimensionless” number – it is a ratio of two lengths. z=0 corresponds to you and me, things that are in the vicinity of each other. The moon isn’t receding away from us (if it is, the effect is immeasurable), neither is our sun, so they all have a z=0. In fact, the entire Milky Way galaxy, our home, is at red-shift z = 0. We really have to leave the vicinity of our local group of large galaxies (that includes principally Andromeda and the Large and Small Magellanic clouds) to start seeing red-shifts exceeding 0. At the other extreme, the largest red-shifts we have seen are for distant quasars and galaxies – red-shifts of about 11. Think of what that means – the 21 cm emission wavelength of neutral hydrogen would be shifted by 232 cm – almost 7 feet! For people constructing prisms and other apparatus for telescopes, that is a ridiculously (physically) large wavelength to deal with. More on this later!
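The combined Doppler-plus-time-dilation formula can be inverted to ask how fast a z = 11 source must be receding. A small Python sketch (my own, using only the formulas derived above):

```python
import math

def redshift_from_speed(beta):
    # z = sqrt((1 + v/c)/(1 - v/c)) - 1, with beta = v_emitter / c
    return math.sqrt((1 + beta) / (1 - beta)) - 1

def speed_from_redshift(z):
    # Inverting: (1 + z)^2 = (1 + beta)/(1 - beta), solved for beta
    s = (1 + z) ** 2
    return (s - 1) / (s + 1)

print(redshift_from_speed(0.0))  # 0.0 -- objects at rest relative to us
beta = speed_from_redshift(11.0)
print(beta)                      # about 0.986: receding at ~98.6% of c
```

So a red-shift of 11 corresponds, in this special-relativistic picture, to a source flying away at almost the speed of light.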


A simple connection between Entropy and Information

This article follows a simple example laid out by Jaynes (1996).
Jaynes’ example is one that shows how one’s computation of the change in entropy in a physical / chemical process depends on the precise variables that one uses to label the macrostate. If you use different variables (say you are insensitive to properties that someone else does have the ability to measure), you can invent situations where the entropy change might (to one observer) be contrived to violate the Second Law of Thermodynamics, while the other observer sees no such violation.
Let’s consider a vessel whose volume is V, with a diaphragm that separates it into two parts – volumes V_1 and V_2 with n_1 and n_2 molecules of each type, respectively. The “1” side is filled with Argon gas of type A_1, which is indistinguishable from the type of Argon gas filling side “2”, which we call type A_2 – at least for the observer named Babu. However, Alisha, with her access to superior technology, is indeed able to perceive the difference between the two types of Argon. The container is in close thermal contact with a heat bath that maintains a constant temperature T. The equilibrium condition (same temperature throughout and equal pressure) implies that n_1/V_1 = n_2/V_2 = (n_1+n_2)/V.
Alisha, in addition to her ability to notice the difference between A_1 and A_2 also has sole access to a material called Whiffnium, which is permeable to A_1 but impervious to A_2. During the course of her research, she has also discovered Whaffnium, a new material which is permeable to A_2, but impervious to A_1. Let’s suppose that Alisha constructs two (infinitesimally thin) pistons that are initially placed very close to each other, one piston made of Whiffnium and the other of Whaffnium, as in the picture below.


Let’s suppose that just enough of A_1 permeates through the Whiffnium so that the partial pressure of A_1 is the same in the left side of the container as well as the intercalated region (between the Whiffnium and Whaffnium pistons). Similarly, let’s assume that just enough of A_2 permeates through the Whaffnium into the intercalated region (between the pistons) so that the partial pressure of A_2 is the same in the intercalated region as well as on the right side of the container. Now, due to the unbalanced pressure of A_2 impinging upon the Whiffnium piston, it is reversibly moved to the left and the entropy change in A_2 is
\Delta S_2 = n_2 k_B \ln(V/V_2)
Similarly, the Whaffnium piston is reversibly moved to the right and the entropy change in A_1 is
\Delta S_1 = n_1 k_B \ln(V/V_1)
The total entropy change is hence
\Delta S = n_1 k_B \ln(V/V_1) + n_2 k_B \ln(V/V_2)
All this is pretty logical from Alisha’s point of view, since she does see the two parts of the container as having different materials, A_1 and A_2. She understands that the entropy change in the container is a consequence of the heat flowing into the system from the heat bath.
However, Babu sees a conundrum. He sees the argon as one undifferentiated gas and so the initial and final states of the system are identical. However, the system has absorbed an amount of heat and converted all of it into work, in violation of the Second Law of Thermodynamics. In addition, he sees the entropy change as 0. This is, however, simply a reflection of the fact that the entropy is a function of the macrostate variables that one uses and if Babu has an insufficiently specified macrostate, then Alisha is simply able to manipulate phenomena to cause Babu to think he has observed a violation of the Second Law.
How much information would Babu need in order to deduce the correct entropy change? In the initial state, if he knew about the sub-identities A_1 and A_2, the macrostate where A_1 is on the left and A_2 is on the right (side of the container) has the following number of (equally probable) microstates
= \left( \frac{V_1}{v} \right)^{n_1} \left( \frac{V_2}{v} \right)^{n_2}
where we have assumed that each molecule can be localized to a minimum volume v; we can do this for all the n_1 molecules in V_1 and the n_2 molecules in V_2.
In the final state, all the n_1+n_2 molecules are strewn about in the total volume V and the total number of microstates is
= \left( \frac{V}{v} \right)^{n_1+n_2}
So, to specify the final microstate, the extra information that must be communicated to him (in the sense of traditional information theory) is

I= \log_2((\frac{V}{v})^{n_1+n_2} ) - \log_2 ( (\frac{V_1}{v})^{n_1} (\frac{V_2}{v})^{n_2} )

= n_1 \log_2(V/V_1)+n_2 \log_2(V/V_2 )
which is exactly the same (up to the multiplicative factor of k_B and the difference between \ln and \log_2) as the entropy change to separate the molecules into the different varieties.
Note that if Babu had never actually met Alisha and had no idea that there were two varieties of argon, his calculation for the number of microstates before and after would be identical, equal to (V/v)^{n_1+n_2} – this is because he doesn’t even think the diaphragm separating the two sides of the container is necessary; to him, both sides hold the same material, in thermodynamic equilibrium with each other.
However, once Babu has this information, in the form of a detailed message, he will have been supplied with enough to deduce the situation with the two varieties as completely as Alisha’s abilities admit, where before he knew nothing of it. Ergo, the extra information he needs is exactly the entropy difference.
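The equivalence can be checked numerically. Below is a minimal sketch in Python; the specific numbers (amounts of gas, volumes) are arbitrary illustrations, not values from the post:

```python
import math

# Hypothetical illustration: equal amounts of the two argon varieties,
# each initially confined to half of the container.
k_B = 1.380649e-23               # Boltzmann constant, J/K
n1, n2 = 6.0e22, 6.0e22          # number of molecules of A_1 and A_2
V1, V2 = 1.0e-3, 1.0e-3          # initial volumes, m^3
V = V1 + V2                      # total volume after the pistons move

# Thermodynamic entropy change as each variety expands into V
delta_S = n1 * k_B * math.log(V / V1) + n2 * k_B * math.log(V / V2)

# Information (in bits) Babu needs to pin down the finer-grained macrostate
I_bits = n1 * math.log2(V / V1) + n2 * math.log2(V / V2)

# The two agree up to the factor k_B * ln(2)
print(delta_S, I_bits * k_B * math.log(2))
```

Note that the minimum volume v drops out of the difference, just as it does in the formula for I above.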

Here’s an alternative history of how quantum mechanics came about…

Quantum Mechanics was the result of analysis of experiments that explored the emission and absorption spectra of various atoms and molecules. Once the electron and proton were discovered, very soon after the discovery of radioactivity, it was theorized that the atom was an electrically neutral combination of protons and electrons. Since it isn’t possible for a static arrangement of protons and electrons to be stable (a theorem in classical electromagnetism), the plum-pudding model of J. J. Thomson was rejected in favor of one where the electrons orbited a central, heavy nucleus. However, it is well-known from classical electromagnetism that if an electron is accelerated – which is what happens when it revolves around the positively charged nucleus – it should radiate energy through electromagnetic radiation and quickly collapse into the nucleus.

The spectra of atoms and molecules were even more peculiar – an infinite number of specific spectral lines were observed, with no lines at in-between frequencies. Clearly, the systems needed specific amounts of energy to be excited from one state to another, and there wasn’t a continuum of possible states from the ground state (or zero-energy state) up to high energies. In addition, the ground state of the hydrogen atom, for instance, seemed to have a specific energy – the ionization energy of the single electron in the atom – that was specific to the hydrogen atom and could not be calculated from known parameters in any easy way. The relation between the lines was recognized by Rydberg: the frequency of the radiation emitted in the various transitions in hydrogen was proportional to the difference of reciprocals of squares of small natural numbers.
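The Rydberg relation is easy to evaluate. Here is a sketch that computes the first few Balmer lines of hydrogen (the value of the Rydberg constant R_H is the standard one; nothing here is specific to this post):

```python
# Rydberg relation for hydrogen: emitted frequencies (equivalently, inverse
# wavelengths) are proportional to differences of reciprocals of squares
# of small natural numbers.
R_H = 1.0973731568e7  # Rydberg constant, 1/m

def hydrogen_wavelength_nm(n_lower, n_upper):
    """Wavelength of the photon emitted in the n_upper -> n_lower transition."""
    inv_lambda = R_H * (1.0 / n_lower**2 - 1.0 / n_upper**2)
    return 1e9 / inv_lambda  # metres -> nanometres

# Balmer series (transitions down to n = 2): H-alpha, H-beta, H-gamma
for n in (3, 4, 5):
    print(n, round(hydrogen_wavelength_nm(2, n), 1))
```

These come out near 656 nm, 486 nm and 434 nm, the familiar visible lines of hydrogen.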

Anyway, starting from an energy function (Hamiltonian) of the kind

H = \frac{\vec{p}^2}{2m} + V(\vec{r}) \hspace{3 mm} V(\vec{r}) = - \frac{e^2}{r}

for the single electron interacting with a heavy nucleus, we recover only the classical continuum of possible solutions for the hydrogen atom, even if we neglect the radiation of energy by the continuously accelerating electron.

We can state the conundrum as follows. We use the energy function above, solve the classical problem, and find that the energy can take a continuum of values from some minimum negative number to infinity. In the lab, we find that the energy takes only a discrete infinity of values.

Let’s make a connection to matrices and operators. Matrices are mathematical objects that have discrete eigenvalues. Can we interpret H as a matrix of some sort and have the discrete energy values of the atom be eigenvalues of that matrix? In that case, there would be eigenvectors corresponding to those eigenvalues; let’s notate them as |E_i>, with eigenvalue E_i for the i^{th} energy level. If H were a matrix, so would x and p have to be, since otherwise we wouldn’t be able to make sense of the definition of H. We’d like to keep the above definition of the energy in terms of the position and momentum variables since it allows us to guess at quantum theories for other systems in the future – while this approach is to some extent arbitrary, it is an example of conservative-radicalism (a phrase I learned from a talk by Nima Arkani-Hamed); it’s also called the quantization prescription.

Now, if x and p were to be matrices, could they have the same eigenvectors, presumably the same eigenvectors as H? For that, they would need to be commuting matrices. Well, they can’t be – if x and p had the same eigenvectors, then H would have those same eigenvectors too, and we would be stuck with the same continuum of energy levels we had in the classical problem. So we are left with the situation that the eigenvectors of x and p, and indeed H – label them |x>, |p> and |E_i> – can’t be the same; they stick out in different directions in the abstract space of state vectors. The state vectors for H, i.e., the |E_i>, are some linear combinations of the |x>’s or the |p>’s, assuming the |x> and the |p> each form an orthogonal, complete set of vectors that spans the abstract state space.

This leads us to the second realization: if we assume that the eigenvectors |x>, |p> and |E_i> stick out in different directions in the state space and each form a complete, orthogonal set, then we can specify the state of the system by giving its components along the |x>’s, or along the |p>’s, or along the |E_i>’s – unlike in classical physics, where both x and p are needed to completely specify the state of the system.

What is the physical significance of dot products such as <x|E_i> and <p|E_i>? These might be complex numbers – do their magnitudes and phases denote specific physical quantities that can be measured? Consider a dot product such as <x|x'>, which should be zero unless x = x' and should yield 1 when integrated over the entire set of states; given that x is a continuous variable,

<x|x'> = \delta(x - x')

This is akin to the probability density that a particle in the state |x'> can be found in the state |x>. The implication is that the magnitude of the dot product has physical meaning. Later, in an inspired leap of imagination, Max Born realized that we need to interpret the square of the magnitude as the quantity with physical meaning – the probability density.

What is the dot product of |x> and |p>?

Let’s start with some definitions, based on our simple minded notion that these variables need to be represented as matrices with eigenvectors.

x |x'> = x' |x'>

p|p'> = p'|p'>

The dot product is represented by <x|p>

Now this must be a function purely of  x and p . Hence

<x|p> = f(x,p)

We expect translational invariance in physics in our physically relevant quantities and |<x|p>| is (by the argument in the last paragraph)  a physically relevant quantity – related to the probability density that a particle in state |p>  is in the state |x>.

Let’s take the dot product of |p> with the vector |x=0>. From the above, this must be

<x=0|p> = f(0,p)

Now, if the origin of coordinates were moved by A, i.e.,

x \rightarrow x+A

We don’t expect a physical change in the dot product – it should not care about where the origin of coordinates is, up to a factor of magnitude unity. This means

f(x+A,p) = f(x,p) e^{i \Phi(x,A,p)}

In addition,

f(A,p) = f(0,p) e^{i \Phi(0,A,p)}

The simplest choice of function that has this property is (up to some units)

f(x,p) =e^{i \alpha p x + iC}

where C is an arbitrary constant, which we can choose to be 0, and \alpha is a quantity that makes the dimensions come out right in the exponent (all the dimensions need to cancel out).
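This phase property is easy to verify numerically for the chosen f. The values of \alpha, x, A and p below are arbitrary illustrations:

```python
import cmath

alpha = 1.0  # arbitrary choice of units

def f(x, p):
    # The candidate dot product <x|p>, with the constant C chosen as 0
    return cmath.exp(1j * alpha * p * x)

x, A, p = 0.7, 2.3, 1.9
lhs = f(x + A, p)
rhs = f(x, p) * cmath.exp(1j * alpha * p * A)  # phase Phi = alpha * p * A

print(abs(lhs - rhs))    # agreement up to floating-point rounding
print(abs(f(x + A, p)))  # magnitude is translation invariant (= 1)
```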

Since you also have

<x'| p |p'> = p' e^{i \alpha p' x'}

The above expression allows us to make the identification

<x'| p |p'> = - \frac {i}{\alpha} \frac{\partial}{\partial x'} <x'|p'>

So, the matrix  p can be identified, in the space spanned by the eigenvectors of x, as

p \equiv  - \frac {i}{\alpha} \frac{\partial}{\partial x}
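To make this concrete, here is a finite-dimensional caricature: discretize x on a periodic grid, represent d/dx as a central-difference matrix, and check that a plane wave is an approximate eigenvector of the resulting p matrix. Units with \alpha = 1 are assumed, and all names are illustrative:

```python
import numpy as np

N = 256
L = 2 * np.pi
dx = L / N
x = np.arange(N) * dx

# Central-difference d/dx with periodic wraparound:
# (D psi)[i] = (psi[i+1] - psi[i-1]) / (2 dx)
D = (np.roll(np.eye(N), 1, axis=1) - np.roll(np.eye(N), -1, axis=1)) / (2 * dx)
p = -1j * D  # the momentum matrix, with alpha = 1

# A plane wave e^{ikx} should be an approximate eigenvector with eigenvalue ~ k
k = 3
psi = np.exp(1j * k * x)
ratio = (p @ psi) / psi  # a constant vector, close to k (exactly sin(k dx)/dx)
print(ratio[0].real)
```

The matrix p built this way is also Hermitian, as befits an observable.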

Now, suppose the eigenvectors of the  H matrix are the |E_i> , so we have

<x'|H|E_i> = <x'|  \frac{\vec{p}^2}{2m} + V(\vec{r}) |E_i>

= \left(  - \frac {1}{2 m \alpha^2} \frac {\partial^2}{\partial x^{'2}} + V(x') \right) <x'|E_i> = E_i <x'|E_i>

This is Schrodinger’s equation, if we make the identification \alpha \equiv \frac {1}{\hbar}.

Apart from the mental leap to make from treating x, p as a continuous set of variables to treating them as matrices (apparently that was considered higher mathematics in the early 1920s), the flow seems pretty straightforward.
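To see H literally as a matrix with a discrete spectrum, one can discretize Schrodinger’s equation on a grid and diagonalize. The sketch below uses a harmonic oscillator potential instead of the Coulomb one (easier to handle on a finite grid), in units with \hbar = m = \omega = 1, where the exact eigenvalues are E_n = n + 1/2; all names are illustrative:

```python
import numpy as np

N = 1000
x = np.linspace(-10, 10, N)
dx = x[1] - x[0]

# Kinetic term -(1/2) d^2/dx^2 as a second-difference matrix
main = np.full(N, 1.0 / dx**2)
off = np.full(N - 1, -0.5 / dx**2)
T = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

# Harmonic potential, diagonal in the |x> basis
V = np.diag(0.5 * x**2)

H = T + V
E = np.linalg.eigvalsh(H)  # discrete eigenvalues, sorted ascending
print(E[:4])  # close to [0.5, 1.5, 2.5, 3.5]
```

The continuum of classical energies has become a discrete ladder, purely because H is now a (large but finite) matrix.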


To see Nima Arkani-Hamed talk about the phrase “conservative-radicalism” and other interesting topics, see the YouTube video here.

Another physics blog? Why?

I graduated in 1993 with a physics Ph.D. and, after a short post-doc, went off to work as an options trader and “quant”. After 22 years in that area, I realized that physics is actually back to being interesting again – 22 years ago, people were saying that fundamental physics was all done with the invention of string theory, and that someone would shortly be out with a numerical computation of the mass of the electron from fundamental principles (some combination of π, γ (Euler’s constant) and some simple integral of a dimensionless variety). None of this has come to pass. The discovery of the accelerating expansion of the universe, dark matter, dark energy, and the realization that gravity might be the key to many of these puzzles have left us (possibly!) at square one. We might be at the dawn of a new paradigm shift, which might happen tomorrow with some unexpected discovery either by a satellite or at a particle collider, or might happen two hundred years from now. Either way, we are all at sea – and maybe not knowing what will happen in the intervening years is for the better.

Anyway, with the ability to pursue interesting projects, I decided last year to get back to physics. Twelve months later, having attended three physics schools aimed at graduate students and post-docs, as well as working on some courses offered by one of the physics stars at Rutgers University, I feel that I am close to finding something to work on.

As I learn new things, I will post interesting ideas that I am playing with here. I don’t have the all-round expertise of a Matthew Strassler (https://profmattstrassler.com/), the sheer genius of a Lubos Motl (http://motls.blogspot.com/), the current up-to-date-ness of Sabine Hossenfelder (http://backreaction.blogspot.com/) or the point of view of Peter Woit (http://www.math.columbia.edu/~woit/wordpress/). There is still room for the Simply Curious.

If you want to discuss stuff in a peaceable, civilised way, feel free to post. If you are angry at something, look elsewhere. And if you want to inform me about aliens in pre-history, I am simply not interested.