A digression on statistics and a party with Ms. Fermi-Dirac and Mr. Bose – Post #5

Posted on Updated on

To explain the next standard candle, I need to digress a little into the math of statistics of lots of particles. The most basic kind is the statistics of distinguishable particles.

Consider the following scenario. You've organized a birthday party for a lot of different looking kids (no twins, triplets, quadruplets, quintuplets …). Each kid has equal access to a large pot of M&Ms in the center of the room. Each kid can grab M&Ms from the pot and additionally when they bounce off each other while playing, can exchange a few M&Ms with each other. After a long while, you notice that all the M&Ms in the central pot are gone.  Let's suppose there are truly a {\underline {large}} number of kids (K) and a truly {\underline {humongous}} number of M&Ms (N).

Interesting question – how many M&Ms is each kid likely to have? A simpler question might be – how many kids likely have 1 M&M? How many likely have 2?..How many likely have 55?… How many likely have 5,656,005?…

How do we answer this question?

If you use the notation n_i for the number of kids that have i M&Ms, then we can easily write down

\sum\limits_{i} n_i  = K

is the total number of kids.

\sum\limits_i i n_i = N

is the total number of M&Ms.

But that isn't enough to tell! We need some additional method to find the most likely distribution of M&Ms (clearly this wouldn't work if I were there; I would have all of them and the kids would be looking at the mean dad that took the pot home, but that's for a different post). The result, which Ludwig Boltzmann discovered at the end of the 19th century, was {\bf not} simply the one where everybody has an equal number of M&Ms. The most likely distribution is the one with the largest number of possible ways to exchange the roles of the kids and still have the same distribution. In other words, maximize the combinatorial number of ways

{\it \Omega} = \frac {K!} {n_1! n_2! n_3! ...n_{5,005,677}! ...}

which is the way of distributing these kids so that n_1 have 1 M&M, n_2 have 2 M&Ms, n_3 have 3 M&Ms…, n_{5,005,677} have 5,005,677 M&Ms and so on.

Boltzmann had a nervous breakdown a little after he invented statistical mechanics, which is this method and its consequences, so don't worry if you feel a little ringing in your ears. It will shortly grow in loudness!

How do we maximize this {\it \Omega}?

The simplest thing to do is to maximize the logarithm of {\it \Omega}, which means we maximize

\log \Omega = \log K! - \sum\limits_{i} \log n_i!

but we have to satisfy the constraints

\sum\limits_{i} n_i  = K, \hspace{5 mm} \sum\limits_i i n_i = N

The solution (a little algebra is required here) is that n_i \propto e^{-\beta i} where \beta is some constant for this 'ere party. For historical reasons and since these techniques were initially used to describe the behavior of gases, it is called the inverse temperature. I much prefer "inverse gluttony" – the lower \beta is, the larger the number of kids with a lot of M&Ms.
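The "little algebra" can be sketched as follows. This is only an outline, assuming Stirling's approximation \log n! \approx n \log n - n for large n_i and two Lagrange multipliers, one per constraint (the symbol \alpha and the function \mathcal{L} are labels introduced just for this sketch; \beta is the constant that appears in the result).

```latex
\begin{aligned}
\mathcal{L} &= \log K! - \sum_i \left( n_i \log n_i - n_i \right)
  - \alpha \Big( \sum_i n_i - K \Big) - \beta \Big( \sum_i i \, n_i - N \Big) \\
\frac{\partial \mathcal{L}}{\partial n_i} &= - \log n_i - \alpha - \beta \, i = 0 \\
n_i &= e^{-\alpha} \, e^{-\beta i} \;\propto\; e^{-\beta i}
\end{aligned}
```

The two multipliers are then fixed by demanding that the n_i add up to K kids and N M&Ms.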

Instead of the quantity i, which is the number of M&Ms the children have, if we considered \epsilon_i, which is (say) the dollar value of the i M&Ms, then the corresponding number of kids with "value" \: \epsilon_i is n_i \propto e^{-\beta \epsilon_i}

Few kids have a lot of M&Ms, many have very few – so there you go, Socialists, doesn't look like Nature prefers the equal distribution of M&Ms either.

If you thought of these kids as particles in a gas and \epsilon_i as one of the possible energy levels ("number of M&Ms") the particles could have, then the fraction of particles that have energy \epsilon_i would be

n(\epsilon_i) \propto e^{- \beta \epsilon_i}

This distribution of particles into energy levels is called the Boltzmann distribution (or the Boltzmann rule). The essential insight is that for several {\bf distinguishable} particles the probability that a particular particle is in a state of energy \epsilon is proportional to e^{-\beta \epsilon}.

After Boltzmann discovered this, the situation was static till the early 1920s when people started discovering particles in nature that were {\bf indistinguishable}. It is a fascinating fact of nature that every photon or electron or muon or tau particle is exactly identical to every other photon or electron or muon or tau particle (respectively and for all other sub-atomic particles too). While this fact isn't "explained" by quantum field theory, it is used in the construction of our theories of nature.

Back to our party analogy.

Suppose, instead of a wide variety of kids, you invited the largest K-tuplet the world has ever seen. K kids that {\bf ALL} look identical. They all have the same parents (pity them {\bf please}), but hopefully were born in some physically possible way, like test-tubes. You cannot tell the kids apart, so if one of them has 10 M&Ms, it's indistinguishable from any of the other kids having 10 M&Ms.

Now what's the distribution of the number of kids n_i with \epsilon_i value in M&Ms? The argument I am going to present is one I personally have heard from Lubos Motl's blog (I wouldn't be surprised if it's more widely available, though, given the age of the field) and it is a really cute one.

There are a couple of possibilities.

Suppose there was a funny rule (made up by Ms. Fermi-Dirac, a well known and strict party host) that said that there could be at most 1 kid that had, say \epsilon_i value in M&Ms (for every i). Suppose P_0(\epsilon_i) were the probability that {\underline {no}} kid had \epsilon_i of value in M&Ms. Then the probability that that 1 kid has \epsilon_i of value in M&Ms is P_0 e^{-\beta \epsilon_i} – remember the Boltzmann rule! Now if no other possibility is allowed (and if one kid has i M&Ms, it is indistinguishable from any of the other kids, so you can't ask which one has that many M&Ms)

P_0(\epsilon_i) + P_0(\epsilon_i) e^{-\beta \epsilon_i} = 1

since there are only two possibilities, the sum of the probabilities has to be 1.

This implies

P_0(\epsilon_i) = \frac {1}{1 + e^{-\beta \epsilon_i}}

And we can find the probability of there being 1 kid with value \epsilon_i in M&Ms. It would be

P_1({\epsilon_i}) = 1 - P_0({\epsilon_i}) =  \frac {e^{-\beta \epsilon_i}}{1 + e^{-\beta \epsilon_i}}

The expected number of kids with value \epsilon_i in M&Ms would be

{\bar{\bf n}}(\epsilon_i) = 0 P_0(\epsilon_i) + 1 P_1({\epsilon_i}) = {\bf \frac {1}{e^{\beta \epsilon_i}+1} }

But we could also invite the fun-loving Mr. Bose to run the party. He has no rules! Take as much as you want!

Now, with the same notation as before, again keeping in mind that we cannot distinguish between the particles,

P_0(\epsilon_i) + P_0(\epsilon_i) e^{-\beta \epsilon_i} + P_0(\epsilon_i) e^{-2 \beta \epsilon_i} + .... = 1

which is an infinite (geometric) series. The sum is

\frac {P_0(\epsilon_i) }{1 - e^{-\beta \epsilon_i} } = 1

which is solved by

P_0(\epsilon_i) = 1 - e^{-\beta \epsilon_i}

The expected number of kids with value \epsilon_i in M&Ms is

{\bar{\bf n}}(\epsilon_i) = 0 P_0(\epsilon_i) + 1 P_0(\epsilon_i) e^{-\beta \epsilon_i} + 2 P_0(\epsilon_i) e^{-2 \beta \epsilon_i} + ...

which is

{\bar{n}}(\epsilon_i) = P_0(\epsilon_i) \frac {e^{-\beta \epsilon_i} } {(1 - e^{-\beta \epsilon_i})^2} = {\bf \frac {1}{e^{\beta \epsilon_i} -1}}
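Both party rules can be checked numerically. The sketch below is illustrative only: it normalises the Boltzmann weights e^{-n \beta \epsilon_i} for n = 0, 1, 2, ... kids and averages n, then compares with the two closed forms above. The function names, the cap `max_kids` and the value 0.5 for \beta \epsilon_i are assumptions made for the example, not anything from the argument itself.

```python
import math

def fermi_dirac(beta_eps):
    """Ms. Fermi-Dirac's rule: closed form 1 / (e^{beta*eps} + 1)."""
    return 1.0 / (math.exp(beta_eps) + 1.0)

def bose_einstein(beta_eps):
    """Mr. Bose's rule: closed form 1 / (e^{beta*eps} - 1), for beta*eps > 0."""
    return 1.0 / (math.exp(beta_eps) - 1.0)

def expected_kids(beta_eps, max_kids):
    """Average n over n = 0..max_kids with weights e^{-n*beta*eps}.

    max_kids is a hypothetical cap used only in this sketch:
    1 reproduces Ms. Fermi-Dirac, a large value approximates Mr. Bose.
    """
    weights = [math.exp(-n * beta_eps) for n in range(max_kids + 1)]
    normalisation = sum(weights)  # this plays the role of 1 / P_0
    return sum(n * w for n, w in enumerate(weights)) / normalisation

beta_eps = 0.5  # illustrative value of beta * epsilon_i
fd_direct = expected_kids(beta_eps, max_kids=1)
be_direct = expected_kids(beta_eps, max_kids=2000)

print(fd_direct, fermi_dirac(beta_eps))
print(be_direct, bose_einstein(beta_eps))
```

The truncated sum for Mr. Bose converges quickly because each extra kid costs another factor of e^{-\beta \epsilon_i}.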

Now, here's a logical question. If you followed the argument above, you could ask this – could we perhaps have a slightly less strict host, say Ms. Fermi-Dirac-Bose-2, who allows up to 2 kids to possess a number of M&Ms whose value is \epsilon_i? How about a general number N of kids that are allowed to possess M&Ms of value \epsilon_i (the host being the even more generous Ms. Fermi-Dirac-Bose-N)? More about this on a different thread. But the above kinds of statistics are the only ones Nature seems to allow in our 4-dimensional world (three space and one time). Far more are allowed in 3-dimensional worlds (two space and one time) and that will also be in a different post (the sheer number of connections one can come up with is fantastic!).

The thing to understand is that particles that obey Fermi-Dirac statistics (a maximum of one particle in every energy state) have a "repulsion" for each other – they don't want to be in the same state as another Fermi-Dirac particle, because Nature forces them to obey Fermi-Dirac statistics. If the states were characterized by position in a box, they would want to stay apart. This leads to a kind of outwards pressure. This pressure (described in the next post) is called Fermi-degeneracy pressure – it's what keeps a peculiar kind of dense star called a white dwarf from collapsing onto itself. However, beyond a certain limit of mass (called the Chandrasekhar limit after the scientist that discovered it), the pressure isn't enough and the star collapses on itself – leading to a colossal explosion.

These explosions are the next kind of "standard candle".

Cosmology: Cepheid Variables – or why Henrietta couldn’t Leavitt alone …(Post #4)


Having exhausted the measurement capabilities for small angles, to proceed further, scientists really needed to use the one thing galaxies and stars put out in plenty – light. The trouble is, to do so, we either need detailed, correct theories of galaxy and star life-cycles (so we know when they are dim or bright) or we need a “standard candle”. That term needs explanation.

If I told you to estimate how far away a bulb was, you could probably make an estimate based on how bright the bulb seemed. For this you need two things. You need to know how bright the bulb is {\bf intrinsically} – this is the absolute luminosity and it's measured in watts, which is Joules \: per \: second. Remember, however, that a 100 watt bulb right next to you appears brighter (and hotter) than the same 100 watt bulb ten miles away! To account for that, you could use the fact that the bulb distributes its light almost uniformly into a sphere around itself, to compute what fraction of the light energy you are actually able to intercept – we might have a patch of CCD (like the little sensor inside your video camera), of area A capturing the light emitted by the bulb. Putting these together, as in the figure below, the amount of light captured is I_{Apparent} watts while the bulb puts out I_{Intrinsic} watts.

Luminosity Falls Off

I_{Apparent} = I_{Intrinsic} \frac{CCD \: Area}{Sphere \: Surface \: Area}

I_{Apparent} = A \frac {I_{Intrinsic}}{4 \pi R^2}

where, if you dig into your memory, you should recall that the area of a sphere of radius R is 4 \pi R^2!

From this, you can compute R

R = \sqrt{A \frac {I_{Intrinsic}}{4 \pi I_{Apparent}}}

You know how big your video camera’s sensor area is (it is in that manual that you almost threw away!) You know how much energy you are picking up every second (the apparent luminosity) – you’d need to buy a multimeter from Radio Shack for that (if you can find one now). But to actually compute the distance, you need to know the {\bf Intrinsic} or {\bf actual} luminosity of the light source!
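As a rough illustration of the formula for R, here is a small sketch. All the numbers (a 100 watt bulb, a 1 square centimeter sensor, a 1 km distance) and the function names are assumptions chosen for the example; the code simply runs the inverse-square relation forwards and then backwards.

```python
import math

def apparent_watts(intrinsic_watts, sensor_area, distance):
    """Power landing on the sensor: A * I_intrinsic / (4 * pi * R^2)."""
    return sensor_area * intrinsic_watts / (4.0 * math.pi * distance ** 2)

def distance_to_candle(intrinsic_watts, captured_watts, sensor_area):
    """Invert the relation: R = sqrt(A * I_intrinsic / (4 * pi * I_apparent))."""
    return math.sqrt(sensor_area * intrinsic_watts / (4.0 * math.pi * captured_watts))

bulb_watts = 100.0        # assumed intrinsic luminosity
sensor_area_m2 = 1.0e-4   # assumed sensor area (1 cm^2), in square meters
true_distance_m = 1000.0  # assumed distance used to fake a "measurement"

captured = apparent_watts(bulb_watts, sensor_area_m2, true_distance_m)
recovered = distance_to_candle(bulb_watts, captured, sensor_area_m2)

print(captured)   # a tiny fraction of a watt
print(recovered)  # should come back as the assumed distance
```

Note that the recovered distance is only as good as the assumed intrinsic luminosity, which is exactly why a standard candle is needed.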

That’s the problem! To do this, we need a set of “standard candles” (a light source of known actual luminosity in watts!) distributed around the universe. In fact the story of cosmology really revolves around the story of standard candles.

The first “standard candles” could well be the stars. If you assume you know how far away the Sun is, and if you assume other stars are just like our Sun, then you could make the first estimates of the size of the Universe.

We already know that the method of parallax could be used with the naked eye to calculate the distance to the moon. Hipparchus calculated that distance to be 59 earth radii. Aristarchus measured the distance to the sun (the method is a tour de force of elementary trigonometry and I will point to a picture here as an exercise!)

Aristarchus Figures out the Distance to the Sun

His calculation of the Earth-Sun distance was only 5 million miles, a fine example of a large experimental error – the one angle he had to measure, \alpha, he got wrong by a factor of 20. Of course, he was wise – he would have been blinded if he had tried to be very accurate and look at the sun’s geometric center!

Then, if you blindly used this estimate and ventured bravely on to calculate distances to other stars based on their apparent brightness relative to the sun, the results were startlingly large (and of course, still too small!) and people knew this as early as 200 B.C. The history of the world might have well been different if people had taken these observers seriously. It was quite a while and not till the Renaissance in Europe that quantitative techniques were re-discovered for distance measurements to the stars.

The problem with the technique of using the Sun as a “standard candle” is that stars differ quite a bit in their luminosity based on their composition, their size, their age and so on. The classification of stars and the description of their life-cycle was completed with the Hertzsprung-Russell diagram in 1910. In addition, the newly discovered nebulae had been resolved into millions of stars, so it wasn’t clear there was a simple way to think of stellar “standard candles” unless someone had a better idea of the size of these stellar clusters. However, some of the nearby galaxy companions of the Milky Way could have their distances estimated approximately (the Magellanic Cloud, for instance).

Enter Henrietta Leavitt. Her story is moving and representative of her time, from her Radcliffe college education to her $0.30 / hour salary for her work studying variable stars (she was a human computer for her academic boss), as well as the parsimonious recognition for her work while she was alive. She independently discovered that a class of variable stars called Cepheids in the Magellanic clouds appeared to have a universal connection between their intrinsic luminosity and the time period of their brightness oscillation. Here’s a typical graph (Cepheids are much brighter than the Sun and can be observed separately in many galaxies)


If you inverted the graph, you simply had to observe a Cepheid variable’s period to determine the absolute luminosity. Voila! You had a standard candle.

A little blip occurred in the 1940s, when Walter Baade discovered that Cepheids in the wings of the Andromeda galaxy were older stars (called Population II, compared to the earlier ones that are now referred to as Population I) and were in general dimmer than Population I Cepheids. When the Luminosity vs. Period graph was drawn for those, it implied the galaxy they were in was actually even further away! The size of the universe doubled (as it turned out) overnight!

Henrietta Leavitt invented the first reliable light-based distance measurement method for galaxies. Edwin Hubble and Milton Humason used data collected mainly from an analysis of Cepheids to derive the equation now known as Hubble’s law.

Next post will be about something called Olbers’ paradox before we start studying the expansion of the Universe, the Cosmic Microwave background and the current belief that we constitute just 4% of the universe  – the rest being invisible to us and not (as far as we can tell) interacting with us.










Cosmology: Distance Measurements – Parallax (Post #3)


This post describes the cool methods people use to figure out how far away stars and galaxies are. Figuring out how far away your friend lives is easy – you walk or drive at a constant speed in a straight line from your home to their house – then once you know how much time this took, you multiply speed times the time of travel to get the distance to your friend’s house.

This might seem like an excessively detailed description of a simple task, but don’t forget that the ancients would have difficulty with several things here – how do you travel at a constant speed and how do you measure time of travel? The first seems like a possible task, but how do you measure time? Humans have invented many ways to measure time – water clocks (reported in Greece, China and India), sand clocks, burning knotted ropes. The Antikythera mechanism, if confirmed to be an astronomical device, would be similar to a true mechanical clock, but it took the ability to work metal and the Industrial Revolution to reliably mass-produce clocks.

This was the most effective way to measure distances for many years; just travel there and keep notes!

The heavenly object closest to us appears to be the moon. Very early, to some extent by Aristarchus, but really by Edmund Halley (whose comet is more famous than he is), it was realized that Parallax could be used to figure out the distance to far away objects, without actually traveling there. Parallax is illustrated below – it’s the perceived angular shift in an object’s position relative to far-away things when you shift your viewing position. You experience this all the time when you see nearby things shift when you look from one eye, then the other.


The diagram above is a little busy, so let me explain it. L is the distance that we are trying to measure, between the Earth (where the fellow with the telescope is) and Saturn. R is the distance to the starry background, that is {\bf really} far away. Since R is {\bf much} bigger than L,  you should be able to convince yourself that the angles \alpha and \beta are very close to each other. From basic geometry, to a good approximation

D = \alpha L

which means L = \frac {D}{\alpha}. We just need to compute \alpha, but it is roughly equal to \beta. \beta is just the angular separation of the stars P and Q, which you could measure with, for instance, a sextant.

We know D, which is the baseline of the measurement. If you use your two eyes, it is a few inches. You could get ambitious and make measurements in summer and winter, when the baseline would be the diameter of the Earth’s orbit (OK, the orbit is very nearly a circle). The result is that you can figure out how far away Saturn is by computing the perceived angular shift.
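Here is a minimal sketch of the small-angle formula L = D / \alpha, where the angle must be in radians. The function name, the choice to take the angle in arcseconds and the sample numbers are assumptions for illustration; the distance comes out in whatever unit the baseline is given in.

```python
import math

# One arcsecond (1/3600 of a degree) expressed in radians.
ARCSEC_IN_RADIANS = math.pi / (180.0 * 3600.0)

def parallax_distance(baseline, shift_arcsec):
    """Small-angle estimate L = D / alpha, with alpha converted to radians."""
    return baseline / (shift_arcsec * ARCSEC_IN_RADIANS)

# A baseline of 1 (in units of the Earth-Sun distance) and a shift of
# 1 arcsecond gives the distance in the same units as the baseline.
one_arcsec = parallax_distance(1.0, 1.0)
# Halving the shift doubles the distance.
half_arcsec = parallax_distance(1.0, 0.5)

print(one_arcsec)
print(half_arcsec)
```

The first number, about 206,265 Earth-Sun distances, is the length astronomers call a parsec.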

The farther away something is, the smaller the perceived angular shift. For a long time, people could not measure angular shifts for really distant objects and made the assumption that the method was wrong for some reason, for they couldn’t believe stars could be that far away.

The state of the art in the parallax measurement was the Hipparcos satellite and is currently the Gaia satellite (as well as Hubble). Distances up to 30,000 light years can be measured this way. For reference, we think the Andromeda galaxy is 2.5 million light years away and the Milky Way’s dark matter halo extends out to 180,000 light years. So measuring out to these distances needs different techniques, which will be discussed in the next post.

Cosmology and the Expanding Universe ..(Post #2)


The previous post discussed what Cosmological Red Shift is (and we defined z, the red-shift parameter). The saga of cosmology begins with general speculations for thousands of years about what those points of light in the sky really were. The construction of the first telescope around 1608, followed by visual explorations (by people like Galileo) of the Moon, Venus, Jupiter, Saturn and their moons led to the increasing certainty that the heavens were made of the same materials as those found on the earth. By the way, it is indeed surprising (as you will see) that to some extent, cosmology has come full circle – it now appears that the heavens might be composed of different “stuff” than us on Earth.

Anyway, as I alluded to in the first post, the first mystery of modern cosmology was discovered in the light from distant galaxies. If we make the entirely reasonable assumption that those galaxies were composed of stars like our sun, the light from those stars should be similar in composition (the mix of colors etc) to the light from our sun. Of course, it was entirely reasonable to expect that some of those stars might be smaller/bigger/younger/older than our sun, so if you had a good idea of how stars produced their light, you could figure out what the light should look like. Now in the 1910s, 1920s and 1930s, which is the era we are talking about, people didn’t really understand nuclear fusion, so there was some speculation going on about what made the stars shine. However, one thing was clear – stars contain lots of hydrogen, so we should be able to see the colors (the wavelengths) typical of emission from hot hydrogen atoms. Vesto Slipher was the first to note that the light emitted from the hydrogen (and some other light elements) in the stars in distant galaxies appeared to be red-shifted, i.e., to be redder than expected. This was puzzling, if you expected that hydrogen and other elements had the same properties as those on the Earth. The most sensible explanation was that this was an indication the galaxies were receding from the earth. Edwin Hubble did some more work and discovered the famous correlation, now known as Hubble’s Law – the more distant a galaxy, the faster it seemed to be receding from us. If {\bf V_{recession}} is the recession speed of a far-away galaxy, D is how far away it is and {\it H_0} is Hubble’s constant,

{\bf V_{recession}} = {\it H_0} D

Hubble’s constant is currently believed to be around 70 \frac {km/sec}{MegaParsec}. A MegaParsec is a million parsecs – a parsec is a convenient distance unit in cosmology and is roughly 3.26 light years. To interpret the formula, if a galaxy were 1 MegaParsec away, it would be rushing away from us at 70 km/sec . In terms of miles, 1 MegaParsec is 19 million trillion miles.
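To make the units concrete, here is a small sketch using the round numbers quoted above (70 km/sec per MegaParsec and 3.26 light years per parsec). The function names and the sample distances are assumptions for the example only.

```python
H0_KM_PER_SEC_PER_MPC = 70.0   # the round value of Hubble's constant quoted in the text
LIGHT_YEARS_PER_PARSEC = 3.26  # the approximate conversion quoted in the text

def recession_speed_km_s(distance_mpc, h0=H0_KM_PER_SEC_PER_MPC):
    """Hubble's law: V_recession = H_0 * D, with D in MegaParsecs."""
    return h0 * distance_mpc

def light_years_to_mpc(light_years):
    """Convert light years to MegaParsecs (1 MegaParsec = 10^6 parsecs)."""
    return light_years / LIGHT_YEARS_PER_PARSEC / 1.0e6

speed_at_1_mpc = recession_speed_km_s(1.0)
speed_at_100_mpc = recession_speed_km_s(100.0)
one_mpc_check = light_years_to_mpc(3.26e6)

print(speed_at_1_mpc)    # km/sec for a galaxy 1 MegaParsec away
print(speed_at_100_mpc)  # km/sec for a galaxy 100 MegaParsecs away
```

A galaxy a hundred times farther away recedes a hundred times faster; that proportionality is the whole content of the law.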

The story of how Edwin Hubble and others discovered how {\bf far} away the galaxies are (the right side of this equation) is interesting in its own right and features people such as Henrietta Leavitt. This will be the subject of my next post. Probably the best discussion of this is by Isaac Asimov, in a book called “The Guide to  Science – Physical Sciences”.

Getting back to our discussion, we don’t think we are somehow specially located in the Universe. This, by the way, was a philosophical principle that really traces back to the Copernican idea that the Earth wasn’t the Center of the Solar System. If we aren’t in some special place in the Universe, and if we see the galaxies receding away from us, it must be that ALL galaxies are receding from each other with a relative speed proportional to their mutual distance.

Thus was born the theory of the Expanding Universe.

One way to think of the Expanding Universe is to think of a rubber sheet, that is being stretched from all sides. Think of a coordinate system drawn on this rubber sheet, with the coordinates actually marked 1,2,3 .... The actual distance between points on the sheet is then, not just the coordinate difference, but a “scale factor” times the coordinate difference. This “scale factor”, which is usually referred to as a in cosmological discussions, is usually assumed to be the same number for all points in space at the same point of time in the Universe’s life.


In this picture – the grid spacing is 1 as the Universe expands. However, the distance between the grid points is a times the grid spacing of 1. In the picture, a is initially 1, but it increases to 4 as the expansion continues.
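The rubber sheet picture also shows where a law like Hubble's comes from. As a sketch, assume two galaxies sit at fixed grid coordinates a coordinate distance \Delta x apart while the scale factor a(t) grows with time (the symbols \Delta x, d and H(t) are labels introduced for this sketch):

```latex
\begin{aligned}
d(t) &= a(t) \, \Delta x \\
V_{recession} &= \frac{d}{dt} \, d(t) = \dot{a}(t) \, \Delta x = \frac{\dot{a}(t)}{a(t)} \, d(t) \\
H(t) &= \frac{\dot{a}(t)}{a(t)}
\end{aligned}
```

The recession speed is proportional to the distance, with the same proportionality factor for every pair of grid points; its value today plays the role of {\it H_0}.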

Next post, after I talk about distance measurements in the Universe, I’ll discuss the ideas of homogeneity and isotropy – two important concepts  that we use when studying the Universe.



A simple sum


This calculation was inspired, a few years ago, by trying to find a simple way to explain the sum of the first N natural numbers to my (then) twelve-year-old daughter, without the use of calculus. As many people know, the sum of the first N natural numbers is found very easily, using the method that Gauss (apparently) re-discovered as a young schoolboy, i.e., write the sum forwards and then backwards

S_N^1 = 1 + 2 + 3 + ... + N

S_N^1 = N + (N-1) + (N-2) + ... + 1

Adding the above two equations term by term (each of the N columns adds up to N+1)

2 (S^1_N) = N(N+1)

S^1_N = \frac{N(N+1)}{2}

Of course, this method cannot be used to sum up the series with squares or higher powers.

Let’s however, continue to use the convenient notation for the sum of squares

S_N^2 = 1^2+2^2+3^2+...+N^2

There’s a useful recursion relation between S_N^2 and S^2_{N-1}

S_N^2= S_{N-1}^2+N^2

Let’s imagine the following – say you have 1 \times 1 (one-unit by one-unit) squares of fixed height cut out of paper. Suppose you arrange them first as below

{\bf Layer 0}


there are 1^2 pieces of paper here

{\bf Layer 1}


there are 2^2 pieces of paper here

And another layer, {\bf Layer 2}


there are 3^2 pieces of paper here

Let’s place the pieces of paper so that Layer 0 is at the bottom, Layer 1 is next on top of it, Layer 2 is on top of that and so on.

Let’s compute the heights of the “skyscraper” that results!

The highest tower is the one on top of the square with vertices (x=0, y=0), (x=1, y=0), (x=0,y=1) and (x=1,y=1). It has height N and the total number of pieces of paper in it is

N \times 1 = (N - 0) \times (1^2 - 0^2) square pieces of paper.

I’ve just written this in a suggestive way.

The next tower is the one just surrounding this one on two sides. There are actually three towers, each of height (N-1), and the total number of pieces of paper in them is

(N-1) \times 3 = (N-1) \times (2^2 - 1^2) square pieces of paper

Again, this is written in a suggestive way

The next tower is the one surrounding this last tower on two sides. There are five of them, each of height (N-2), and the total number of pieces of paper in them is

 (N-2) \times 5 = (N-2) \times (3^2 - 2^2) square pieces of paper.

Yet again!

In general, the k^{th} skyscraper has height (N-k) and there are ((k+1)^2-k^2) of them, so the total number of square pieces of paper in it is

(N-k) \times ((k+1)^2 - k^2)

Note, for later use, that the term ((k+1)^2-k^2) derives from the difference in the total number of pieces of 1 \times 1 square paper that form a k \times k square vs. a  (k+1) \times (k+1) square.

Adding this up over all the towers, we are left with the total number of square 1 \times 1 pieces of paper, which is, indeed, S_N^2 = 1^2+2^2+3^2 +4^2 ...+N^2.

Writing this in summation notation

S_N^2 = \sum\limits_{k=0}^{N-1} (N-k) \times (2 k+1) 

which can be expanded into

S_N^2 = \sum\limits_{k=0}^{N-1} (2 N k + N - 2 k^2 - k)


S_N^2 = 2 N \frac {N (N-1)}{2} + N^2 - 2 S_{N-1}^2 - \frac{N(N-1)}{2}

Using our useful result from above

S_{N-1}^2 = S_N^2 - N^2

We find

S_N^2 = \frac {N(N+1)(2 N+1)}{6}
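For completeness, the algebra between the last two steps runs as follows (just the substitution of S_{N-1}^2 = S_N^2 - N^2 into the expanded sum, followed by collecting terms):

```latex
\begin{aligned}
S_N^2 &= N^2 (N-1) + N^2 - 2 \left( S_N^2 - N^2 \right) - \frac{N(N-1)}{2} \\
3 \, S_N^2 &= N^3 + 2 N^2 - \frac{N(N-1)}{2} = \frac{2 N^3 + 3 N^2 + N}{2} = \frac{N (N+1)(2N+1)}{2} \\
S_N^2 &= \frac{N(N+1)(2N+1)}{6}
\end{aligned}
```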

Aha – now we can generalize this!

We can continue this for the sum of the cubes of integers and so forth. Let’s start with 1 \times 1 \times 1 cubes that are placed, starting at the origin. Again, using the notation

S_N^3 = 1^3+2^3+3^3+...+N^3

Again, we note

S_N^3 = S_{N-1}^3 + N^3

Continuing in the same vein, alas, we cannot layer the cubes on top of each other in three dimensions! Let’s assume there is indeed a fourth dimension and count the heights in this dimension – see, physicists and mathematicians are naturally led to higher dimensions! The number of cubical pieces used are found by counting the numbers in the expanding “perimeter” of cubes, just as in the two-dimensional example.

N \times 1 = (N-0) \times ( 1^3 - 0^3)

(N-1) \times 7 = (N-1) \times (2^3 -1^3)

(N -2) \times 19 = (N-2) \times (3^3 - 2^3)

(N-3) \times 37 = (N-3) \times (4^3 - 3^3)

(N-k) \times ( (k+1)^3 - k^3)

So we are left with

S_N^3 = \sum\limits_{k=0}^{N-1} (N-k) \times (3 k^2+3 k+1)

Which results in the usual formula (using the auxiliary relation S_{N-1}^3=S_N^3-N^3),

S_N^3= \left( \frac{N(N+1)}{2} \right)^2
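The tower counting is easy to check by brute force. The sketch below (function names are made up for the example) compares the direct sum 1^L + 2^L + ... + N^L with the tower sum of (N-k) \times ((k+1)^L - k^L) for a few powers L, and also checks the two closed forms derived above.

```python
def power_sum(n, power):
    """Direct sum 1^power + 2^power + ... + n^power."""
    return sum(k ** power for k in range(1, n + 1))

def tower_sum(n, power):
    """Sum over towers: height (n - k) times the count ((k+1)^power - k^power)."""
    return sum((n - k) * ((k + 1) ** power - k ** power) for k in range(n))

# Compare the two ways of counting for several sizes and powers.
all_match = all(
    power_sum(n, power) == tower_sum(n, power)
    for n in range(1, 30)
    for power in range(1, 6)
)
print(all_match)

# Closed forms for squares and cubes, checked at one sample size.
n = 12
print(power_sum(n, 2) == n * (n + 1) * (2 * n + 1) // 6)
print(power_sum(n, 3) == (n * (n + 1) // 2) ** 2)
```

Only integer arithmetic is used, so the comparisons are exact.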

In general, using this approach, and the auxiliary relation

S_N^L= S_{N-1}^L+ N^L

We find

S_N^L= \frac {1}{L+1} \left( N^2 + L N^L + \sum\limits_{j=1}^{L-1} \left[ N \, C(L,j) - C(L,j-1) \right] S_{N-1}^j \right)

where  C(n,m) = \frac{n!}{m! (n-m)!} is the combinatorics formula for the number of ways to select m items out of n.

This formula, while original (as far as I can tell) in its derivation, is consistent with Faulhaber’s formula from the 17th century.

A course correction – and let’s get started!


I have received some feedback from people that felt the posts were too technical. I am going to address this by constructing a simpler thread of posts on one topic that will start simple and stay conceptual rather than become technical.

I want to discuss the current state of Cosmology, given that it is possibly the field in the most flux these days. And the basic concept to understand in Cosmology is that of Cosmological Red Shift. So here goes…

The Cosmological Red Shift means this – when we look at far away galaxies, the light they emit is redder than we would expect. When you look at light of various colors, the redder the light, the longer its wavelength. Why would that be? Why would we perceive the light emitted by a galaxy to be redder than it should be?

To understand Cosmological Red Shift, you need to understand two things – the Doppler Shift and Time Dilation.

Let’s start with Doppler Shift.

If you listen to an ambulance approaching you on a road, then (hopefully, if it hasn’t come for you) speeding away from you on the road, you will hear the pitch, i.e., frequency, of the siren go up, then go down. Listen to this here.

Why does this happen?

Sound is a pressure wave. When the instrument producing a sound vibrates at a certain rate (frequency), it pushes and pulls on the air surrounding it. Those pushes and pulls are felt far away, because the fluctuations in density of the air propagate (see the video). The air isn’t actually going anywhere as a whole – this is why when you have waves in the ocean, the ocean isn’t actually sending all the water towards you, it’s just the disturbance coming towards you. So these pressure variations hit your ears and that’s how you hear something – the eardrum vibrates the little bones in the ear, which set up little waves in the cochlear fluid that then create electrical signals that go to your auditory cortex and voila, you hear!

Now, waves are characterized by wavelength (\lambda), frequency (\nu) and their speed (\bf{v}). There’s a relation between these three quantities

\bf{v} = \lambda \nu

Sal Khan has a nice video describing this formula in some detail. Let’s try and understand this – wavelength (\lambda) is the distance between the successive positive crests of the wave, frequency (\nu) is the number of crests shooting out of the emitter per second, then (\lambda \nu) is the length of wave coming out of the emitter per second as measured by the emitter. That’s how far the first crest traveled in one second, i.e., the speed of the wave.

Now what happens if the emitter is moving away from you – think of the pressure waves like compressions of a spring, as in the video link above. If the emitter is moving away, that’s like the spring being extended while it is vibrating – ergo, the wavelength is increased in proportion to how fast the emitter is running away from you (call the emitter’s speed v_{emitter}). The formula is

\lambda_{observed} - \lambda_{emitted} = \frac {v_{emitter}} {\bf{v}} \lambda_{emitted}

Aha – this makes sense, so the sound that I hear when an ambulance is driving away from me has a longer wavelength – so it has a lower frequency – it has a lower pitch. If the ambulance is driving towards me, so v_{emitter} is negative in the above formula, then we hear shorter wavelength sound, which has a higher frequency, i.e., a higher pitch.

As an example, if the emitter flies away as fast as the speed of sound in the air, then the observed wavelength should be \bf {double} the emitted wavelength. In the simple picture of the emitter shooting out wave crests at a rate \nu per second, the emitter shoots out one crest, then shoots out another crest after a time interval \frac {1}{\nu}, by which time it has moved a distance \frac {\bf {v}} {\nu} which is indeed one wavelength! So the distance between the crests in the eyes of the observer is twice the emitted wavelength.
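The formula is simple enough to try out with numbers. In the sketch below the speed of sound (343 m/s, a typical value for air at room temperature), the 440 Hz siren and the function names are all assumptions for illustration. A positive emitter speed means the emitter is moving away from you and a negative one means it is approaching, matching the sign convention above.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed typical value for air at room temperature

def observed_wavelength(emitted_wavelength, emitter_speed, wave_speed=SPEED_OF_SOUND_M_S):
    """lambda_observed = lambda_emitted * (1 + v_emitter / v)."""
    return emitted_wavelength * (1.0 + emitter_speed / wave_speed)

def observed_frequency(emitted_frequency, emitter_speed, wave_speed=SPEED_OF_SOUND_M_S):
    """The wave speed is unchanged, so frequency scales inversely to wavelength."""
    return emitted_frequency / (1.0 + emitter_speed / wave_speed)

# Emitter fleeing at the speed of sound: the wavelength doubles, the pitch halves.
doubled = observed_wavelength(1.0, SPEED_OF_SOUND_M_S)
halved = observed_frequency(440.0, SPEED_OF_SOUND_M_S)

# An ambulance at 30 m/s (roughly 67 miles per hour): approaching, then receding.
approaching = observed_frequency(440.0, -30.0)
receding = observed_frequency(440.0, 30.0)

print(doubled, halved)
print(approaching, receding)
```

The siren sounds higher than 440 Hz on the way in and lower on the way out, which is the familiar drop in pitch as the ambulance passes.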

Whew!  So that is the Doppler effect. If something is moving away from me, the sound it emits will seem to be of lower pitch. Since light is also a wave, if a galaxy were moving away from me, I should expect its light to look like it has a lower frequency – i.e., it looks redder.

When we specialize this to the case of light, we replace {\bf{v}} by c, the speed of light. There is an additional effect that we need to think of, for light.

Onwards – let’s think about  \rightarrow Time Dilation.

This needs some knowledge of Einstein’s ideas about Special Relativity. I am going to give you a lightning introduction, but not much detail. I might write a post with some details later, but there are excellent popular books on the subject. Several decades before Einstein, the Scottish physicist James Clerk Maxwell discovered the equations of electromagnetism. People had discovered universal laws of nature before – for instance Isaac Newton discovered the Law of Gravitation, but Maxwell’s equations had a puzzling feature. They included a constant which wasn’t a mass, length or time, but was a speed! Think of that. If there was a law of nature that included the speed of your favorite runner (say how quickly that dude in “Temple Run” runs from the apes), how strange that would be. How fast does someone run, you ask. Well, it depends on how fast the observer is going! You must have seen this on the highway.  When your car has a flat and you are standing, ruing your luck, on the side of the highway, you think the cars are zipping past you at 50, 60,…80 miles per hour. When you are in one of the cars, traveling at, say 50 miles per hour, the other cars are moving \bf {relative \hspace {2 mm} to \hspace{2 mm} you} at 0, 10,…30 miles per hour only. That’s natural. How can a physical law, a universal law of nature, depend on a speed! The world is bizarre indeed!

Einstein discovered exactly how bizarre. It turns out if you want the idea of a universal constant that is a speed (for light) to make sense, ALL observers need to agree on the actual speed of light, regardless of how fast they are traveling, along or against or perpendicular to the ray of light. For that to happen their clocks and rulers need to get screwed up, in just the right way to allow for this. Suppose you have two observers that are moving relative to each other, at a constant speed in some direction.  Einstein derived the exact equations that relate the coordinates (x,y,z,t) that the first observer assigns to a moving object to the coordinates (x',y',z',t') that the other observer ascribes to the same object. It’s high school algebra, as it turns out, but the relation implies, among other things, that a moving clock ticks slower than a stationary clock, {\bf when \hspace{2 mm} the \hspace{2 mm} clocks \hspace{2 mm} are \hspace{2 mm} compared \hspace{2 mm} at \hspace{2 mm} the \hspace{2 mm} same \hspace{2 mm} point \hspace{2 mm} in \hspace{2 mm} space}. That, by the way, is how the twin paradox sorts itself out – the twins have to meet at some point in order to compare their ages, so one has to turn his or her rocket around.

When you use the formulas of relativity, if the emitter is flying away at speed v_{emitter} relative to the observer, the emitter’s clock will seem to run slower than the observer’s clock (from the observer’s point of view). Since the frequency of the emitted wave essentially is a “clock” for both, we will obtain (and this needs a little algebra and some persistence!)

\nu_{observed} = \nu_{emitted} \sqrt{1 - (\frac{v_{emitter}}{c})^2}

Using our previous relation connecting frequency and wavelength, this means the wavelengths are related as below

\lambda_{observed} = \lambda_{emitted} \frac{1}{\sqrt{1 - (\frac{v_{emitter}}{c})^2}}

so, when we combine the two effects – Doppler and Relativity, which operate successively on the same emitted light – we multiply the Doppler factor (1 + \frac{v_{emitter}}{c}) by the time-dilation factor above, and we get the final observed wavelength

\lambda_{observed} = \lambda_{emitted} \sqrt{\frac{1 + \frac{v_{emitter}}{c}}{1 - \frac{v_{emitter}}{c}}}

We see that if something is moving away from us, i.e., v_{emitter} is positive, the observed wavelength is longer than the emitted wavelength, i.e., it is red-shifted. If the moving object emits light of a certain color, the stationary observer of this light sees it to be redder than the emitted color. So here’s the upshot – if you observe light from some object that is redder than you’d expect from that object, one strong possibility is that it is receding away from you. That’s how modern cosmology got started!
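
If you would rather not do the algebra by hand, here is a small numerical check that the Doppler stretch and the time-dilation stretch really multiply out to the square-root formula. It is only a sketch: the function names and the sample speeds (given as fractions of c) are my own choices.

```python
import math


def doppler_factor(beta):
    """Classical stretching of the wavelength for a receding emitter: 1 + v/c."""
    return 1.0 + beta


def time_dilation_factor(beta):
    """Extra stretching because the emitter's clock runs slow: 1 / sqrt(1 - (v/c)^2)."""
    return 1.0 / math.sqrt(1.0 - beta ** 2)


def combined_factor(beta):
    """The closed form quoted in the text: sqrt((1 + v/c) / (1 - v/c))."""
    return math.sqrt((1.0 + beta) / (1.0 - beta))


# beta = v_emitter / c; positive means moving away, negative means approaching.
sample_betas = (-0.5, 0.001, 0.1, 0.5, 0.9)
for beta in sample_betas:
    product = doppler_factor(beta) * time_dilation_factor(beta)
    print(beta, product, combined_factor(beta))
```

The last two columns should agree to rounding error. For small speeds both are very close to 1 + v/c, so the relativistic correction only matters when the emitter is moving at an appreciable fraction of the speed of light.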

A note about terminology: astronomers define a quantity called “red-shift”, denoted by the letter z, to quantify this wavelength difference. It is defined as the relative change in wavelength

z = \frac{\lambda_{observed} - \lambda_{emitted}}{\lambda_{emitted}}

z is a “dimensionless” number – it is a ratio of two lengths. z=0 corresponds to you and me, things that are in the vicinity of each other. The moon isn’t receding away from us fast enough to produce a measurable red-shift, and neither is our sun, so they all have z=0. In fact, the entire Milky Way galaxy, our home, is at red-shift z = 0. We really have to leave the vicinity of our local group of large galaxies (that includes principally Andromeda and the Large and Small Magellanic clouds) to start seeing red-shifts exceeding 0. Conversely, the largest red-shifts we have seen are for distant quasars and intense galaxies – red-shift of about 11. Think of what that means – the 21 cm emission wavelength of neutral hydrogen would be shifted by 232 cm – more than 7 feet! For people constructing prisms and other apparatus for telescopes, this means you need a ridiculously (physically) large apparatus. More on this later!
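
To check the 21 cm arithmetic, here is a short sketch. The 21.1 cm figure is an assumed rounded value for the hydrogen line, and the last function simply inverts the special-relativity formula from earlier in this post; treating a red-shift of 11 as a plain recession speed is an assumption made purely for illustration.

```python
import math


def redshift(lambda_observed, lambda_emitted):
    """z = (lambda_observed - lambda_emitted) / lambda_emitted."""
    return (lambda_observed - lambda_emitted) / lambda_emitted


def wavelength_shift(z, lambda_emitted):
    """Absolute shift lambda_observed - lambda_emitted for a given red-shift z."""
    return z * lambda_emitted


def beta_from_redshift(z):
    """Invert 1 + z = sqrt((1 + beta) / (1 - beta)) for beta = v_emitter / c."""
    ratio_squared = (1.0 + z) ** 2
    return (ratio_squared - 1.0) / (ratio_squared + 1.0)


HYDROGEN_LINE_CM = 21.1  # assumed rounded wavelength of the "21 cm" line
CM_PER_FOOT = 30.48

shift_cm = wavelength_shift(11.0, HYDROGEN_LINE_CM)
observed_cm = HYDROGEN_LINE_CM + shift_cm

print("shift in cm:", shift_cm)
print("shift in feet:", shift_cm / CM_PER_FOOT)
print("observed wavelength in cm:", observed_cm)
print("v_emitter / c under the formula above:", beta_from_redshift(11.0))
```

The shift comes out near 232 cm, so the observed wavelength is about 12 times the emitted one, since the observed wavelength is always (1 + z) times the emitted wavelength.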


A simple connection between Entropy and Information


This article follows a simple example laid out by Jaynes (1996).
Jaynes’ example is one that shows how one’s computation of the change in entropy in a physical / chemical process depends on the precise variables that one uses to label the macro state. If you use different variables (say you are insensitive to properties that someone else does have the ability to measure) you can invent situations where the entropy change might (to one observer) be contrived to violate the Second Law of Thermodynamics while the other observer sees no such violation.
Let’s consider a vessel whose volume is V, with a diaphragm that separates it into two parts – volumes V_1 and V_2, containing n_1 and n_2 molecules, respectively. The “1” side is filled with Argon gas of type A_1 which is indistinguishable from the type of Argon gas filling side “2”, which we call type A_2, at least for the observer named Babu. However, Alisha, with her access to superior technology, is indeed able to perceive the difference between the two types of Argon. The container is in close thermal contact with a heat bath that maintains a constant temperature T. The equilibrium condition (same temperature throughout and equal pressure) implies that n_1/V_1 =n_2/V_2 =(n_1+n_2)/V.
Alisha, in addition to her ability to notice the difference between A_1 and A_2 also has sole access to a material called Whiffnium, which is permeable to A_1 but impervious to A_2. During the course of her research, she has also discovered Whaffnium, a new material which is permeable to A_2, but impervious to A_1. Let’s suppose that Alisha constructs two (infinitesimally thin) pistons that are initially placed very close to each other, one piston made of Whiffnium and the other of Whaffnium, as in the picture below.


Let’s suppose that just enough of A_1 permeates through the Whiffnium so that the partial pressure of A_1 is the same in the left side of the container as well as the intercalated region (between the Whiffnium and Whaffnium pistons). Similarly, let’s assume that just enough of A_2 permeates through the Whaffnium into the intercalated region (between the pistons) so that the partial pressure of A_2 is the same in the intercalated region as well as on the right side of the container. Now, due to the unbalanced pressure of A_2 impinging upon the Whiffnium piston, it is reversibly moved to the left and the entropy change in A_2 is
\Delta S_{A_2} = n_2 k_B \ln(V/V_2)
Similarly, the Whaffnium piston is reversibly moved to the right and the entropy change in A_1 is
\Delta S_{A_1} = n_1 k_B \ln(V/V_1)
The total entropy change is hence
\Delta S = n_1 k_B \ln(V/V_1) + n_2 k_B \ln(V/V_2)
All this is pretty logical from Alisha’s point of view, since she does see the two parts of the container as having different materials, A_1 and A_2. She understands that the entropy change in the container is a consequence of the heat flowing into the system from the heat bath.
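
To get a feel for the size of Alisha’s entropy change, here is a minimal numerical sketch of the total above. The function name and the sample molecule counts and volumes are made up for illustration; the volumes only enter through ratios, so their units do not matter.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def mixing_entropy(n_1, n_2, v_1, v_2):
    """Total entropy change n_1 k_B ln(V/V_1) + n_2 k_B ln(V/V_2), with V = V_1 + V_2."""
    v_total = v_1 + v_2
    return K_B * (n_1 * math.log(v_total / v_1) + n_2 * math.log(v_total / v_2))


# Illustrative numbers chosen to respect the equilibrium condition n_1/V_1 = n_2/V_2.
n_1, n_2 = 1.0e22, 3.0e22
v_1, v_2 = 1.0, 3.0

delta_s = mixing_entropy(n_1, n_2, v_1, v_2)
print("entropy change in J/K:", delta_s)

# Special case: two equal halves give (n_1 + n_2) k_B ln 2.
equal_halves = mixing_entropy(1.0e22, 1.0e22, 1.0, 1.0)
print("equal halves in J/K:", equal_halves)
```

Both terms are positive whenever each gas ends up in a larger volume than it started in, so the total is always an increase from Alisha’s point of view.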
However, Babu sees a conundrum. He sees the argon as one undifferentiated gas and so the initial and final states of the system are identical. However, the system has absorbed an amount of heat and converted all of it into work, in violation of the Second Law of Thermodynamics. In addition, he sees the entropy change as 0. This is, however, simply a reflection of the fact that the entropy is a function of the macrostate variables that one uses and if Babu has an insufficiently specified macrostate, then Alisha is simply able to manipulate phenomena to cause Babu to think he has observed a violation of the Second Law.
How much information would Babu need in order to deduce the correct entropy change? In the initial state, if he knew about the sub-identities A_1 and A_2, the macrostate where A_1 is on the left and A_2 is on the right (side of the container) has the following number of (equally probable) microstates
\Omega_{initial} = (V_1/v)^{n_1} (V_2/v)^{n_2}
where we have assumed that each molecule can be localized to a minimum volume v and we can do this for all the n_1 molecules in V_1 and the n_2 molecules in V_2.
In the final state, all the n_1+n_2 molecules are strewn about in the total volume V and the total number of microstates is
\Omega_{final} = (V/v)^{n_1+n_2}
So to specify the microstate, he needs to be sent extra information (along the lines of traditional information theory)

I= \log_2((\frac{V}{v})^{n_1+n_2} ) - \log_2 ( (\frac{V_1}{v})^{n_1} (\frac{V_2}{v})^{n_2} )

= n_1 \log_2(V/V_1)+n_2 \log_2(V/V_2 )
which is (up to a multiplicative factor of k_B and the difference between \ln and \log_2) just the same as the entropy change computed above for the two varieties. Explicitly, the entropy change is k_B \ln 2 times I.
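
Here is a toy check of that claim, counting microstates directly. It is a sketch under my own assumptions: each side is chopped into a small whole number of cells of volume v (so V_1/v and V_2/v are integers), and the function names are invented for this example.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def microstates_separated(cells_1, cells_2, n_1, n_2):
    """(V_1/v)^n_1 * (V_2/v)^n_2: each molecule confined to its own side."""
    return cells_1 ** n_1 * cells_2 ** n_2


def microstates_mixed(cells_1, cells_2, n_1, n_2):
    """(V/v)^(n_1 + n_2): every molecule anywhere in the whole container."""
    return (cells_1 + cells_2) ** (n_1 + n_2)


def missing_information_bits(cells_1, cells_2, n_1, n_2):
    """I = log2(mixed count) - log2(separated count)."""
    mixed = microstates_mixed(cells_1, cells_2, n_1, n_2)
    separated = microstates_separated(cells_1, cells_2, n_1, n_2)
    return math.log2(mixed) - math.log2(separated)


def entropy_change(cells_1, cells_2, n_1, n_2):
    """n_1 k_B ln(V/V_1) + n_2 k_B ln(V/V_2), written in terms of cell counts."""
    total = cells_1 + cells_2
    return K_B * (n_1 * math.log(total / cells_1) + n_2 * math.log(total / cells_2))


# Toy container: 8 cells and 4 molecules on each side.
bits = missing_information_bits(8, 8, 4, 4)
delta_s = entropy_change(8, 8, 4, 4)

print("missing information in bits:", bits)
print("entropy change in J/K:", delta_s)
print("k_B ln 2 times the bits:", K_B * math.log(2.0) * bits)
```

With two equal halves the answer should be 8 bits, one bit per molecule, which is just the statement of which half each molecule started in. The last two printed numbers should agree, and the cell size v drops out of the difference, as it does in the formula for I.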
Note that if Babu didn’t actually meet Alisha and had no idea that there were two varieties of Argon, his calculation for the number of microstates before and after would be identical, equal to (V/v)^{n_1+n_2} – this is because he doesn’t think the diaphragm separating the two sides of the container is even necessary – they are the same material and are in thermodynamic equilibrium with each other.
However, once Babu has this information, in the form of a detailed message, he will have been supplied with enough to describe the situation with the two varieties completely (as far as Alisha’s abilities admit), where he had zero information before. Ergo, the extra information he needs is the entropy difference.