# Quantum Mechanics

### Master Traders and Bayes’ theorem


Imagine you were walking around in Manhattan and you chanced upon an interesting game going on at the side of the road. By the way, when you see these games going on, a safe strategy is to walk on, since they are usually just clever ways of separating you from a lot of your money.

The protagonist, sitting at the table tells you (and you are able to confirm this by a video taken by a nearby security camera run by a disinterested police officer), that he has managed to toss the same quarter (an American coin) thirty times and managed to get “Heads” ${\bf ALL}$ of those times. What would you say about the fairness or unfairness of the coin in question?

Next, your good friend rushes to your side and whispers to you that this guy is actually one of a $really \: large$ number of people (a little more than a billion) that were asked to successively toss freshly minted, scrupulously clean and fair quarters. People that tossed tails were “tossed” out at each successive toss and only those that tossed heads were allowed to toss again. This guy (and one more like him) were the only ones that remained. What can you say now about the fairness or unfairness of the coin in question?

What if the number of coin tosses was $100$ rather than $30$, with a larger number of initial subjects?
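A quick back-of-the-envelope check, sketched in Python (the starting-population figure is my assumed reading of "a little more than a billion"): the chance of a fair coin landing heads thirty times in a row is $2^{-30}$, so a survivor or two out of a billion starters is exactly what chance predicts.

```python
# Probability that a fair coin lands heads 30 times in a row
p_30_heads = 0.5 ** 30               # ~9.31e-10, i.e. roughly 1 in 1.07 billion

# Expected number of survivors from ~1.1 billion starters (assumed figure)
n_starters = 1_100_000_000
expected_survivors = n_starters * p_30_heads
print(expected_survivors)            # ~1.02 — one or two survivors is unremarkable
```

The same arithmetic with $100$ tosses gives $2^{-100} \approx 8 \times 10^{-31}$, so you would need an absurdly larger starting pool, but the logic is identical.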

Just to make sure you think about this correctly, suppose you were the Director of a large State Pension Fund and you need to invest the life savings of your state’s teachers, firemen, policemen, highway maintenance workers and the like. You are told you have to decide whether to allocate some money to a bet made by an investment manager, based on his or her track record (he or she successively tossed “Heads” a hundred times in a row). Should you invest money on the possibility that he or she will toss “Heads” again? If so, how much should you invest? Should you stay away?

This question cuts to the heart of how we operate in real life. If we set aside the analytical skills we learnt in school and revert to how our “lizard” brain thinks, we would assume the coin was unfair (in the first instance) and express total surprise on learning the second fact. In fact, even though something like the second situation could well lie behind every situation of the first sort we encounter in the real world, we would still operate as if the coin was unfair, as our “lizard” brain would instruct us to behave.

What we are doing unconsciously is using Bayes’ theorem. Bayes’ theorem is the linchpin of inferential deduction and is often misused, even by people who understand what they are doing with it. If you want to read a couple of rather interesting books that use it in various ways, read Gerd Gigerenzer’s “Reckoning with Risk: Learning to Live with Uncertainty” or Hans Christian von Baeyer’s “QBism“. I will discuss a few classic examples. In particular, Gigerenzer’s book discusses several such examples, as well as ways to overcome popular mistakes made in the interpretation of the results.

Here’s a very overused, but instructive example. Let’s say there is a rare disease (pick your poison) that afflicts $0.25 \%$ of the population. Unfortunately, you are worried that you might have it. Fortunately for you, there is a test that can be performed that is $99 \%$ accurate – so if you do have the disease, the test will detect it $99 \%$ of the time. Unfortunately, the test has a $0.1 \%$ false positive rate, which means that if you don’t have the disease, $0.1 \%$ of such tested people will mistakenly get a positive result. Despite this, the numbers look exceedingly good, so the test is much admired.

You nervously proceed to your doctor’s office and get tested. Alas, the result comes back “Positive”. Now, ask yourself, what the chances you actually have the disease? After all, you have heard of false positives!

A simple way to turn the percentages above into numbers is to consider a population of $1,000,000$ people. Since the disease is rather rare, only $(0.25 \% \equiv ) \: 2,500$ have the disease. If they are tested, $(1 \% \equiv ) \: 25$ of them will get an erroneous “Negative” result, leaving $2,475$ correct “Positive” results. However, if the rest of the population were tested in the same way, $(0.1 \% \equiv ) \: 1,000$ people (approximately) would get a “Positive” result, despite not having the disease. In other words, of the $3,475$ people who would get a “Positive” result, only $2,475$ actually have the disease, which is roughly $71\%$ – so such an accurate test can only give you a 7-in-10 chance of actually being diseased, despite its incredible accuracy. The reason is that the “false positive” rate is low, but not low enough to overcome the extreme rarity of the disease in question.
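The natural-frequency bookkeeping above is a one-liner to check. A sketch in Python (the text rounds the $997.5$ false positives up to $1,000$; the code keeps the exact values):

```python
population = 1_000_000
diseased = int(population * 0.0025)      # 2,500 people have the disease
true_pos = diseased * 0.99               # 2,475 correctly test "Positive"
healthy = population - diseased          # 997,500 do not have the disease
false_pos = healthy * 0.001              # 997.5 — text rounds this to 1,000

# Chance that a "Positive" result means you really have the disease
ppv = true_pos / (true_pos + false_pos)
print(round(ppv, 3))                     # 0.713 — about 7 in 10
```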

Notice, as Gigerenzer does, how simple the argument seems when phrased with numbers, rather than with percentages. To do this using standard probability theory, write the probability that Event $A$ occurs once we know that Event $B$ has occurred as $P(A|B)$. Then Bayes’ theorem states

$P(A|B) \: P(B) = P(B|A) \: P(A)$

Using this, with $A$ being “I am diseased” and $B$ being “I tested positive”,

$P(I \: am \: diseased \: GIVEN \: I \: tested \: positive) = \frac {P(I \: test \: positive \: GIVEN \: I \: am \: diseased) \: P(I \: am \: diseased)}{P(I \: test \: positive)}$

and then we note

$P(I \: am \: diseased) = 0.25\%, \hspace{5 mm} P(I \: test \: positive \: GIVEN \: I \: am \: diseased) = 99\%$

$P(I \: test \: positive) = 0.25 \% \times 99 \% + 99.75 \% \times 0.1 \%$

since I could test positive for two reasons – either I really am among the $0.25 \%$ diseased people and additionally was among the $99 \%$ that the test caught, OR I really was among the $99.75 \%$ undiseased people but was among the $0.1 \%$ that unfortunately got a false positive.

Indeed, $\frac{0.25 \% \times 99 \%}{0.25 \% \times 99 \% + 99.75 \% \times 0.1 \%} \approx 0.71$

which was the answer we got before.
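As a quick sanity check, the same computation via Bayes’ theorem directly, sketched in Python with the numbers from the example:

```python
p_disease = 0.0025            # prior: 0.25% of the population is diseased
p_pos_given_disease = 0.99    # the test catches 99% of true cases
p_pos_given_healthy = 0.001   # 0.1% false-positive rate

# Total probability of testing positive, via both routes
p_pos = p_disease * p_pos_given_disease + (1 - p_disease) * p_pos_given_healthy

# Bayes' theorem: P(diseased | positive)
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos
print(round(p_disease_given_pos, 2))   # 0.71
```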

The rather straightforward formula I used in the above is one formulation of Bayes’ theorem. Bayes’ theorem allows one to incorporate one’s knowledge of partial outcomes to deduce what the underlying probabilities of events were to start with.

There is no good answer to the question that I posed in the first paragraph. It is true that both a fair and an unfair coin could give results consistent with the first event (someone tosses $30$ or even $100$ heads in a row). However, if one insists that probability has an objective meaning independent of our experience, based upon the results of an infinite number of repetitions of some experiment (the so-called “frequentist” interpretation of probability), then one is stuck. In fact, based upon that principle, if you haven’t heard something contrary to the facts about the coin, your a priori assumption about the probability of heads must be $\frac {1}{2}$. On the other hand, that isn’t how you run your daily life. In fact, the most legally defensible (many people would argue the ${\bf {only}}$ defensible) strategy for the Director of the Pension Fund would be to

• not assume that prior returns were based on pure chance and would be equally likely to be positive or negative
• bet on the manager with the best track record

At a minimum, I would advise people to stay away from a stable of managers that simply are the survivors of a talent test where the losers were rejected (oh wait, that sounds like a large number of investment managers in business these days!). Of course, the manager that knows they have a good thing going is likely to not allow investors at all for fear of reducing their returns due to crowding. Such managers also exist in the global market.

The Bayesian approach has a lot in common with our every-day approach to life. It is not surprising that it has been applied to the interpretation of Quantum Mechanics and that will be discussed in a future post.

### Special Relativity; Or how I learned to relax and love the Anti-Particle


The Special Theory of Relativity, which is the name for the set of ideas that Einstein proposed in 1905 in a paper titled “On the Electrodynamics of Moving Bodies”, starts with the premise that the Laws of Physics are the same for all observers traveling at uniform speeds relative to each other. The Laws of Physics include a special velocity – Maxwell’s equations for electromagnetism contain a speed $c$ – which must therefore be the same for all such observers. This leads to some spectacular consequences. One of them is called the “Relativity of Simultaneity”. Let’s discuss this with the help of the picture below.

Babu is sitting in a railway carriage, manufactured by the famous C-Kansen company, that travels at speeds close to that of light. Babu is sitting exactly in the middle of the carriage and, for reasons best known to himself (I guess the pantry car was closed and he was bored), decides to shoot laser beams simultaneously at both ends of the carriage from his position. There are detectors/mirrors that detect the light at the two ends of the carriage. As far as he is concerned, light travels at $3 \times 10^5 \frac {km}{sec}$ and he will challenge anyone who says otherwise to a gunfight – note that he is wearing a cowboy hat and probably practices open carry.

Since the detectors at the ends of the carriage are equidistant from him, he will find that the laser beams hit the detectors simultaneously, from his point of view.

Now, consider the situation from the point of view of Alisha, standing outside the train, near the tracks, but safely away from Babu and his openly carried munitions. She sees that the train is speeding away to the left, so clearly since ${\bf she}$ thinks light also travels at $3 \times 10^5 \frac {km}{sec}$, she would say that the light hit the ${\bf right}$ detector first before the ${\bf left}$ detector. She doesn’t ${\underline {at \: all \: think}}$ that the light hit the two detectors simultaneously. If you asked her to explain, she’d say that the right detector is speeding towards the light, while the left detector is speeding away from the light, which is why the light strikes them at different times.

Wait – it is worse. If you had a third observer, Emmy, who is skiing to the ${\bf {left}}$ at an even higher speed than the C-Kansen (some of these skiers are crazy), she thinks the C-Kansen train is going off to the right (think about it), not able to keep up with her. As far as ${\underline {\bf {she}}}$ is concerned, the laser beam hit the ${\bf {left}}$ detector before the other beam hit the ${\bf {right}}$ detector.

What are we finding? The Events in question are – “Light hits Left Detector” and “Light hits Right Detector”. Babu claims the two events are simultaneous. Alisha claims the second happened earlier. Emmy is insistent that the first happened earlier. Who is right?

They are ALL correct, in their own reference frames. Events that appear simultaneous in one reference frame can appear to occur in one order in a different frame, and in the opposite order in yet another frame. This is called the Relativity of Simultaneity. Basically, this means that you cannot claim that one of these events ${\bf {caused}}$ the other, since their order can be changed. Events that are separated in this fashion are called “space-like separated”.
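The disagreement can be made quantitative with the Lorentz transformation (a standard textbook step, not spelled out in the post). Suppose the carriage has length $L$ in Babu’s frame, where the two detection events are simultaneous ($\Delta t' = 0$) and separated by $\Delta x' = L$. In a frame where the train moves at speed $v$,

$\Delta t = \gamma \left( \Delta t' + \frac{v \, \Delta x'}{c^2} \right) = \frac{\gamma v L}{c^2}, \hspace{5 mm} \gamma = \frac{1}{\sqrt{1 - v^2/c^2}}$

which is nonzero – the detections are not simultaneous for Alisha – and the sign of $\Delta t$ flips for an observer like Emmy moving in the opposite direction.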

Now, on to the topic of this post. In the physics of quantum field theory, particles interact with each other by exchanging other particles, called gauge bosons. This interaction is depicted, in very simplified fashion so we can calculate things like the effective force between the particles, in a sequence of diagrams called Feynman diagrams. Here’s a diagram that depicts the simplest possible interaction between two electrons

Time goes from the bottom to the top, the electrons approach each other, exchange a photon, then scoot off in different directions.

This is just the simplest diagram, though – to get exact numerical results for such scattering, you have to add higher-order versions of this diagram, as shown below

When you study such processes, you have to perform mathematical integrals – all you know is that you sent in some particles from far away into your experimental set-up, something happened and some particles emerged from inside. Since you don’t know where and when the interaction occurred (where a particle was emitted or picked up, as at the vertexes in the above diagrams), you have to sum over all possible places and times that the interaction ${\bf {could}}$ have occurred.

Now comes the strange bit. Look at what might happen when you sum over all possible paths for a collision between an electron and a photon.

In the above diagram, the exchange was simultaneous.

In the next one, the electron emitted a photon, then went on to absorb a photon.

and then comes the strange bit –

Here the electron emitted a photon, then went backwards in time, absorbed a photon, then went its way.

When we sum over all possible event times and locations, this is really what the integrals in quantum field theory instruct us to do!

Really, should we allow ourselves to count processes where  two events occur simultaneously, which means we would then have to allow for them to happen in reverse order, as in the third diagram? What’s going on? This has to be wrong! And what’s an electron going backwards in time anyway? Have we ever seen such a thing?

Could we simply ban such processes? We would then only sum over positions and times where the intermediate particles had enough time to go from one place to another.

There’s a problem with this. Notice the individual vertexes where an electron comes in, emits (or absorbs) a photon, then moves on. If this were a “real” process, it wouldn’t be allowed – it violates the principle of energy-momentum conservation. A simple way to understand this is to ask: could a stationary electron suddenly emit a photon and shoot off in the direction opposite to the photon? It looks deceptively possible! The photon would have, say, energy $E$ and momentum $p = E/c$. This means that the electron would also have momentum $E/c$, in the opposite direction (for conservation), but then its energy would have to be $\sqrt{E^2+m^2 c^4}$ from the relativistic formula. The total final energy is then higher than the energy $m c^2$ of the initial electron: $E + \sqrt{E^2+m^2 c^4}$ is bigger than $m c^2$! Not allowed!
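The impossibility is easy to verify numerically. A minimal sketch in Python (units with $c = 1$ and the electron mass set to $1$, both illustrative choices): for every photon energy $E > 0$, the final energy $E + \sqrt{E^2 + m^2}$ overshoots the initial energy $m$.

```python
import math

m = 1.0  # electron rest mass, in units where c = 1 (illustrative)

def final_energy(E_photon):
    """Photon energy plus the recoiling electron's energy sqrt(p^2 + m^2), with p = E_photon."""
    return E_photon + math.sqrt(E_photon**2 + m**2)

# The initial electron at rest has energy m; emission always overshoots it
for E in [0.001, 0.1, 1.0, 100.0]:
    assert final_energy(E) > m   # energy conservation fails for every E > 0
```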

We are stuck. We have to assume that energy-momentum conservation is violated in the intermediate state – in all possible ways. But then, all hell breaks loose – in relativity, the speed of a particle $v$ is related to its momentum $p$ and its energy $E$ by $v = \frac {p c^2}{E}$ – since $p$ and $E$ can be ${\underline {anything}}$, the intermediate electron could, for instance, travel faster than light. If so, in an appropriate reference frame, it would be absorbed before it was created. If you can travel faster than light, you can travel backwards in time (read this post in Matt Buckley’s blog for a neat explanation).

If the electron were uncharged, we would probably be hard-pressed to notice. But the electron is charged. This means if we had the following sequence of events,

– the world has -1 net charge

– electron emits a photon and travels forward in time

– electron absorbs a photon and goes on.

This sequence doesn’t appear to change the net charge in the universe.

But consider the following sequence of events

– the world has -1 net charge

– the electron emits a photon and travels backwards in time

– the electron absorbs a photon in the past and then starts going forwards in time

Now, at some intermediate time, the universe seems to have developed two extra negative charges.

This can’t happen – we’d notice! Extra charges tend to cause trouble, as you’d realize if you ever received an electric shock.

The only way to solve this is to postulate that an electron moving backward in time has a positive charge. Then the net charge added for all time slices is always -1.
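The charge bookkeeping can be sketched as a toy tally in Python (the time slices and charge labels are my illustrative construction, not part of any formalism): count the charge of every worldline segment crossing each time slice, first reading the backward-moving segment as an electron, then as a positron.

```python
# Worldline segments crossing three time slices for the "zigzag" diagram:
# before the zigzag, during it (three segments cross), and after it.
# Each entry is the charge of one segment, reading the backward-in-time
# leg as an electron (charge -1).
slices_naive = [[-1], [-1, -1, -1], [-1]]
print([sum(s) for s in slices_naive])         # [-1, -3, -1] — two extra negative charges!

# Reinterpret the backward-moving segment as a positron (charge +1):
slices_antiparticle = [[-1], [-1, +1, -1], [-1]]
print([sum(s) for s in slices_antiparticle])  # [-1, -1, -1] — net charge is always -1
```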

Ergo, we have antiparticles. The concept is forced upon us when we marry relativity with quantum mechanics.

There would be a way out of this morass if we insisted that all interactions occur at the same point in space, so that we never have to deal with “virtual” particles that violate energy-momentum conservation at intermediate times. This doesn’t work, because of something that Ken Wilson discovered in the early ’70s, called the renormalization group – the result of our insistence would be that we would disagree with experiment: the predicted effects would be too weak.

For quantum-field-theory students, this is basically saying that the expansion of the electron’s field operator into its components can’t simply be

$\Psi(\vec x, t) = \sum\limits_{spin \: s} \int \frac {d \vec k}{(2 \pi)^3} \frac{1}{\sqrt{2 E_k}} b_s(\vec k) e^{- i E_k t + i {\vec k}.{\vec x}}$

but has to be

$\Psi(\vec x, t) = \sum\limits_{spin \: s} \int \frac {d \vec k}{(2 \pi)^3} \frac{1}{\sqrt{2 E_k}} \left( b_s(\vec k) e^{- i E_k t + i {\vec k}.{\vec x}} +d^{\dagger}_s(\vec k) e^{+ i E_k t - i {\vec k}.{\vec x}} \right)$

including particles being destroyed on par with anti-particles being created and vice versa.

The next post in this sequence will discuss another interesting principle that governs particle interactions – C-P-T.

Quantum field theory is an over-arching theory of fundamental interactions. One bedrock of the theory is something called C-P-T invariance.  This means that if you take any physical situation involving any bunch of particles, then do the following

• make time go backwards
• parity-reverse space (so in three-dimensions, go into a mirror world, where you and everything else is opposite-handed)
• change all particles into anti-particles (with the opposite charge)

then you will get a process which could (and should) happen in your own world. As far as we know, this is always true, and it has been proven under a variety of assumptions. A violation of the C-P-T theorem in the universe would create quite a stir. I’ll discuss that in a future post.

Addendum: After this article was published, I got a message from someone I respect a huge amount, pointing out an interesting issue here. When we take the non-relativistic limit of the relativistic field theory, where do the anti-particles vanish off to? This is a question that I am going to try and write about in a bit!

### Can a quantum particle come to a fork in the road and take it?


I have always been fascinated by the weirdness of the Universe. One aspect of the weirdness is the quantum nature of things – others relate to the mysteries of Lorentz invariance, Special Relativity, the General Theory of Relativity, the extreme size and age of the Universe, the vast amount of stuff we don't seem to be able to see and so on.

This post is about an experiment that directly points to the fundamental weirdness of small (and these days, not so small) particles. While quantum effects matter at the sub-atomic particle level, these effects can coalesce into macroscopic phenomena like superconductivity, the quantum Hall effect and so on, so they can't be ignored. This experiment, usually referred to as the "Double-Slit" experiment, is described and explained in detail in Vol. 3 of the Feynman Lectures on Physics. While it would be silly of me to try and outdo Feynman's explanation, which by the way was one of the reasons why I was enthused to study physics in the first place, I want to go beyond the Double-Slit experiment to discuss the Delayed-Choice experiment – this extra wrinkle on the Double-Slit experiment was invented by the famous scientist John Wheeler (who was Feynman's Ph.D advisor) and displays, for all to see, even more aspects of quantum weirdness.

Let's get started.

The Double-Slit experiment is carried out by shooting electrons at a pair of closely placed slits – the electron flux is sufficiently small that one is able to count the number of times electrons hit various points on a television screen placed past the slits. If no measures are taken to identify which of the two paths the electrons actually took to reach the screen, then the probability density of arrival at various points on the television screen displays an "interference" pattern. If, however, the experiment is set up so as to identify which slit the electron went through, for example by shining an intense beam of photons at the slits that scatter off the electrons, then the pattern for those "which-path"-identified electrons switches to a "clump" pattern, centered around the slit they went through. The standard experiment is displayed, schematically, below: since both slits are open and we don't bother to check which slit the electron goes through, we see an "interference" pattern. If we used photons (light) instead of electrons, we'd see alternate light and dark fringes.

If only one slit were open, we'd get a "clump" pattern as below

Note – no interference "bumps" at places far away from the peak.

This behavior is also what we'd get for light – photons.

Quantum mechanics is the theory that was constructed to "explain" this behavior. We construct a quantity called the "amplitude". The "amplitude" is a complex number that has a (complex) value at every point in space and time. Complex numbers have two properties – a magnitude and a phase. The magnitude squared of the amplitude at some time $t$, times a small volume in space $v$, is the probability of finding the particle (if it's an amplitude for the electron, then the probability of finding the electron, etc.) in that volume $v$ at time $t$. Since you need to multiply the magnitude squared by a little volume element, the squared magnitude of the amplitude is referred to as the "Probability Density".

Schrodinger's equation writes down how this amplitude evolves in time – from the electron gun to the screen. To this equation, you need to add the "Born" prescription – that you have to square the magnitude of the amplitude to get the probability density.

Feynman found a neat, equivalent interpretation of Schrodinger's equation – his method basically said – if you want to find the amplitude for the electron (say) at some point in the screen, just write down path-amplitudes for all the different ways the electron could get from the electron gun to the screen. Add these path-amplitudes and then call the net sum the "total" amplitude for the electron to be found at the particular spot on the screen. Square the magnitude of this "total" amplitude and you will get the probability density for the electron to be found at that spot (times the little volume around that spot will give you the probability to find the electron at that spot).

All this discussion of path-amplitudes would be academic if the amplitudes were real numbers. The phase is a critical piece of the amplitude. Though the magnitude (squared) is the physical quantity (related to a probability), the magnitude of a sum of complex numbers depends delicately on the phase differences between the summands. As an example, $z_1=1+2i, z_2=-1-2i, z_3=1-2i, z_4=-1+2i$ all have the same magnitude $\sqrt{5}$. However, $z_1+z_2=0, z_1+z_3=2, z_1+z_4=4i$, all of which have magnitudes very different from the sum of the individual magnitudes. That's the reason we get alternate light and dark fringes on the television screen – the phases of the amplitudes for the electron to get to a given spot from the two slits sometimes cause the sum amplitude to be $0$ (which is called destructive interference), sometimes cause the sum amplitude to reach its maximum (which is called constructive interference), and produce every magnitude between these two extremes. While this behavior is extremely counter-intuitive for particles, it resembles behavior we are used to with waves, as this YouTube video shows, and so does this one. This is usually referred to as wave-particle duality.
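The fringe pattern falls straight out of this phase arithmetic. A minimal sketch in Python (idealized: unit amplitudes and a path-length difference $\delta$ in units of the wavelength – a real pattern also carries a diffraction envelope):

```python
import cmath
import math

def intensity(delta, wavelength=1.0):
    """Sum two unit path-amplitudes that differ only by the phase from the path difference."""
    phase = 2 * math.pi * delta / wavelength
    total = cmath.exp(0j) + cmath.exp(1j * phase)  # amplitude via slit 1 + amplitude via slit 2
    return abs(total) ** 2

print(intensity(0.0))   # 4.0  — constructive: a bright fringe
print(intensity(0.5))   # ~0.0 — destructive: a dark fringe
```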

The thing you need to take away from this experiment is that if you don't force the electron to get to the screen through only one slit, by say, closing off the other slit, it appears to behave like it goes through both slits.

Wait, it gets even more interesting.

The Delayed-Choice experiment, in the “quantum eraser” form proposed by Marlan Scully (who is an inspiring scientist in his own right), is portrayed in the accompanying figure.

An SPDC (spontaneous parametric down-conversion) setup is used – basically, each of the red regions in the picture above produces two photons when one photon hits it. Since the laser's photon can go through either of the two slits, this is just like the Double-Slit experiment, except that things are arranged so that after going through a slit, the photon produces two other photons. The photons that travel towards the interferometer/detector $D0$ are referred to as “signal'' photons. The photons that travel towards the prism are the “idler'' photons. After passing through the prism, the “idler'' photons pass through a beam-splitter that has a $50 \%$ probability of deflecting the incoming photon to detectors $D3$ and $D4$ respectively and a $50\%$ probability of letting the photon pass on to the fully silvered (reflecting) mirrors at the bottom of the picture. Another beam-splitter is placed between the detectors $D1$ and $D2$, so photons that are detected at $D1$ and $D2$ have their “which-path'' information obliterated – for instance, an “idler'' photon arriving at $D1$ could have come along either of two paths. The actual experiment was performed by Kim et al.

The detector $D0$ accumulates “signal'' photons – a coincidence counter correlates them to “idler'' photons detected at the detectors $D3, D4, D1$ and $D2$ (the “idler'' photons arrive at those detectors a few nanoseconds after the “signal'' photons are received). From the accumulated “signal'' photons received at $D0$, if we separate out the ones received in coincidence with the detectors $D3$ or $D4$, since the “which-path'' information is clear in those cases, the pattern observed (in the spatial density of the “signal'' photons) is the “clump'' pattern. However, the accumulated “signal'' photons received at $D0$ that are coincident with the ones received at $D1$ display an interference pattern, since one cannot infer the path taken by the “idler'' photons that show up at detector $D1$, which means one cannot tell which slit the “signal'' photon came through before arriving at detector $D0$. Similarly, the accumulated “signal'' photons received at $D0$ that are coincident with the ones received at $D2$ display an interference pattern in their spatial distribution; however, this pattern is spatially offset half a wavelength from the one due to $D1$. This is just enough to put peaks where the other pattern has dips and vice versa. So, if you aren't careful to note which detector, $D1$ or $D2$, detected the photon coincident with the “signal'' photon at detector $D0$, you get a clump pattern. The reason for this offset is a little tricky to explain – it's called “unitarity'', or conservation of probability at the beam-splitter, which forces some delicate phase assignments for the amplitudes we spoke about earlier.

Note, however, that someone could hide the specific arrival times of the photons at $D1$ and $D2$ from you for months and then tell you, say, a year later. All this time, you wouldn't have known there was an interference pattern "hiding" under the massive "clump" pattern you see. When you selectively look at the coincident detections separately for $D1$ and $D2$, it is then, and only then, that you see the interference patterns.
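This "hiding" is easy to illustrate numerically. In the sketch below (Python; the $\cos^2$/$\sin^2$ fringe shapes and the Gaussian envelope are my idealization, not the measured curves from Kim et al.), the $D1$- and $D2$-coincident fringes are offset by half a fringe, so their sum carries no trace of interference:

```python
import math

def envelope(x):
    """Smooth single-slit-style envelope (idealized)."""
    return math.exp(-x * x)

def d1_fringes(x):
    """Signal photons coincident with D1: fringes under the envelope."""
    return envelope(x) * math.cos(10 * x) ** 2

def d2_fringes(x):
    """Signal photons coincident with D2: shifted half a fringe (anti-fringes)."""
    return envelope(x) * math.sin(10 * x) ** 2

# Ignore which of D1/D2 fired, and the fringes wash out into the bare clump:
for i in range(-300, 301):
    x = i / 100
    assert abs(d1_fringes(x) + d2_fringes(x) - envelope(x)) < 1e-12
```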

Curious! This experiment, as I said, has been done and the results are as described above.

With a friend, I put together another set-up in an interesting locale – a black hole – that we tried to make work. Quantum mechanics defeated us in our attempts too, but it's an interesting problem to work through.

### Here’s an alternative history of how quantum mechanics came about…


Quantum Mechanics was the result of analysis of experiments that explored the emission and absorption spectra of various atoms and molecules. Once the electron and proton were discovered, very soon after the discovery of radioactivity, it was theorized that the atom was an electrically neutral combination of protons and electrons. Since it isn’t possible for a static arrangement of protons and electrons to be stable (a theorem in classical electromagnetism), the plum-pudding model of J. J. Thomson was rejected in favor of one where the electrons orbited a central, heavy nucleus. However, it is well-known from classical electromagnetism that if an electron is accelerated, which is what happens when it revolves around the positively charged nucleus, it should radiate energy through electromagnetic radiation and quickly collapse into the nucleus.

The spectra of atoms and molecules were even more peculiar – an infinite number of discrete spectral lines were observed, with no lines at in-between frequencies. Clearly, the systems needed specific amounts of energy to be excited from one state to another, and there wasn’t a continuum of possible states from the ground state (the lowest-energy state) to high energies. In addition, the ground state of the hydrogen atom, for instance, seemed to have a specific energy – the ionization energy of the single electron in the atom – that was specific to the hydrogen atom and could not be calculated from known parameters in any easy way. The relation between the lines was recognized by Rydberg – the frequency of the radiation emitted in various transitions in hydrogen was proportional to the difference of reciprocals of squares of small natural numbers.
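In modern notation (not spelled out in the original), Rydberg’s observation for hydrogen reads

$\frac{1}{\lambda} = R \left( \frac{1}{n_1^2} - \frac{1}{n_2^2} \right), \hspace{5 mm} n_2 > n_1$

where $R$ is the Rydberg constant and $n_1, n_2$ are small natural numbers – precisely the “difference of reciprocals of squares” pattern mentioned above.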

Anyway, starting from an energy function (Hamiltonian) of the kind

$H = \frac{\vec{p}^2}{2m} + V(\vec{r}) \hspace{3 mm} V(\vec{r}) = - \frac{e^2}{r}$

for the single electron interacting with a heavy nucleus, we recover only the classical continuum of possible solutions for the hydrogen atom, even neglecting the radiation of energy by the continuously accelerating electron.

We can state the conundrum as follows. We use the Energy function above, solve the classical problem and find that energy can take a continuum of values from some minimum negative number to infinity. In the lab, we find that Energy takes only a discrete infinity of values.

Let’s make a connection to matrices and operators. Matrices are mathematical objects that can have discrete eigenvalues. Can we interpret $H$ as a matrix of some sort and have the discrete energy values of the atom be eigenvalues of that matrix? In that case, there would be eigenvectors corresponding to those eigenvalues; let’s notate them as $|E_i>$, with eigenvalue $E_i$ for the $i^{th}$ energy level. If $H$ were a matrix, so would be $x$ and $p$, since otherwise we wouldn’t be able to make sense of the definition of $H$. We’d like to keep the above definition of the energy in terms of the position and momentum variables since it allows us to guess at quantum theories for other systems in the future – while this approach is somewhat arbitrary, it is an example of conservative-radicalism (a phrase I learned from a talk by Nima Arkani-Hamed); it’s also called the quantization prescription.

Now, if $x$ and $p$ were to be matrices, could they have the same eigenvectors, presumably the same eigenvectors as $H$? Well, they can’t – otherwise we’d be back to the same classical solution as before: if $x$ and $p$ had the same eigenvectors, then $H$ would have those same eigenvectors too, and we would be stuck with the same continuum of energy levels we had in the classical problem. So we are left with the situation that the eigenvectors of $x$ and $p$, and indeed $H$ – label them as $|x>$, $|p>$ and $|E_i>$ – can’t be the same: they stick out in different directions in the abstract space of state vectors. The state vectors for $H$, i.e., the $|E_i>$, are some linear combinations of the $|x>$‘s or the $|p>$’s, assuming the $|x>$ and the $|p>$ each form an orthogonal, complete set of vectors that span the abstract state space.

This leads us to the second realization: if we assume that the eigenvectors $|x>$, $|p>$ and $|E_i>$ stick out in different directions in the state space and each form a complete, orthogonal set, then we can specify the state of the system by giving its components along the $|x>$, or along the $|p>$, or along the $|E_i>$ – unlike in classical physics, where both $x$ and $p$ are needed to specify completely the state of the system.

What is the physical significance of dot products such as $<x|E_i>$ and $<p|E_i>$? These might be complex numbers – do the magnitude and phase denote specific physical quantities that can be measured? Consider a dot product such as $<x|x'>$, which should be zero unless $x = x'$ and should yield $1$ when integrated over the entire set of states; given that $x$ is a continuous variable,

$<x|x'> = \delta(x - x')$

This is akin to the probability density that a particle in state $|x'>$ can be found in the state $|x>$. The implication is that the magnitude of the dot product has physical meaning. Later, in an inspired leap of imagination, Max Born realized that we need to interpret the square of the magnitude as the quantity with physical meaning – the probability density.

What is the dot product of $|x>$ and $|p>$?

Let’s start with some definitions, based on our simple minded notion that these variables need to be represented as matrices with eigenvectors.

$x |x'> = x' |x'>$

$p|p'> = p'|p'>$

The dot product is represented by $<x'|p'>$.

Now this must be a function purely of $x'$ and $p'$; call it $f$. Hence

$<x'|p'> = f(x',p')$

We expect translational invariance in physics in our physically relevant quantities, and $|<x|p>|$ is (by the argument in the last paragraph) a physically relevant quantity – related to the probability density that a particle in state $|p>$ is found in the state $|x>$.

Let’s take the dot product of $|p>$ with the vector $|x=0>$. This must be, from the above,

$<x=0|p> = f(0,p)$

Now, if the origin of coordinates were moved by $A$, i.e.,

$x \rightarrow x+A$

We don’t expect there to be a physical change in the dot product, it should not care about where the origin of coordinates is, up to a factor of magnitude unity. This means

$f(x+A,p) = f(x,p) e^{i \Phi(x,A,p)}$

$f(A,p) = f(0,p) e^{i \Phi(0,A,p)}$

The simplest choice of function that has this property is (up to some units)

$f(x,p) =e^{i \alpha p x + iC}$

where $C$ is an arbitrary constant, which we can choose to be $0$, and $\alpha$ is a quantity that makes the dimensions come out right in the exponent (the exponent needs to be dimensionless).

Since you also have

$<x'|p|p'> = p' <x'|p'> = p' e^{i \alpha p' x'}$

The above expression allows us to make the identification

$<x'|p|p'> = - \frac {i}{\alpha} \frac{\partial}{\partial x'} <x'|p'>$

So, the matrix  $p$ can be identified, in the space spanned by the eigenvectors of $x$, as

$p \equiv - \frac {i}{\alpha} \frac{\partial}{\partial x}$
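This identification can be checked symbolically; here is a sketch using Python’s sympy library, with the plane-wave overlap $e^{i \alpha p x}$ derived above:

```python
import sympy as sp

x, p, alpha = sp.symbols('x p alpha', real=True, positive=True)

# <x|p> = e^{i alpha p x}, the form fixed by translation invariance
overlap = sp.exp(sp.I * alpha * p * x)

# Apply the candidate momentum operator -(i/alpha) d/dx
applied = -sp.I / alpha * sp.diff(overlap, x)

# It reproduces the eigenvalue p times the original overlap
assert sp.simplify(applied - p * overlap) == 0
```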

Now, suppose the eigenvectors of the  $H$ matrix are the $|E_i>$ , so we have

$<x'|H|E_i> = <x'| \left( \frac{p^2}{2m} + V(x) \right) |E_i>$

$= \left( - \frac {1}{2 m \alpha^2} \frac {\partial^2}{\partial x^{'2}} + V(x') \right) <x'|E_i> = E_i <x'|E_i>$

This is Schrodinger’s equation, if we make the identification $\alpha \equiv \frac {1}{\hbar}$.

Apart from the mental leap from treating $x, p$ as continuous variables to treating them as matrices (apparently that was considered higher mathematics in the early 1920s), the flow seems pretty straightforward.

To see Nima Arkani-Hamed talk about the phrase “conservative-radicalism” and other interesting topics, see the YouTube video here.