Here’s an alternative history of how quantum mechanics came about…

Quantum mechanics was the result of analysis of experiments that explored the emission and absorption spectra of various atoms and molecules. Once the electron and proton were discovered, very soon after the discovery of radioactivity, it was theorized that the atom was an electrically neutral combination of protons and electrons. Since it isn’t possible for a static arrangement of protons and electrons to be stable (a theorem in classical electromagnetism, known as Earnshaw’s theorem), the plum-pudding model of J. J. Thomson was rejected in favor of one where the electrons orbited a central, heavy nucleus. However, it is well known from classical electromagnetism that an accelerated electron, which is what an electron revolving around the positively charged nucleus is, should radiate energy as electromagnetic radiation and quickly collapse into the nucleus.

The spectra of atoms and molecules were even more peculiar: an infinite number of sharp spectral lines was observed, with no lines at the in-between frequencies. Clearly, the systems needed specific amounts of energy to be excited from one state to another, and there wasn’t a continuum of possible states from the ground state (the lowest energy state) up to high energies. In addition, the ground state of the hydrogen atom, for instance, seemed to have a specific energy, the ionization energy of the single electron in the atom, that was specific to hydrogen and could not be calculated from known parameters in any easy way. The relation between the lines was recognized by Rydberg: the frequency of the radiation emitted in the various transitions in hydrogen was proportional to the difference of the reciprocals of the squares of small natural numbers.
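Rydberg’s relation is easy to try out numerically. The snippet below is a minimal sketch, not a precision calculation: the function names are made up for illustration, and the values used for the Rydberg constant of hydrogen and the speed of light are rounded, so treat them as assumptions.

```python
# Rounded constants (assumed values, good to about four significant figures).
R_H = 1.0968e7   # Rydberg constant for hydrogen, in 1/m
C = 2.998e8      # speed of light, in m/s


def line_frequency(n_low, n_high):
    """Frequency (Hz) of the hydrogen line for the transition n_high -> n_low."""
    if not 0 < n_low < n_high:
        raise ValueError("need 0 < n_low < n_high")
    # Rydberg: frequency is proportional to the difference of reciprocal squares.
    return C * R_H * (1.0 / n_low**2 - 1.0 / n_high**2)


def line_wavelength_nm(n_low, n_high):
    """Wavelength (nm) of the same line."""
    return C / line_frequency(n_low, n_high) * 1e9


# The visible (Balmer) series ends on n_low = 2.
for n in range(3, 7):
    print(n, round(line_wavelength_nm(2, n), 1))
```

The first line of the series comes out near 656 nm, the familiar red line of hydrogen, and nothing appears between consecutive lines.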

Anyway, starting from an energy function (Hamiltonian) of the kind

$H = \frac{\vec{p}^2}{2m} + V(\vec{r}) \hspace{3 mm} V(\vec{r}) = - \frac{e^2}{r}$

for the single electron interacting with a heavy nucleus, we recover only a classical continuum of possible solutions for the hydrogen atom, even if we neglect the radiation of energy by the continuously accelerating electron.

We can state the conundrum as follows. We use the energy function above, solve the classical problem and find that the energy can take a continuum of values from some minimum negative number to infinity. In the lab, we find that the energy takes only a discrete infinity of values.

Let’s make a connection to matrices and operators. Matrices are mathematical objects that have discrete eigenvalues. Can we interpret $H$ as a matrix of some sort and have the discrete energy values of the atom be eigenvalues of the matrix? In that case, there would be eigenvectors corresponding to those eigenvalues; let’s denote them by $|E_i\rangle$, with eigenvalue $E_i$ for the $i^{th}$ energy level. If $H$ were a matrix, so would $x$ and $p$ be, since otherwise we wouldn’t be able to make sense of the definition of $H$. We’d like to keep the above definition of the energy in terms of the position and momentum variables, since it allows us to guess at quantum theories for other systems in the future. While this approach is to some extent arbitrary, it is an example of conservative-radicalism (a phrase I learned from a talk by Nima Arkani-Hamed); it’s also called the quantization prescription.

Now, if $x$ and $p$ were to be matrices, could they have the same eigenvectors, presumably the same eigenvectors as $H$? This would mean that they would need to be commuting matrices. Well, they can’t: if $x$ and $p$ had the same eigenvectors, then $H$ would have those same eigenvectors too, and we would be stuck with the same continuum of energy levels we had in the classical problem. So the eigenvectors of $x$, $p$ and indeed $H$, label them as $|x\rangle$, $|p\rangle$ and $|E_i\rangle$, can’t be the same; they stick out in different directions in the abstract space of state vectors. The state vectors for $H$, i.e., the $|E_i\rangle$, are some linear combinations of the $|x\rangle$’s or the $|p\rangle$’s, assuming the $|x\rangle$ and the $|p\rangle$ are each a complete orthogonal set of vectors that span the abstract state space.

This leads us to the second realization: if we assume that the eigenvectors $|x\rangle$, $|p\rangle$ and $|E_i\rangle$ stick out in different directions in the state space and that each set is complete and orthogonal, then we can specify the state of the system by giving its components along the $|x\rangle$, or along the $|p\rangle$, or along the $|E_i\rangle$. This is unlike classical physics, where $x$ and $p$ are both needed to specify the state of the system completely.

What is the physical significance of dot products such as $\langle x | E_i \rangle$ and $\langle p | E_i \rangle$? These might be complex numbers; do the magnitude and phase denote specific physical quantities that can be measured? Consider a dot product such as $\langle x | x' \rangle$. It should be zero unless $x = x'$, and it should yield 1 when integrated over the entire set of states. Given that $x$ is a continuous variable, this means

$\langle x | x' \rangle = \delta(x - x')$

This is akin to the probability density that a particle in state $|x'\rangle$ can be found in the state $|x\rangle$. The implication is that the magnitude of the dot product has physical meaning. Later, in an inspired leap of imagination, Max Born realized that we need to interpret the square of the magnitude as the quantity with physical meaning: the probability density.

What is the dot product of $|x\rangle$ and $|p\rangle$?

Let’s start with some definitions, based on our simple minded notion that these variables need to be represented as matrices with eigenvectors.

$x |x'> = x' |x'>$

$p|p'> = p'|p'>$

The dot product is represented by $\langle x | p \rangle$

Now this must be a function purely of $x$ and $p$. Hence

$\langle x | p \rangle = f(x,p)$

We expect translational invariance in our physically relevant quantities, and $|\langle x | p \rangle|$ is (by the argument in the last paragraph) a physically relevant quantity, related to the probability density that a particle in state $|p\rangle$ is in the state $|x\rangle$.

Let’s take the dot product of $|p\rangle$ with the vector $|x=0\rangle$. This must be, from the above,

$\langle x=0 | p \rangle = f(0,p)$

Now, if the origin of coordinates were moved by $A$, i.e.,

$x \rightarrow x+A$

we don’t expect a physical change in the dot product. It should not care about where the origin of coordinates is, up to a factor of magnitude unity. This means

$f(x+A,p) = f(x,p) e^{i \Phi(x,A,p)}$

$f(A,p) = f(0,p) e^{i \Phi(0,A,p)}$

The simplest choice of function that has this property is (up to some units)

$f(x,p) =e^{i \alpha p x + iC}$

where $C$ is an arbitrary constant, which we can choose to be $0$, and $\alpha$ is a quantity that makes the dimensions come out right in the exponent (all the dimensions need to cancel out).
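As a quick check that this choice does what we asked of it, here is the translation property worked out explicitly, taking the arbitrary constant in the exponent to be zero:

$f(x+A,p) = e^{i \alpha p (x+A)} = e^{i \alpha p x} e^{i \alpha p A} = f(x,p) e^{i \alpha p A}$

So the phase is $\Phi(x,A,p) = \alpha p A$, which does not depend on $x$ at all, and $|f(x,p)| = 1$ everywhere: shifting the origin changes the dot product only by a factor of magnitude unity.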

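Since you also have

$\langle x' | p | p' \rangle = p' \langle x' | p' \rangle = p' e^{i \alpha p' x'}$

the above expression allows us to make the identification

$\langle x' | p | p' \rangle = - \frac {i}{\alpha} \frac{\partial}{\partial x'} \langle x' | p' \rangle$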
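This identification can be sanity-checked numerically. The snippet below is only a sketch: the values of $\alpha$ and $p'$ and the grid are arbitrary assumptions, and the derivative is approximated by a central finite difference, so the two sides agree only up to a small discretization error.

```python
import numpy as np

alpha = 1.0      # assumed value; only sets the units of the exponent
p_prime = 2.5    # an arbitrary momentum eigenvalue, chosen for illustration

x = np.linspace(-5.0, 5.0, 2001)
h = x[1] - x[0]

# The dot product of |x'> with |p'>, sampled on the grid.
f = np.exp(1j * alpha * p_prime * x)

# Central finite difference for d/dx' at the interior grid points.
df = (f[2:] - f[:-2]) / (2.0 * h)

lhs = (-1j / alpha) * df     # -(i/alpha) d/dx' acting on the dot product
rhs = p_prime * f[1:-1]      # p' times the dot product

max_err = np.max(np.abs(lhs - rhs))
print(max_err)
```

The printed mismatch shrinks as the grid spacing `h` is made smaller, which is what one expects if the derivative really does pull down a factor of $p'$.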

So, the matrix  $p$ can be identified, in the space spanned by the eigenvectors of $x$, as

$p \equiv - \frac {i}{\alpha} \frac{\partial}{\partial x}$
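With this representation we can confirm the earlier claim that $x$ and $p$ cannot be commuting matrices. Acting on an arbitrary smooth function $\psi(x)$ (a symbol introduced here only for the check):

$(x p - p x) \psi(x) = - \frac{i}{\alpha} \left( x \frac{\partial \psi}{\partial x} - \frac{\partial (x \psi)}{\partial x} \right) = \frac{i}{\alpha} \psi(x)$

The right-hand side is nonzero for any finite $\alpha$, so $x$ and $p$ cannot share a complete set of eigenvectors.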

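Now, suppose the eigenvectors of the $H$ matrix are the $|E_i\rangle$, so we have

$\langle x' | H | E_i \rangle = \langle x' | \left( \frac{p^2}{2m} + V(x) \right) | E_i \rangle$

$= \left( - \frac {1}{2 m \alpha^2} \frac {\partial^2}{\partial x^{'2}} + V(x') \right) \langle x' | E_i \rangle = E_i \langle x' | E_i \rangle$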

This is the Schrödinger equation (in its time-independent form), if we make the identification $\alpha \equiv \frac {1}{\hbar}$.
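To see the discrete energies actually emerge from a matrix, one can put the equation on a grid. The sketch below makes several assumptions that are not part of the argument above: it uses the standard reduction of the hydrogen problem to a radial equation for zero angular momentum, works in units where $\hbar = m = e = 1$, replaces the second derivative by a finite difference, and confines the electron to a large box. The function name and grid parameters are invented for illustration.

```python
import numpy as np


def radial_levels(n_points=1500, r_max=60.0, n_levels=3):
    """Lowest eigenvalues of H = -(1/2) d^2/dr^2 - 1/r on a uniform radial grid.

    Assumes units with hbar = m = e = 1 and zero angular momentum; the wave
    function is forced to vanish at r = 0 and at r = r_max.
    """
    h = r_max / (n_points + 1)
    r = h * np.arange(1, n_points + 1)      # interior grid points only

    # Kinetic term from the three-point second difference, plus -1/r.
    diagonal = 1.0 / h**2 - 1.0 / r
    off_diagonal = -0.5 / h**2 * np.ones(n_points - 1)

    H = np.diag(diagonal) + np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1)

    # H is real and symmetric, so eigvalsh returns real eigenvalues, ascending.
    return np.linalg.eigvalsh(H)[:n_levels]


levels = radial_levels()
print(levels)
```

In these units the lowest few eigenvalues come out close to $-1/2$, $-1/8$ and $-1/18$, that is, proportional to $-1/n^2$, which is exactly the pattern behind Rydberg’s differences of reciprocal squares.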

Apart from the mental leap from treating $x, p$ as continuous variables to treating them as matrices (apparently that was considered higher mathematics in the early 1920s), the flow seems pretty straightforward.

To see Nima Arkani-Hamed talk about the phrase “conservative-radicalism” and other interesting topics, see the YouTube video here.