Ergodicity Economics and von Neumann-Morgenstern utility
I have seen claims that the von Neumann-Morgenstern utility theorem (vNM) grounding Expected Utility Theory (EUT) invalidates Ergodicity Economics (EE) and time averages, including appeals to authority about von Neumann being an expert on the idea of (non)ergodicity. I think that misses a subtlety; in what follows I have tried to avoid any use of "right" or "wrong", and instead to emphasize the "usefulness" of modeling assumptions given the situation.
The vNM theorem guarantees the existence of a utility function whose expected value represents the agent's preferences over lotteries, IFF four axioms (termed {completeness, transitivity, continuity, independence}) are satisfied. Quite reasonable on the face of it, but the theorem implicitly assumes the existence of a probability distribution over outcomes, and says nothing about where those probabilities come from. That choice is a separate modeling assumption when applying the vNM framework to a specific situation, and presumably a crucial implicit assumption underlying EUT as commonly used.
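Concretely (and stated loosely here), the theorem's conclusion is a representation statement for lotteries \(L\) and \(M\) that already come equipped with probabilities \(p_i\) and \(q_i\) over outcomes \(x_i\):
\[
L \succeq M \iff \sum_i p_i\, u(x_i) \;\ge\; \sum_i q_i\, u(x_i),
\]
i.e. the probabilities are inputs to the theorem, not outputs of it.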
Consider the simple experiment of repeated tosses of a fair coin, increasing or decreasing the agent's wealth by \(10\%\) on each turn [1], i.e. a discrete analogue of geometric Brownian motion (without the drift, for simplicity). What probability distribution over the set of trajectories is appropriate to capture the agent's behavior? Note that vNM has nothing to say about this, leaving it up to the modeler to choose judiciously. That choice implicitly decides between time and ensemble averages (or whatever other kind of average).
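A minimal simulation sketch of this game (the sizes n_agents and n_tosses below are illustrative choices, nothing canonical) shows how differently the ensemble mean and the typical trajectory behave:

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_tosses = 100_000, 100     # illustrative sizes
up, down = 1.1, 0.9                   # +10% on heads, -10% on tails

# Each agent independently plays 100 tosses of the multiplicative game (W_0 = 1).
heads = rng.integers(0, 2, size=(n_agents, n_tosses))
wealth = np.where(heads == 1, up, down).prod(axis=1)

print("ensemble mean of final wealth :", wealth.mean())     # ~1.0, since E[multiplier] = 1
print("median (typical) final wealth :", np.median(wealth)) # ~0.99^50 ~ 0.6
print("per-toss log-growth of the typical trajectory:",
      0.5 * np.log(up) + 0.5 * np.log(down))                # ~ -0.005 < 0
```

The ensemble mean stays at the starting wealth, while the typical (median) agent ends with roughly \(60\%\) of it after 100 tosses; which of those numbers is relevant depends entirely on which distribution over trajectories the modeler decides is being probed.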
If we expect this game to be played a small number of times compared with the number of trajectories (e.g. a billion = \(2^{30} \ll 2^{100}\), for 100 sequential coin tosses), then the trajectories actually encountered concentrate around the median case (roughly equal numbers of H and T) with probability approaching one, leaving the agent almost no chance to probe the extreme scenarios (strings with mostly H or strings with mostly T). This concentrated measure, which describes how the (available "small" number of) agents/plays will "probe" the trajectories and leads to what we call "time averages" [2], is very different from the distribution corresponding to ensemble probabilities of trajectories, where each trajectory is equally likely (for a fair coin).
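A back-of-the-envelope check of that concentration (the "at least 80 heads out of 100" threshold below is picked purely for illustration):

```python
from math import comb

n = 100          # tosses per play, as in the text
plays = 2**30    # ~a billion plays

# Probability that a single fair-coin string of length 100 has at least 80 heads,
# and the expected number of such strings encountered across ~a billion plays.
p_tail = sum(comb(n, k) for k in range(80, n + 1)) / 2**n
print(p_tail)              # ~5.6e-10
print(plays * p_tail)      # ~0.6: we expect to see fewer than one such string
print(1.1**80 * 0.9**20)   # yet its payoff multiplier is ~250x the starting wealth
```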
The only way to circumvent this concentration of measure is to live in the regime where the number of (parallel) plays far exceeds the number of trajectories. In that regime, the distribution probed by the ensemble of agents/plays approaches the full distribution over trajectories, and averaging over it yields what we call the "ensemble average" [3].
To be even more thorough, we care not about the probability distribution on the space of trajectories, but about the induced distribution on the observable (i.e. the space of payoffs), which is the quantity we are actually averaging. These two distributions can look very different, e.g. when a small set of trajectories in the ensemble has outsized (exponentially large) payoffs! IFF the time average and the ensemble average of the observable happen to coincide (which depends on both the observable/payoff and the process), we may call the {observable, process} combination "ergodic".
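For the coin game above, the standard EE bookkeeping makes this concrete. The ensemble-average growth rate of wealth and the time-average growth rate along a single trajectory are
\[
g_{\text{ens}} = \frac{1}{t}\ln\frac{\langle W(t)\rangle}{W(0)} = \ln\!\left(\frac{1.1 + 0.9}{2}\right) = 0,
\qquad
g_{\text{time}} = \lim_{t\to\infty}\frac{1}{t}\ln\frac{W(t)}{W(0)} = \frac{1}{2}\ln 1.1 + \frac{1}{2}\ln 0.9 \approx -0.005 .
\]
Since \(g_{\text{ens}} \neq g_{\text{time}}\), wealth is not an ergodic observable for this process; the per-toss change in \(\ln W\), by contrast, has the same time average and ensemble average, so the pair \(\{\Delta \ln W, \text{process}\}\) is ergodic.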
1 Ergodicity and economic modeling
EE is not in opposition to the vNM framework! Rather, the point of EE is that one can avoid performing complicated gymnastics with utilities/payoffs to model behavior, if one is instead more judicious about the probabilities multiplying them. That is the role played by time averages instead of ensemble averages.
If one were only interested in modeling a single problem, the prescription of a distribution is about as much input as a prescription of utilities; in a sense, they both beg the question equally. The true value of EE comes from its potential generalizability: modeling multiple problems with the same prescription of averaging, if it turns out that time averages are sufficient to correctly model behavior with simple/obvious utility functions, rather than cooking up a different utility function for each problem.
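For multiplicative dynamics like the coin game, this is exactly what happens (a standard EE observation, not something the vNM theorem itself dictates): writing \(\langle \Delta \ln W \rangle\) for the ensemble expectation of the one-round change in log wealth,
\[
g_{\text{time}} = \lim_{t\to\infty}\frac{1}{t}\ln\frac{W(t)}{W(0)} = \big\langle \Delta \ln W \big\rangle ,
\]
so an agent who simply maximizes the time-average growth rate of wealth makes the same choices as an EUT agent equipped with logarithmic utility; the apparent "risk aversion" is carried by the choice of average rather than by a curved utility function.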
2 Ergodicity and physical systems
We have all the pieces in place to understand the roots of ergodicity in physical systems, so we might as well touch on that. (This section is a digression from the economics discussion, and can be safely skipped by readers primarily interested in that.)
The ergodic hypothesis claims that (even) a single agent (i.e. a physical system in a given macrostate) will (over a reasonable time scale) uniformly explore the microstates in the ensemble corresponding to the macrostate (through natural dynamics). The only observables that can be probed by experiments on a single system (or a few systems) are time averages, but the ergodic hypothesis allows those to be equated with ensemble averages (which are far easier to compute). For this replacement to be a useful model (at least in non-pathological systems), we don't really need the system's trajectory to probe all the microstates, but merely to probe the phase space volume densely [4] and uniformly. Loosely speaking, chaotic dynamics can induce a dense sampling, while Liouville's theorem guarantees uniformity along the trajectory. There are caveats to that statement, whose subtleties I haven't fully mapped out yet; suffice it to say that there are interesting examples where this doesn't trivially happen, such as the Fermi–Pasta–Ulam–Tsingou problem or glassy dynamics.
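As a toy illustration of the time-average/ensemble-average equivalence when it does hold (a sketch unrelated to the subtle examples just mentioned): the chaotic logistic map \(x \mapsto 4x(1-x)\) has the known invariant density \(1/(\pi\sqrt{x(1-x)})\), and a long single-trajectory (time) average of an observable lands on the ensemble average taken against that density.

```python
import numpy as np

# Time average of f along one orbit of the logistic map x -> 4x(1-x).
def time_average(f, x0=0.2, n_steps=1_000_000, burn_in=1_000):
    x, total = x0, 0.0
    for i in range(n_steps + burn_in):
        x = 4.0 * x * (1.0 - x)
        if i >= burn_in:
            total += f(x)
    return total / n_steps

# Ensemble average of f against the invariant density rho(x) = 1/(pi*sqrt(x(1-x))),
# via a midpoint-rule integral on (0, 1).
def ensemble_average(f, n_grid=1_000_000):
    x = (np.arange(n_grid) + 0.5) / n_grid
    rho = 1.0 / (np.pi * np.sqrt(x * (1.0 - x)))
    return np.sum(f(x) * rho) / n_grid

f = lambda x: x ** 2
print(time_average(f))      # ~0.375
print(ensemble_average(f))  # ~0.375 (= 3/8, the exact ensemble value)
```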
Footnotes:
[1] Somewhat like the St. Petersburg paradox, but not quite.
[2] The concentrated measure is basically the model underlying the Kelly criterion.
[3] This is the situation commonly (implicitly) assumed in probability theory, where the number of trials (far) exceeds the number of events underlying a random variable (validating a frequentist perspective); under it, ensemble averaging turns out to be a meaningful/useful concept, because of the concentration driven by the law of large numbers.
[4] Like the rationals among the reals.