Paradox: In a game show there are two closed envelopes containing money. One contains twice as much as the other (both have even amounts). You [randomly] choose one envelope (denoted EnvelopeA) and then the host asks you if you wish to change and prefer the other envelope. Should you change? You can take a look and know what your envelope contains.

Paradoxical answer: Say that your envelope contains $20, so the other should have either $10 or $40. Since each alternative is equally probable, the expected value of switching is 1/2 $10 + 1/2 $40 which equals $25. Since this is more than your envelope contains, this suggests that you should switch. If someone else was playing the game and had chosen the second envelope, the same arguments as above would suggest that that person should switch to your envelope to have a better expected value.

——– My analysis ——-

Fundamentally, the whole thing depends on the probability distribution of X, where X is a secret number printed invisibly inside both envelopes and denotes the smaller of the 2 amounts. When the host presents you the 2 envelopes, X is already printed. X is “generated” at beginning of each game.

D1) Here’s one simple illustration of a distribution — Suppose a computer generates an even natural number X, and the game host simply puts X dollars into one envelope and 2X into the other. Clearly X follows a deterministic distribution implemented in the computer. X >= 2, and since a computer has finite memory it has an upper limit on X. We should also remember any game host can’t have a trillion dollars.

D2) Another distribution — the game host simply think of a number in a split second. Suppose she isn’t profit driven. Again this X won’t be too large like 2 million.

D3) Another distribution — ask a dog to fetch a number from the audience, each contributing 100 even integers.

In all the scenarios, as in any practical context there’s invariably a constraint on how large the integer can be. We will assume the game host announced X <= 1000.

Why we talk about “distribution” of X — because we treat X as a sample drawn from a population of eligible numbers, which is fundamental to D1/D2/D3 and all other X-generation schemes. Some people object, saying that population is unlimited, but I feel any discussion in any remotely practical context will inevitably come up against some real world constraints on X. A simple yet practical assumption is that X has a discrete uniform distribution [2, 1000]. As soon as we nail down the distribution of X, the paradox unravels, but I don’t have time to elaborate fully. Just a few pointers.

If we don’t open our first Envelop (A), then we must be indifferent, Regardless of distribution. If we do open, then long answer. Suppose you see $20.

If we have no idea how X is generated, we can safely assume it has an upper limit and a lower limit of 2. Still it’s insufficient information to make a decision. Switching might be better or worse or flat.

If the number generator is known to be uniform [2,1000] switch is profitable(?). If we saw $2000 (2 times the max X) we don’t switch. Any other amount we see in EnvelopeA, we switch (?). So most players would open first envelope and switch!

When we see $20 and still feel indifferent, we implicitly assume X distribution follows P(X=10) == 2*P(X=20). Not a uniform distribution but definitely conceivable.

———-

Let’s keep things even simpler. Say we know X has some unknown distribution but only 3 eligible values [2,4,8] and we see $4 in A. Why would some players feel indifferent after seeing $4? If an insider reveals the X-generation is heavily biased towards low amounts, then we obviously know B is highly likely to be $2 not $8 so don’t switch.

Extreme case — P(X=8) is near zero. P(X=2) is 10 times higher than P(4), so most players would see 2 or 4 when they open EnvelopeA, and very few see a $4 as the “small brother of $8”. After you see enough patterns, you realize A=4 is most likely to be the bigger brother of a $2.

Therefore the player indifference is based on no rational thoughts. Indifference is unwise (i.e. ignoring key signals) when we see a lot of pattern about the X-generation. You can probably apply conditional probability treating the pattern as prior knowledge.