Since I decided to call this blog martingalemeasure it seems only fitting that the first post should be about probability; martingales in particular. In my favorite introductory book on measure theoretical probability, “Probablity with Martingales” by David Williams, we find an exercise in Chapter 10, which I paraphrase here:
Suppose a monkey is typing randomly at a typewriter whose only keys are the capital letters through of the english alphabet. What is the expected (average) time it will take for the monkey to type the word ?
This is not an easy problem. In fact it’s not entirely obvious that the average time is even finte! Williams expects the reader to solve it using the beautiful theory of martingales and in particular Doob’s optional-stopping theorem. We will calculate the result below, but stop short of a proof. (There are many proofs of this result already on the web. A quick search for “monkey abracadabra” will provide several links).
The purpose of this post is to try to provide the intuition behind the result. We do this by comparing the expected time it takes for the monkey to type with the expected time it takes to type the first eleven letters of the alphabet .
It’s almost always a good idea when trying to solve a problem to first attack the simplest case you can think of that still embodies the essence of the problem. For our problem we can certainly benefit both from picking shorter strings, and by choosing a smaller alphabet. The important distinction between and is that the former starts to repeat at charcter 8, i.e has a repeated sub-string. So if we look at versus we have a much shorter word that has the repeated sub-string . Since we lose nothing by restricting the possible letters to one of ,, or ; we restrict our alphabet to these three letters.
Ok, let’s use martingale theory to solve the simplified problem and then apply the result to the bigger problem. As is often the case in probability theory we will think in terms of gamblers making bets. Suppose that just before each keystroke made by the monkey, a new gambler shows up at a casino and employs the following betting strategy. She wagers one dollar that the monkey will type an , the first letter in the word. If she wins, the casino pays her fair odds and she receives dollars. She then bets the entire dollars that the next keystroke will be a , the second letter. She continues until either she has lost her intial dollar or the word is typed in full; in that case she wins dollars, for a profit of .
Each individaul bet is fair, that is, has an expected value of zero. Indeed, our gambler has a rd chance of tripling her wager, say for profit of and a rds chance of losing it. Therefore, her expected value is . Hence, the entire strategy is fair, the gambler on average will not make or lose any money. If we now consider the total amount won or lost by the casino at any point in time, then that too must have expectation zero, since each gambler is playing a fair game, so is the casino as a whole. I should probably mention that the stochastic process describing the profit of the casino is a martingale, which is why we can apply the necessary mathematical machinary we need to solve this problem.
Here comes the big trick, when we say that the casino has expected gains of zero at any time, we mean random times as well. Including random times like the first time the word is typed or the first time the word is typed (technically time is called a stopping time and needs to satisfies one of the sufficient conditions of Doob’s theorem). We are almost there, all we need to do is calculate the expected gain by the casino at times and and we will be done. Let’s do first. The first time the monkey types , all of the gamblers except the 3rd to last will have winnings of zero. The third to last will have winnings of . The total amount wagered will be . So in order for the game to be fair we must have,
Hence, by the linearity of expectation,
But what happens when we analyze the string instead? At time , not only has the third to last gambler 27 in winnings, but the last gambler has 3. Since she wagered one dollar on the event the monkey typed an A and the monkey did. Hence,
Wow, the average time for the monkey to type is actually longer than the average time to type .
Back to ABRACADABRA
It should now be straight forward to calculate the expected time it takes for the monkey to type and . We simply compute how much money the casino owes the first time the monkey types the word. In the case of the only winner is the gambler who started 11 keystrokes ago. She grosses so that turns out to be the expected time. But in the case of the gambler who started playing 4 keystrokes ago has winnings of and the gambler who bet on the last keystroke won , hence for ,
I don’t know about you, but I find this result is counter-intuitive. I expected it would take the same time on average, or if anything shorter to type , since at the 8th keystroke the monkey types an A starting the sequence again. Here is where the simplified case using and can help us get a better grip on what is really going on. In these cases we can easily caculate some probabilities. Let’s look at all of the winning strings after 5 keystrokes. We use * to mean any character in the alphabet, in this case one of , , or . Here are the three events that represent a case where the monkey has typed the string in 5 keystrokes or less:
A B C * * * A B C * * * A B C
If we want to know the probability that the monkey typed in the first 5 strokes, we need to take the probability of the union of the three events above. By looking at the third character in each string, we can see that the three cases are mutually exclusive. Therefore, we can just add their probabilities. The probability of the first case is clearly so the probability of the union is . In other words the probability that the monkey types in 5 keystrokes or less is . On the other hand for the case we have events:
A B A * * * A B A * * * A B A
And again the probability of each event is , but we can no longer add the three events since they are no longer mutually exclusive. We can see that the first and third cases overlap, they both contain the sequence:
A B A B A
And that is the only string that they have in common. So by the inclusion – exclusion principle we need to subtract that case out to get the probability as Which is lower by rd.
And now it should be clear why the average time for the monkey to type a word is longer if the word has repeated sub-strings. For any given time the probability of typing the string with repeated sub-strings is lower so it must take longer on average for the event to happen.