On the radio, JT and I typically talk some baseball, because we are both baseball fans and sometimes you get tired of talking about wage growth, or the lack thereof. So for fun I simulated hit streaks in baseball. I assumed a few things, as we always must. I assumed four at-bats per game and simulated 1,000,000 games. I set the batter’s average to Joe DiMaggio’s career average of .325. This will end up being an important cutoff for my conclusions. Each at-bat is an independent realization, and a player needs only one hit in a game to add to the streak. My premise is that Joe DiMaggio’s hit streak is a monumentally impressive feat, but not a statistical impossibility. Think more along the lines of improbability, as in Douglas Adams’ The Hitchhiker’s Guide to the Galaxy.
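For anyone who wants to try this at home, here is a minimal sketch of the setup described above. It is not the code behind my numbers; the function name, the seed, and the smaller run size in the usage line are all assumptions made for illustration.

```python
import random

def simulate_streaks(n_games=1_000_000, avg=0.325, at_bats=4, seed=0):
    """Return the lengths of all hit streaks in a simulated run of games.

    Hypothetical helper: each at-bat is an independent draw that is a hit
    with probability `avg`, and one hit in a game extends the streak.
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    streaks, current = [], 0
    for _ in range(n_games):
        # A game counts if at least one of the at-bats is a hit.
        got_hit = any(rng.random() < avg for _ in range(at_bats))
        if got_hit:
            current += 1
        else:
            if current:
                streaks.append(current)
            current = 0
    if current:  # record a streak still running when the games end
        streaks.append(current)
    return streaks

# Smaller run than the 1,000,000 games in the text, just to keep it quick.
streaks = simulate_streaks(n_games=100_000)
print("longest streak:", max(streaks))
```

With four independent at-bats at .325, the chance of at least one hit in a game is 1 - (1 - .325)^4, or roughly 79 percent, which is why long streaks are rare but not unthinkable.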
So what do the results look like?
There are a huge number of short streaks, as you might expect. The numbers drop off sharply as you approach 20 consecutive games with a hit. The sheer count of short streaks makes it difficult to see what happens at the high end. So I took the subset of my simulated data with streaks of 20 or more consecutive games with a hit.
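Taking that subset is a one-liner once you have a list of streak lengths. A small sketch, with a made-up list standing in for the simulated data and a hypothetical helper name:

```python
from collections import Counter

def tail_counts(streaks, cutoff=20):
    """Count streaks by length, keeping only those of `cutoff` games or more."""
    counts = Counter(s for s in streaks if s >= cutoff)
    return dict(sorted(counts.items()))

# Made-up streak lengths for illustration, not simulation output.
example = [1, 3, 20, 22, 20, 45, 7]
print(tail_counts(example))  # {20: 2, 22: 1, 45: 1}
```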
This view is a bit better, and we see there are certainly some streaks above 40 and even some above 50. In fact, my simulation gives more than 20 streaks of 50 or more, with the five longest equal to 56, 57, 58, 59, and 60. So I have exactly five streaks at or above the longest recorded streak, DiMaggio’s 56 games. Let us put these numbers into some perspective.
To date there are 33 players with a career batting average of .325 or better. Collectively they played a total of 70,251 games. I simulated 1,000,000. That is, I simulated 14.23 times as many games as these hitters have actually played. Said another way, my simulation says one of these streaks occurs on average once every 200,000 games (not literally, but that is the average), which is still almost three times the number of games in the historical record.
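The arithmetic behind those ratios is easy to check. A quick sketch using only the figures quoted above (the variable names are mine):

```python
simulated_games = 1_000_000
historical_games = 70_251   # games played by the 33 hitters at .325 or better
streaks_56_plus = 5         # simulated streaks of 56 games or longer

ratio = simulated_games / historical_games
games_per_streak = simulated_games / streaks_56_plus

print(round(ratio, 2))                                # 14.23
print(games_per_streak)                               # 200000.0
print(round(games_per_streak / historical_games, 2))  # 2.85
```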
70,000 games is a lot, do not get me wrong. However, on the scale we use to separate “impossible” from “highly, highly improbable” in a statistical sense, it is not enough. So the question on everyone’s mind (or at least JT’s) is when this will happen again. No clue. This is not that kind of simulation exercise, and I question whether I would want to know anyway.