A few years ago, I posted a data analysis of the progression to world class for female 5k runners. I’m a data nerd at heart, the kind of kid who grew up calculating baseball stats and looking for patterns before we could easily pull all that data off the web. I’m not a stats guru by any means; I like to keep things simple. What I really like to do is see how stats match up to the real world.
A few weeks ago, I posted an article on how certain governing bodies use athletes’ year-to-year progressions to make funding decisions. Often, they determine whether an athlete gets governmental funding based on whether they are on a certain pathway towards some measured success (e.g. getting a medal or making a final). But it’s not just funding that is impacted. Team selection is too. Countries like the UK will deny a place at the Olympics to an athlete who places 3rd at their national championship and who has the standard. Why? Because they don’t ‘project’ them to be of the caliber to make the final or win a medal. As I hope you’ll soon see, unless that athlete is in their mid-30s, that projection is unfounded.
So what I did was pull out the top ~50 female 800m runners from the US, Canada, and the UK. Every one of them had run 1:59.6 or better. Why these countries? Because I had the data and I knew well enough who the athletes and coaches were. This is important because I wanted to increase the likelihood that the athletes I chose were relatively clean. So I eliminated from this group any athlete who tested positive or had an association with a coach who was even considered “shady”. There are no Regina Jacobs or Mary Slaney types on this list.
Average age of PR: 27.0 (standard deviation: 2.94)
I broke the data down in a couple of different ways. Since I was interested in progression, I normalized everything to percentages of that individual’s PR. So if they ran 1:57.0 at age 28, then 1:57.0 is equal to 100%. This way we could see progressions to their individual best.
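To make the normalization concrete, here is a minimal sketch of how a season’s best could be turned into a percentage of PR. The exact formula isn’t spelled out above, so this assumes percentage = PR time divided by season time (slower marks score below 100%). The function names and the athlete’s season bests are made up for illustration.

```python
def to_seconds(mark: str) -> float:
    """Convert a time such as '1:57.0' into seconds."""
    minutes, seconds = mark.split(":")
    return int(minutes) * 60 + float(seconds)


def percent_of_pr(mark: str, pr: str) -> float:
    """Percentage of lifetime best; 100.0 means the mark equals the PR.

    Assumption: percentage = PR time / season time, so slower marks score lower.
    """
    return 100.0 * to_seconds(pr) / to_seconds(mark)


# Hypothetical athlete who sets a 1:57.0 PR at age 28 (illustrative data only).
season_bests = {26: "1:59.5", 27: "1:58.2", 28: "1:57.0"}
pr = min(season_bests.values(), key=to_seconds)  # fastest mark is the PR

for age, mark in season_bests.items():
    print(age, mark, round(percent_of_pr(mark, pr), 2))
```

Other definitions are possible (for example, measuring how far a mark is above the PR as a fraction of the PR), and they would shift the numbers slightly without changing the shape of the progression.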
Percentage of best at each Age:
Now this first graph represents the AVERAGE across all individuals at each age point. So what we see here is age on the x-axis and the percentage of their best on the y-axis. The data peaks at the ages of 25 and 27, where athletes averaged 98.73% of their PR. A word of note: the data at the extremes (below 18 and above 35) should be ignored for the most part. We don’t have a full set of data there, as it’s either before most athletes run fast enough to have recorded results or at an age when most of the athletes have retired. But in the 18-33 range, we have data points for a large number of the group.
So what do we notice? On average, a nice progression from the teen years until we reach the peak years: between 24 and 31, the group average is better than 98% of PR.
Also of note is the amount of variance. If we looked at standard deviation for each year, we would see:
What we can see is that there’s a larger variation until we get into our mid-20s, and then the variation drops, which is to be expected.
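For anyone who wants to reproduce this kind of summary, here is a rough sketch of computing the average and standard deviation of percent-of-PR at each age. The rows below are made-up numbers for three hypothetical athletes, not the data behind the graphs, and the variable names are my own.

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up (athlete, age, percent-of-PR) rows, for illustration only.
rows = [
    ("A", 20, 95.1), ("A", 24, 98.0), ("A", 27, 100.0),
    ("B", 20, 97.5), ("B", 24, 100.0), ("B", 27, 98.9),
    ("C", 20, 93.0), ("C", 24, 97.2), ("C", 27, 99.4),
]

# Group the percentages by age.
by_age = defaultdict(list)
for _athlete, age, pct in rows:
    by_age[age].append(pct)

# Mean and sample standard deviation at each age.
summary = {
    age: (mean(values), stdev(values))
    for age, values in sorted(by_age.items())
}

for age, (avg, sd) in summary.items():
    print(f"age {age}: mean {avg:.2f}%, sd {sd:.2f}")
```

Note that `stdev` needs at least two data points per age, which is one more reason to be wary of the thinly populated extremes.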
So what we are left with is a nice little progression. But what if we looked at the individual data?
Let’s start with the model that we’d get if we utilized the individual data per year:
What I think we see here is a clear showing of how the data tends to cluster in the mid-to-late 20s, but there is wide variation in when people are at their best. If we delve into the individual progressions, things get a bit messier:
To not overwhelm you, these graphs each contain the progressions of 10 different individuals. What you see here is that our nice smooth trend is gone. Yes, we can see how we might have averaged out to the numbers we got, but on the individual level, there are many roads to peak performance. The individual path to get there can be a sloppy, crazy mess.
Years until Best:
But what if we looked at it another way? Instead of using age as the classification, what if we were simply interested in what the athlete’s journey to their lifetime best looked like?
What this graph shows you is the athlete’s progression in the years before they reach their lifetime best (100%). It doesn’t matter if they hit it at 35 or 25; what the x-axis shows us is the years prior to peak.
What this graph shows us is that, once again, on average we have a fairly neat progression, but if we look at a select few individuals, the picture changes drastically:
What we are left with is a demonstration of a wide range of progressions, especially several years out.
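The re-indexing behind this view is simple enough to sketch. The snippet below shifts each athlete’s seasons so that the PR year is 0, the year before is -1, and so on. The helper name `years_to_best` and both athletes are hypothetical, and percent of PR is again assumed to be PR time divided by season time.

```python
def years_to_best(seasons: dict[int, float]) -> dict[int, float]:
    """Re-index {age: season best in seconds} as {years relative to PR: percent of PR}.

    Offset 0 is the PR year, -1 the year before, and so on.
    """
    pr_age = min(seasons, key=seasons.get)  # age of the fastest season
    pr = seasons[pr_age]
    return {age - pr_age: 100.0 * pr / t for age, t in seasons.items()}


# Two hypothetical athletes who peak at very different ages.
early = {22: 123.0, 23: 121.5, 24: 119.0, 25: 120.2}
late = {29: 126.0, 30: 122.4, 31: 119.6}

for name, seasons in (("early", early), ("late", late)):
    aligned = years_to_best(seasons)
    print(name, {k: round(v, 2) for k, v in sorted(aligned.items())})
```

Once everyone is on the same "years from PR" axis, a 24-year-old peaker and a 31-year-old peaker can be averaged or plotted together.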
Regression to the mean: What about the years after your best?
Another topic I find interesting is what happens after an athlete reaches their best. What I’ve modeled below is the average for the 2 years before and 2 years after an athlete set their lifetime PR. What you see is a nice rise to the top, then a nice regression to the mean afterwards.
What do these numbers represent?
If we say our athlete’s lifetime PR is 2:00.00, then the 2 years before and after would look like this:
2 years before: 2:02.50
1 year before: 2:01.70
PR year: 2:00.00
1 year after: 2:01.35
2 years after: 2:01.95
What you can see is that, on average, people make a relatively sizeable jump to their lifetime best. 2:01.7 down to 2:00.0 is a big improvement at this level, and an average jump that I think most people would be surprised by. Which is why it particularly unnerves me to see governing bodies acting like they can predict breakthroughs.
If we look at the individual data (a sampling above) we can get an idea of how variable this process is. Although most are clustered around the high points, we have some who make dramatic improvements leading to their breakthrough, while others are at 99.9% the year before. On the other hand, some people maintain performance the following year very well, while others begin a fall off the cliff.
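If you want to check the arithmetic on the 2:00.00 example, here is a small sketch that converts those five times back into percentages of PR and shows the change from the previous year in seconds. It assumes percent of PR is PR time divided by season time; the variable names are mine.

```python
def to_seconds(mark: str) -> float:
    """Convert 'M:SS.ss' into seconds."""
    minutes, seconds = mark.split(":")
    return int(minutes) * 60 + float(seconds)


# Times taken from the 2:00.00 example in the text.
marks = [
    ("2 years before", "2:02.50"),
    ("1 year before", "2:01.70"),
    ("PR year", "2:00.00"),
    ("1 year after", "2:01.35"),
    ("2 years after", "2:01.95"),
]

pr = min(to_seconds(m) for _, m in marks)
table = []
previous = None
for label, mark in marks:
    secs = to_seconds(mark)
    pct = 100.0 * pr / secs  # assumed definition of percent of PR
    change = None if previous is None else secs - previous
    table.append((label, pct, change))
    previous = secs

for label, pct, change in table:
    delta = "" if change is None else f"{change:+.2f}s vs. prior year"
    print(f"{label:>15}: {pct:6.2f}%  {delta}")
```

The negative change going into the PR year is the breakthrough jump; the positive changes afterwards are the drift back towards the athlete’s usual level.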
If I were to simply utilize the average data presented here, one would think we could reasonably model performance improvements.
That would be a mistake. If we were looking to use performance modeling, we’d have to use much more sophisticated models, which exist in our sport, but are rarely utilized and come with their own problems.
The point is this: stats are a great tool if used correctly. On a group or population-wide level they can be used to great effect to identify when, on average, things occur. On the individual level, unless we are utilizing models that are sophisticated and give us likelihoods of performance levels being reached (i.e. if you’re at 90% of target at the age of 33, chances are slim…), we have to be extremely careful. Just look at the Nicole Teters of the world, who at the age of 25 still only had a PR of 2:02.56, but ended up running 1:57.9. One of those times barely squeaks you into the Olympic Trials; the other can win medals. Or, perhaps even more out there, a Jen Toomey, who ran her PR of 1:59.6 at the age of 32, after only having a PR of 2:14 five years earlier at the age of 27. Even allowing for her being new to the sport, no one would have predicted that a 27-year-old 2:14 800m runner would run sub-2 shortly after.
Why is that important? Because at the world class level, we are dealing with outliers. People who, by definition, defy norms.
What I take away is that there are many roads to Rome. There are athletes who have a nice steady progression, those who jump all over the place, and those who suddenly take up an event at a later stage and master it quickly. It varies. There is no magical age when you will hit your peak; that seems to vary as well. Obviously, it won’t happen when you are 45, but if I were anywhere from 20 to 35 years old, I’d feel fairly good that any year could be my best.
You never know what year your best will occur.
Finally, I think progression is a misunderstood concept. We tend to put people in boxes based on what they have run. You are a 2:00 runner or you are a 2:05 runner, and as coaches and athletes we start to believe these boxes we are put in. I think this is the beginning of the end of progression. Once you put yourself in a box, or are assigned one, it limits the vision of improvement. Which is, again, why I think it’s so counterproductive and harmful for governing bodies to prevent athletes who have hit standards from going to the Olympics or World Championships.
It puts people in boxes. You just simply don’t know who is going to be your next Maggie Vessey, who had a best of 2:02.01 at the age of 26 and ran 1:57.8 the very next year.
Data sources: all-athletics.com, USATF, and IAAF.