# Classifying Runners - Fun with Numbers

In my book, I had an entire section on individualization and how to classify runners. Most coaches rely on simply splitting runners up by event. Instead, I tried to expound upon a model using a Fast Twitch vs. Slow Twitch fiber continuum for each event, where instead of classifying someone as a 5k runner, for example, we consider him a fast, slow, or specialist 5k runner.

To figure out how to classify someone, I have suggested and always used a combination of factors including PR comparison, lactate levels, stride mechanics, and so forth, which has given good results. But I wanted to see if there was a more quantifiable way, using PR comparisons in particular.

Usually when we compare our PRs, we are looking at how strong they are relative to each other. So we can see whether our 800, 1,500, or 5k is comparatively stronger than the others. We can use calculators that predict our races and see which predictions we are closest to and furthest from, or we can use tables like the IAAF’s to compare the relative “strength” of each PR.

These are all good methods, but again, there’s a bit of guesswork about which event is your strongest and how much stronger or weaker it is. So what I wanted was something quantifiable and objective.

__Speed Preservation__

When we compare how fast a runner’s 400m PR is to their 800m PR, we are really looking at how much speed is preserved as we move up in distance.

Speed Preservation - the percentage of speed retained. So if we run a 1,500 at 60sec per 400m pace and a 3k at 65sec pace, we preserve 92.31% of that speed for the 3k.
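The calculation is simple enough to sketch in a few lines (a minimal sketch; the function name is mine, not from the article):

```python
# Speed preservation: the percentage of speed retained when moving up in
# distance. With paces expressed over the same split (e.g. seconds per
# 400m), this is just the shorter race's pace divided by the longer one's.

def speed_preservation(short_pace: float, long_pace: float) -> float:
    """Percent of speed preserved, given paces over the same split distance."""
    return 100.0 * short_pace / long_pace

# The example from the text: 1,500 at 60 s/400m pace, 3k at 65 s/400m pace.
print(round(speed_preservation(60.0, 65.0), 2))  # -> 92.31
```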

If you are a running-training history buff, you might know Frank Horwill’s 4-second rule: in an ideal world, every time you double the distance, you should be able to run the longer race 4 seconds slower per 400m. The problem is that it’s a rule of thumb. I wondered whether it actually held up, and whether we’d actually see variation in how much speed is preserved for each runner at each distance. If there was variation, then we could flip the script and use it to predict what type of runner each person is.
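Horwill’s rule is easy to state in code, and a quick sketch (function name mine) also shows one reason it can only be a rule of thumb: the preservation percentage it implies is not constant across paces.

```python
# Frank Horwill's 4-second rule: each time the race distance doubles,
# pace slows by roughly 4 seconds per 400m.

def horwill_pace(base_pace_per_400: float, doublings: int, slowdown: float = 4.0) -> float:
    """Projected pace per 400m after the distance doubles `doublings` times."""
    return base_pace_per_400 + slowdown * doublings

# A 60 s/400m pace for 1,500m projects to 64 s/400m at 3k:
print(horwill_pace(60.0, 1))  # -> 64.0

# Implied preservation is 60/64 = 93.75% for this runner, but 70/74 = 94.59%
# for a slower one, so a fixed 4 s cannot describe everyone equally well.
print(round(100 * 60 / 64, 2), round(100 * 70 / 74, 2))
```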

I’m not a math or statistics guy, but what I did was take a database of 1,500 professional, college, and British runners, using data collected through the years on other projects, to see what the average speed preservation numbers were. (A note on why these three groups: between tilastopaja, tfrrs, and powerof10, they have the most complete record of performances.) This gave me an average across the entire group for each speed preservation ratio.

For example, the average 3000/5000% was 96.61% with a standard deviation of 1.08%.

So, for example, if we have an 8:00 3k guy who held onto the average speed, he would be a 13:47 runner. But we all know that not every 8-flat guy can run 13:47. Instead, we have variation around that amount of speed preservation. Which made me ask whether a 5k/10k runner would have better speed preservation when moving up in distance than a 1,500/5k runner. We assume this is true, and experience tells us it is, but can we map it out and quantify it?
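The projection works like this (a sketch using the 96.61% mean and 1.08% SD quoted above; by straight division the mean lands at roughly 13:48, so the 13:47 figure presumably reflects rounding somewhere in the preservation number):

```python
# Project a longer race from a shorter one using average speed preservation,
# with a +/- 1 SD band to show the spread around the group mean.

def project_time(known_time: float, known_dist: float,
                 target_dist: float, preservation_pct: float) -> float:
    """Projected time (s) at target_dist, given % of speed preserved."""
    known_speed = known_dist / known_time                    # m/s
    target_speed = known_speed * preservation_pct / 100.0    # slower at the longer race
    return target_dist / target_speed

def fmt(seconds: float) -> str:
    m, s = divmod(seconds, 60)
    return f"{int(m)}:{s:04.1f}"

t3k = 8 * 60.0  # an 8:00 3k
for p in (96.61 + 1.08, 96.61, 96.61 - 1.08):  # +1 SD, mean, -1 SD
    print(f"{p:.2f}% preserved -> 5k of {fmt(project_time(t3k, 3000, 5000, p))}")
```

For this runner, one standard deviation in 3000/5000 preservation is worth roughly nine seconds over 5k in either direction.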

So what I did was separate out the runners based on their primary event and map out the differences between them to see if we could see a pattern of speed preservation for each event. And it turns out that you can.

So with a few calculations and a bit of messing around, what we’re left with is the average speed preservation at each distance for each group.

As you can see, there are distinct variations in speed preservation for each group, from 800 to ST 5k (the 10k is a projection because of insufficient 10k/marathon specialist data).

The interesting thing is that as we go up in endurance type, we see higher and higher speed preservation numbers, regardless of distance. What changes within the groups is the degree of drop-off from one event to the next. For example, the 800 and ST 800 groups both have pretty big drop-offs when going from the 1,500 to the 3k. We can do some pretty cool stuff with the variance between the groups, but that’s for another time.

__So What?__

The cool thing is we can flip things on their head and use this standardized data for each “group” to classify runners. We can make it quantifiable.

Let me give a quick example.

Let’s say we have a runner whose PR’s are the following:

400 - 50.5

800 - 1:53.0

1,500 - 3:46.0

3,000 - 8:04

5,000 - 13:56

Then we know their speed preservation is:

400/800% - 89.38%

800/1500% - 93.75%

1500/3000% - 93.39%

3000/5000% - 96.49%

1500/5000% - 90.11%
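These percentages can be recomputed directly from the PRs via average speed (a quick sketch; note that the 400/800 ratio comes out to 89.38% from the listed PRs):

```python
# Recompute the example runner's speed-preservation table from his PRs.
# Preservation = (average speed of the longer race) / (average speed of
# the shorter race), expressed as a percentage.

prs = {400: 50.5, 800: 113.0, 1500: 226.0, 3000: 484.0, 5000: 836.0}  # seconds

def preservation(d_short: int, d_long: int) -> float:
    v_short = d_short / prs[d_short]  # m/s
    v_long = d_long / prs[d_long]
    return 100.0 * v_long / v_short

for a, b in [(400, 800), (800, 1500), (1500, 3000), (3000, 5000), (1500, 5000)]:
    print(f"{a}/{b}%: {preservation(a, b):.2f}%")
```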

We can take these percentages and see where they fall along the continuum from pure Fast Twitch 800m to pure Slow Twitch 10k. Essentially, what we’re trying to do is find the best fit for this runner.

In this example our runner gets a classification score of: __4.10__

What the heck does that mean? It’s based on a classification from 1 (pure 800 FT) to 7 (pure 10k ST).

Where that puts this runner is right in the ST 1500/ FT 5k zone.
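The best-fit step can be sketched as a nearest-profile lookup. Note that the group profiles below are hypothetical placeholder values I made up for the sketch (the article’s actual group averages live in the figure, not the text), and a real version would interpolate between groups to produce fractional scores like 4.10:

```python
import math

# Sketch of the classification step: compare a runner's preservation
# vector to each group's average profile and take the nearest match.
# The profiles below are HYPOTHETICAL placeholders, not the article's data.

# (400/800, 800/1500, 1500/3000, 3000/5000) preservation % by group score
GROUP_PROFILES = {
    1: (92.0, 91.0, 92.0, 95.0),  # pure FT 800 (hypothetical)
    4: (90.0, 93.5, 93.5, 96.5),  # ST 1500 / FT 5k (hypothetical)
    7: (87.0, 95.5, 96.0, 97.5),  # pure ST 10k (hypothetical)
}

def classify(runner: tuple) -> int:
    """Return the group score whose profile is nearest (Euclidean distance)."""
    return min(GROUP_PROFILES, key=lambda g: math.dist(runner, GROUP_PROFILES[g]))

runner = (89.38, 93.75, 93.39, 96.49)  # percentages computed from the PRs above
print(classify(runner))  # -> 4 (the ST 1500 / FT 5k zone, under these placeholders)
```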

What that means is that right now, that athlete’s PRs fit the line of a FT 5k runner best. So we know how to train at the moment, and if we are training for a 10k, for example, we might need to shift the work to try to move him slightly more toward the speed preservation of a 5k specialist.

The next step is to add a bit to the database to clear up the data and then to compare levels of performance to see if higher level runners actually have better speed preservation.

Do we see greater preservation of speed as you increase the race distance of the group, or just a decreased ability to tolerate or prevent changes in muscle pH (which could be the result of different training protocols in the different groups), and subsequently a decrease in 400/800/1500m times relative to their 5k/10k times? Interesting.

It would be hard to do with elite athletes, because they generally pick one or two events to specialize in, but I would like to see correlations between best 800m times and 5K/10K/Marathon times.

Speed preservation (or loss) is actually what is measured by the index of endurance (IE, Peronnet and Thibault). One value of IE can summarize the information contained in one of your curves, which may help simplify the analysis (one number rather than a curve). Basically, knowing the IE allows you to classify runners according to how strong their endurance is (the capacity to preserve speed as distance increases). It would be interesting to compute the IE for each runner in your database and analyse if and how it depends on the profile of the runner (I expect a strong dependence given the curves you showed!). It would also be interesting to track the IE of a runner who changes focus from short to long distances (or the reverse), to measure how much a change in training can shift a runner’s endurance, and to what extent. I also wonder to what extent the IE depends on the runner’s level of performance.

Very cool. Makes me think of the CP curve for cycling.

You say

(Where 1= 15/5k % 2=800/1500% 3= 1500/3000% and 4= 3000/5000%)

don't you mean

(Where 1= 400/800% 2=800/1500% 3= 1500/3000% and 4= 3000/5000%)

Where does the group label come from? Do those databases already classify those runners as “800” vs. “800ST”? Or did you categorize the data based on your own definition of “800” vs. “800ST”?

I like the idea as far as knowing the runner is concerned. But I think there is a big factor that needs to be considered: whether the data is good data or corrupted data. I would say it’s extremely corrupted, because runners specialize. They train for a specific distance and then they race that specific distance. Maybe an 800/1500 guy jumps in an early-season 5k after he finishes his base phase and runs 13:50; the next week he runs his first 1500 in 3:42, but then he gets used to racing, gets in the correct workouts, and by the end of the summer he runs a PB of 3:35. So now he has PBs of 3:35 and 13:50. Well, if he had focused on the 5k he might have run 13:20-30, and if he had had a more 5k-based offseason build-up he might have run 13:10. I think the idea of this analysis is cool, but I think it’s impossible to have accurate data for performing a correct analysis. But please feel free to debunk my opinion.