Database and Methods

What makes me so smart?  Nothing.  I’m not particularly bright.  I didn’t play football at any level.  I’m not even great with computers.  But I have data, and that might just be enough.  I have logged more than one thousand injuries over the last few seasons, and I think there’s huge value in that.  Typically, the media likes to use little “rules of thumb,” or heuristics, to neatly label injuries.  If somebody suffers a high-ankle sprain, a sports writer will usually just hang an “out four to six weeks” label on him.  I understand the need for these little shortcuts, and I’m not criticizing everyone who uses them.  As readers, we often want something quick and easy.  Also, those estimates can be a good place to start.  But they’re not really that accurate.  For example, I can look into my database and tell you exactly how many players I have logged with high-ankle sprains.  I can then tell you how many of them missed zero games, one game, two games, and so on.  I can tell you that the most common return date for them was after missing exactly three weeks.  I can also tell you how many of them went on to suffer a re-injury, and how many of them eventually ended up on injured reserve to finish their season.  True, that data reflects only what has happened in the past, and there is no guarantee that a future player will follow those same timelines, but I still think that information is far more useful and complete than the “out four to six weeks” tags that we’re currently seeing.  To make matters even worse, different reporters will often give different return estimates for the same injury.  One reporter might go with the conventional “four to six weeks,” while another will just use a vague “multi-week absence” or “week to week” designation.  Also, if anyone can explain to me where the media came up with these rules of thumb in the first place, I’d be grateful.  At least I can show you my math to back up my estimates.
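For readers who want to see the shape of that kind of lookup, here is a minimal Python sketch.  It is only an illustration.  The record layout, the function name summarize, and every row in SAMPLE are made up for this example and are not taken from the actual database.

```python
from collections import Counter

# Hypothetical records for one injury type, invented for illustration only:
# (player label, games missed, re-injured later?, finished season on IR?)
SAMPLE = [
    ("A", 0, False, False),
    ("B", 3, False, False),
    ("C", 3, True, False),
    ("D", 2, False, False),
    ("E", 6, False, True),
    ("F", 3, False, False),
]


def summarize(records):
    """Tally games missed, re-injuries, and IR finishes for a set of records."""
    # Count how many players missed 0 games, 1 game, 2 games, and so on.
    distribution = Counter(games for _, games, _, _ in records)
    # The most frequent number of games missed (the mode of the distribution).
    most_common_games, _ = distribution.most_common(1)[0]
    return {
        "total": len(records),
        "distribution": dict(sorted(distribution.items())),
        "most_common_games_missed": most_common_games,
        "reinjured": sum(1 for r in records if r[2]),
        "ended_on_ir": sum(1 for r in records if r[3]),
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

The point of the sketch is that a full distribution, plus re-injury and injured reserve counts, carries more information than a single “four to six weeks” range.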

Let’s talk about my data.  If I don’t address this myself, I’m sure my detractors will, so we might as well get right to it.  My data is flawed.  I can admit that.  I have spent tons of time and effort trying to make it as accurate as possible, and I will spend even more time trying to fine-tune it in the future.  But this is a noisy world, to be sure.  It is often, perhaps usually, difficult to get entirely accurate injury information.  Early reports are often conflicting, even in regard to simple questions such as, “Which arm is broken?”  Players, coaches, and teams are usually hesitant to go into detail about injuries, even long after the recovery process is complete.  There is a perceived value in keeping these injury details as murky as possible, for reasons that I will discuss later.  So what’s a girl to do?  My answer has been to pull up my pant legs and try to just wade right through the muck.  I’ve done my best to record and present this all as honestly and as accurately as possible, and I think that’s about all I can promise.  Yes, there will be some errors in my data.  But until the NFL opens up and becomes more forthcoming with injury data, there’s no better solution.  Don’t hold your breath for that day.

Also, my data outlines only what has happened in the past, not what will happen in the future.  This is a typical flash point between statheads and traditional sports fans.  Future players will not necessarily follow the same recovery timelines as past players.  In fact, with medical advances, future players will likely beat some of the established timelines.  But, as those future injury rehab timelines unfold, they will be entered into my database so as to make it more accurate.  In defense of my statistical approach, I would argue that at least my database is based on actual NFL injury rehabs, whereas the typical media recovery timetables don’t seem to be based upon anything at all.

There is a whole world of further flaws to this statistical injury approach.  Every injury is unique.  Similarly classified injuries are often different.  For example, Tony Romo’s broken collarbone was different from Aaron Rodgers’s broken collarbone, even if, on paper, they seem like the same injury.  Furthermore, no two players are alike in their healing properties or how they respond to treatments.  Fair enough.  Add to that the fact that football is played with so many different players on the field, coaches on the sidelines, game plans, and game situations, and you have a lot of randomness.  For example, two running backs on two different teams might come back from a similar injury very differently depending upon how they are used and the personnel around them.  Also, due to how I have collected the data, the emphasis tends to fall on players who have actually missed games due to their injuries.  Players who crowd the weekly injury report but continue to start and play through nagging injuries are often not recorded in my database.  Those are all fair points, and I will do my best to admit to those flaws or biases when I see them (or have them pointed out to me).

But, even with all that noise, I feel that the signal is valuable and well worth the effort.  My model is not perfect.  But it is as perfect as I can think to make it, and it is far more informed than anything else being used right now.  Hopefully my readers will feel the same.