Qualifying Players in Long Term NHL Adjusted Plus Minus Regression


Hey everybody. I’ve been trying to figure this out for a couple weeks and I’ve had no luck…

I’m running a 10-year adjusted plus minus regression for NHL players similar to the one described at the end of this article (https://ownthepuck.wordpress.com/2017/01/21/hero-charts-frequently-asked-questions/) and in this paper (http://www.sloansportsconference.com/wp-content/uploads/2011/08/An-Improved-Adjusted-Plus-Minus-Statistic-for-NHL-Players.pdf). It’s similar to basketball APM, except offense and defense are separated in the regression: each shift is entered twice, once per attacking team, rather than once with a 1 for the offensive players and a -1 for the defensive players… if that makes sense.
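To make the duplicated-shift setup concrete, here is a minimal sketch of how such a design matrix could be built. The shift format, the field names (`home`, `away`, `seconds`, `home_rate`, `away_rate`) and the function name are made up for illustration, and dropping non-qualifying players from the row is only one possible way to handle them; this is not taken from either linked article.

```python
import numpy as np

def build_design_matrix(shifts, players):
    """Two rows per shift (one per attacking team).

    Columns 0..n-1 are offensive indicators, columns n..2n-1 are
    defensive indicators, so every player gets two coefficients.
    Players missing from `players` (non-qualifiers) are skipped.
    """
    idx = {p: i for i, p in enumerate(players)}
    n = len(players)
    X = np.zeros((2 * len(shifts), 2 * n))
    y = np.zeros(2 * len(shifts))  # attacking team's event rate
    w = np.zeros(2 * len(shifts))  # shift length, usable as a weight
    for k, s in enumerate(shifts):
        for j, (att, dfn) in enumerate([("home", "away"), ("away", "home")]):
            row = 2 * k + j
            for p in s[att]:
                if p in idx:
                    X[row, idx[p]] = 1.0       # offense column
            for p in s[dfn]:
                if p in idx:
                    X[row, n + idx[p]] = 1.0   # defense column
            y[row] = s[att + "_rate"]
            w[row] = s["seconds"]
    return X, y, w

# Toy example: one 40-second shift, two skaters per side.
players = ["A", "B", "C", "D"]
shifts = [{"home": ["A", "B"], "away": ["C", "D"], "seconds": 40,
           "home_rate": 60.0, "away_rate": 30.0}]
X, y, w = build_design_matrix(shifts, players)
```

With this layout the first row has A and B in the offense columns and C and D in the defense columns, and the second row is the mirror image.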

Does anyone have any suggestions for how to determine the time on ice (total minutes played) cutoff for including players in the regression? There needs to be a cutoff in a long-term regression like this, but I have absolutely no idea how to choose it.

To start, I used a qualifying threshold meant to approximate the top 13 forwards and 7 defensemen on each team per season (4.1 minutes per team GP for forwards, 5 for defensemen)… multiplied by 10 seasons. This leaves 393 forwards and 218 defensemen in the 10-year regression (cutoffs of ~3200 / ~3900 total 5v5 minutes played for F/D). This seems a little too exclusive… the regression results have a mean that is higher than 0 and the distribution skews higher than you’d expect (my thought is that the high minutes threshold excludes too many bad players).
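For reference, the cutoff arithmetic can be sketched as below. The season lengths are an assumption (nine 82-game seasons plus one 48-game lockout season), picked only because they reproduce the ~3200 / ~3900 figures; swap in the actual team GP counts. The names are hypothetical.

```python
# Assumed schedule: nine full seasons plus one 48-game season.
GAMES_PER_SEASON = [82] * 9 + [48]

# 5v5 minutes per team game played needed to qualify, by position.
MIN_PER_TEAM_GP = {"F": 4.1, "D": 5.0}

def toi_cutoff(position, games=GAMES_PER_SEASON):
    """Total 5v5 minutes required over the whole sample."""
    return MIN_PER_TEAM_GP[position] * sum(games)

cutoffs = {pos: toi_cutoff(pos) for pos in MIN_PER_TEAM_GP}
for pos, minutes in cutoffs.items():
    print(pos, round(minutes, 1))
```

Lowering the per-game rates (or applying a fraction of them) and re-running the regression would show how the mean and skew of the results move as more low-minute players are included.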

The only reference I’ve found is the 14-year RAPM sample Daniel Myers used for Box Plus Minus, which included 960 players, but how he arrived at that number is not well explained (it appears to be based on removing the prior from basketball RAPM).

Anyone have any suggestions?