Hello. Welcome to my test kitchen.
Here at #fancystats, our goal is to serve you only the freshest and most piping-est hot statistical goodness we can find. We source all of our ingredients from the finest advanced stats purveyors in the game, combine them with our own home-grown statistical analysis, and bring it all to your table ready to enjoy and digest.
Yes folks, there will be plenty of stats and plenty of food puns to be had this season. After this wild "summer of analytics," rather suddenly, analytics are everywhere.
A veritable bounty of numbers is now out in the public eye, ready for you to review, consider, argue about, denounce, and ultimately hate me for. With the growth in popularity of advanced statistics, there is no denying the fact that some fans want to either add to their knowledge base about what they are seeing, or simply give context and statistical backing to the conclusions they make while watching the team play.
As an aside, I don't particularly like calling this stuff "advanced stats," nor do I care for the #fancystats moniker. It creates this air of inaccessibility which, if you actually get into this stuff, shouldn't be there. A lot of the underlying numbers really aren't all that advanced or fancy.
I prefer "underlying numbers" or "metrics," but it's tough to change the lexicon, or resulting attitudes, overnight. Basically, it's about having the puck, and trying to score with it. Pretty simple.
Why Should I Care?!
You might care because possession stats correlate closely to winning games, and Stanley Cups.
You might care because possession stats have been shown to have more predictive value of a team's future performance than wins.
You might care because possession stats predicted the crash-and-burn of the 2013-14 Toronto Maple Leafs well before it happened. And then the Leafs hired a stat-savvy assistant GM and an analytics department as a result.
Maybe you just care because some of this stuff, like Corsi, gives you some water cooler fodder. If used right, it can help give some idea of overall performance. It's one more tool to help explain the game you love.
Some will give absolutely zero cares about this stuff. Which is totally fine. Ultimately, nobody should tell you how to enjoy your fandom. But whatever your interest or disinterest in the numbers, they are certainly relevant to what you are watching. And they are becoming increasingly unavoidable for even the most casual or numbers-averse fan. Even if some of the best statistical thinkers have been snapped up by NHL teams with an eye on closing off public data and idea sharing.
It isn't expected that everyone will develop a taste for this stuff. However, if you do find yourself face-to-face with a Corsi-wielding hockey hipster, I hope that this introductory guide, and the tracking that others and I will be taking on all season, will help you to at least dabble when needed or desired. Hey, maybe you will start really looking at this stuff, weighing the numbers against each other, and fending off those who pretend to know what's going on. Maybe you're the next great NHL stats guy, and you didn't even know it!
Ok, ok. Let's not get carried away.
This article is not meant to be an in-depth discussion of underlying numbers. Instead, it's just a brief road map and glossary to get you acquainted with some of the most prominent terms and ideas you will hear about throughout the season. The basics you will need to skate by. I'll also provide a FAQ which will be continuously updated.
Throughout the season, we will go in-depth with a number of different features, which will all be archived at http://www.nucksmisconduct.com/advanced-stats in addition to their normal feature position on the NM homepage.
Canucks By The Numbers:
These articles will feature game by game stat-packs along with analysis utilizing conventional and "advanced" metrics. The goal being to provide readers with a greater understanding of the who, what, where and why of the Canucks' performance, their opponent and the game. Canucks By The Numbers will feature some of the things below and vary in form a bit, game to game.
Possession and Shooting Stats: I will be tracking the full battery of possession metrics...all the hits. Corsi, Fenwick, With or Without Yous (WOWY), shot charts and more. A glossary of common stats appears below. These will be housed within "Canucks By The Numbers" on a game-by-game basis, and from time to time I will also post an update that shows season totals to date.
Zone Tracking: I will be manually tracking zone entries and exits for the Canucks and their opponents. This will be utilized to get a better understanding of individual and team performance and tendencies based on their movement and decision-making in each zone. For a more in-depth understanding of zone tracking, check out Corey Sznajder's (@shutdownline on the twitter) work from last season here.
Loose Puck Battle Project: To date, one area of the game that has been woefully under-analyzed has been determining the value of winning loose puck battles (and how they are won or what is done with the puck after winning them). You hear coaches talk about the 50/50 battles and loose-puck battles all the time. It is a standard part of the discussion between players and coaches on a daily basis. It's also what many in the game, eye-test proponents and more traditional scouts or analysts often refer to as "compete level." This season, I will be tracking loose-puck battles (primarily 50/50s between opposing players). Just like zone tracking, this is manual labor. It is arduous and time-consuming. Please visit my post on this project for more details (to be linked soon).
Setup passes are defined as those passes which result in shots on goal, and can help determine which players are most adept at setting up shots in a way that's independent of shooting percentages. Unfortunately this statistic is not recorded by the NHL (or anyone else). It is therefore available only as an estimate based on the player's (primary) assists and the average shooting percentage of his linemates. To some people that's a tremendous turn-off, even among those who think the concept is excellent. If you are of that mind, think of it only as a handy re-presentation of primary assists and on-ice shooting percentage.
I will be manually tracking actual setup passes for the Canucks and their opponents this year. In this way, I can turn the representation into a reality and we can see just how great certain passers are.
Goaltender Tracking: I will provide game-by-game info on performance, and rolling tracking of performance, utilizing traditional statistical measures along with some of the new measures like EVGSAA/60, which I previously introduced here.
Goal-Scoring Play Tracking: I will be manually tracking every goal-scoring play from first controlled touch to goal for the Canucks, and possibly their opponents (time permitting). The purpose being to get a better understanding of how individual and team offensive performance and systems work to create goals. As the season progresses, we may see patterns. Maybe we won't. But I'll post many of the more interesting plays and provide takeaways for you all to ponder. I would be remiss not to point out that the genesis for this project was inspired entirely by @kid_ish, and his better half @_wordgirl. Kid Ish is a wonderful hockey mind covering the Ducks for Anaheim Calling. @_wordgirl does design that I only hope to emulate. Gonna have to up my MS Paint skillz!
Their project and more on why goal-tracking could be significant can be found here. My charts only hope to be as nice as this one - from @kid_ish w/ design by @_wordgirl:
Terms You Need To Know (#fancystats CliffsNotes):
Here's a short and entertaining crash course on advanced stats from Russian Machine Never Breaks. A glossary of terms after the jump:
Corsi measures all shot attempts (on net, missed, blocked) and can be applied to either a team or an individual player on the ice. An individual's measure of Corsi takes into account the shot attempts of all 10 players on the ice. While it does not measure possession in and of itself, it is considered a proxy because a team must have the puck to shoot the puck (or, conversely, allow the other team to have it in their D zone in order to give up shots).
Corsi = Shot Attempts FOR - Shot Attempts AGAINST
In a single game it is often represented as a +/- (D. Sedin was a +3 Corsi). Over a season it is often displayed as a percentage (CF% or Corsi%). If you see a number above 50%, it means that while the player is on the ice, his team is generating more shot attempts than it allows. You should care because there is a correlation between players who have higher Corsi and their ability to outscore the competition when on the ice.
It is important to consider the context that shapes Corsi. That is why it is valuable to calculate Corsi by situation, such as Even Strength (EV Corsi% or 5v5 Corsi%). 75% of all shots come at even strength, and isolating it, at the very least, removes special teams, which tend to skew shot attempt numbers for obvious reasons. Click here for a solid read on the uses and limitations of Corsi from our friends over at Arctic Ice Hockey.
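For those who like to see the arithmetic, here is a minimal Python sketch of the Corsi formula above and its percentage form. The function names and the on-ice counts are made up for illustration; they don't come from any particular stats site.

```python
def corsi_differential(attempts_for: int, attempts_against: int) -> int:
    """Raw Corsi: shot attempts for minus shot attempts against."""
    return attempts_for - attempts_against


def corsi_pct(attempts_for: int, attempts_against: int) -> float:
    """CF%: the team's share of all shot attempts while the player is on the ice."""
    total = attempts_for + attempts_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * attempts_for / total


# Hypothetical single-game on-ice counts (on net + missed + blocked).
cf, ca = 18, 15
print(corsi_differential(cf, ca))   # 3
print(round(corsi_pct(cf, ca), 1))  # 54.5
```

A +3 night and a 54.5% night are the same game viewed two ways. The percentage is handier over a season because it doesn't grow with ice time.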
Fenwick is the same thing as Corsi, but without blocked shots. The logic is that a blocked shot is not really a scoring chance, as the shot never has a chance of getting to the net. It is also possible that blocking shots is a skill, and not just a random event. Fenwick is commonly displayed as a percentage (FF% or Fenwick%), and a similar assumption can be made for a player falling above or below 50%.
Fenwick = (SOG FOR + Missed SH FOR) - (SOG AGAINST + Missed SH AGAINST)
Because Fenwick reduces the number of total events in the calculation, it isn't quite as useful in smaller sample sizes such as individual games or partial seasons. In other words, we want the most data possible, and Corsi gives us more data. For more on Fenwick vs Corsi, this is a good starting point.
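To make the sample-size point concrete, here is a small Python sketch that computes both numbers from the same game. The `ShotCounts` class and all of the counts are hypothetical, invented just for this example.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    on_goal: int
    missed: int
    blocked: int  # this team's attempts that the opponent blocked

    def corsi_events(self) -> int:
        # Corsi counts every attempt: on net, missed, and blocked.
        return self.on_goal + self.missed + self.blocked

    def fenwick_events(self) -> int:
        # Fenwick drops the blocked attempts.
        return self.on_goal + self.missed


# Hypothetical single-game totals.
team = ShotCounts(on_goal=30, missed=12, blocked=14)
opp = ShotCounts(on_goal=27, missed=10, blocked=9)

fenwick = team.fenwick_events() - opp.fenwick_events()
corsi = team.corsi_events() - opp.corsi_events()
print(fenwick, corsi)  # 5 10

# Total events behind each number: Fenwick has fewer to work with.
print(team.fenwick_events() + opp.fenwick_events())  # 79
print(team.corsi_events() + opp.corsi_events())      # 102
```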
Corsi Rel (Relative Corsi) measures the difference in even strength Corsi between a player's on-ice performance and his team's performance when he's on the bench. If the player is +5.5 Corsi Rel, it generally means that he is driving possession 5.5% better than his team can without him.
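Here is one way the relative measure described above could be computed, as a rough Python sketch. It assumes the percentage-based version (on-ice Corsi% minus off-ice Corsi%); some sites use a per-60-minute shot attempt differential instead. The counts are invented.

```python
def corsi_pct(attempts_for: int, attempts_against: int) -> float:
    """Share of shot attempts belonging to the player's team, in percent."""
    return 100.0 * attempts_for / (attempts_for + attempts_against)


def corsi_rel(on_for: int, on_against: int, off_for: int, off_against: int) -> float:
    """On-ice Corsi% minus the team's Corsi% while the player is on the bench."""
    return corsi_pct(on_for, on_against) - corsi_pct(off_for, off_against)


# Hypothetical even strength totals for one player.
rel = corsi_rel(on_for=55, on_against=45, off_for=99, off_against=101)
print(round(rel, 1))  # 5.5
```

With these numbers the team takes 55% of the attempts with him on the ice and 49.5% without him, which gives the +5.5.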
"Luck Stats" - On-Ice Sh% For, On-Ice Save %, and PDO:
On-ice shooting percentage shows what percentage of shots are going into the net when a player is on the ice. Not just his shots, but everyone's on his team. On-ice save percentage shows what percentage of the opposition's shots aren't going in your net. PDO is simply those two things added together (note: PDO stands for nothing at all; like Corsi and Fenwick, it's just named after someone).
The reason these are called "luck" stats is because the elements of each aren't totally in the individual's control or necessarily affected by the player's talent or skill. And because sometimes the puck will bounce for you, or against you.
PDO = On-Ice Sh% + On-Ice Sv%
The stat is often represented as a percentage or a 4 digit number (100% or 1000). It is best measured at 5v5 to reduce variables.
The league averages a tick under 8% even strength sh%. And therefore it averages a tick over 92% even strength sv%. This means that if a player has a PDO above 1000 (or 100%), he might be getting a bit "lucky." Or if below, a bit "unlucky." However, it is important to note that it's really all about whether the PDO is substantially lower/higher than the individual player is accustomed to being over time. Some players are always above 1000. So if in one season such a player is suddenly at 980, we can assume that something has gone awry while he has been on the ice. It could be the goaltending behind him. It could be the sh% of himself and his teammates. For a great article on percentages and shot quality, click here.
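The PDO math is simple enough to sketch in a few lines of Python. This version uses the 4 digit scale, and every goal and shot count below is hypothetical.

```python
def pdo(goals_for: int, shots_for: int, goals_against: int, shots_against: int) -> int:
    """PDO on the 4 digit scale: on-ice Sh% plus on-ice Sv%, times 1000."""
    if shots_for == 0 or shots_against == 0:
        raise ValueError("need at least one shot each way")
    sh_pct = goals_for / shots_for              # on-ice shooting percentage
    sv_pct = 1 - goals_against / shots_against  # on-ice save percentage
    return round(1000 * (sh_pct + sv_pct))


# Hypothetical 5v5 on-ice totals (shots here means shots on goal).
print(pdo(8, 100, 8, 100))   # 1000, shooting and saves both at 8% / 92%
print(pdo(11, 100, 6, 100))  # 1050, the bounces are going his way
print(pdo(5, 100, 9, 100))   # 960, the bounces are not
```

Every goal is one team's made shot and the other team's missed save, so the league-wide total always lands at 1000. That is why big departures from a player's usual number tend to get treated with suspicion.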
Basically, score effect means that a team will play differently depending on how far ahead/behind they are in a game. Sometimes (often), a team that is up by several goals will sit back and play with the puck less (whether rightly or wrongly is a different discussion).
For this reason, Corsi is often broken down to Corsi Tied or Corsi Even (when game is tied), Corsi Ahead, Corsi Behind, and Corsi Close (game is +/- 1 goal or tied during the first two periods, and tied in the third). This helps eliminate, or illuminate, score effect biases in the numbers. It is especially worthwhile early in the season, when the sample size is still small and more affected by such a bias.
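The score buckets above can be written as a couple of small Python functions. This is only a sketch that follows the definition of "close" given here. The event list is invented, and real play-by-play data would need more care (overtime, empty nets and so on).

```python
def score_state(goal_diff: int) -> str:
    """Bucket a moment by score, from the point of view of the team being measured."""
    if goal_diff > 0:
        return "ahead"
    if goal_diff < 0:
        return "behind"
    return "tied"


def is_close(goal_diff: int, period: int) -> bool:
    """Close: within one goal in the first two periods, tied after that."""
    if period <= 2:
        return abs(goal_diff) <= 1
    return goal_diff == 0


# Hypothetical shot attempts: (period, goal_diff at the time, attempt was ours?)
events = [
    (1, 0, True), (1, 1, True), (2, 2, False),
    (3, 1, False), (3, 0, True), (2, -1, False),
]

close = [ours for period, diff, ours in events if is_close(diff, period)]
close_for = sum(close)
close_against = len(close) - close_for
print(close_for, close_against)                # 3 1
print(100.0 * close_for / len(close))          # 75.0
print(score_state(2), score_state(0))          # ahead tied
```

The two attempts taken while up by two in the second and up by one in the third get dropped, which is the whole idea: it filters out the stretch where a team may be sitting on a lead.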
This is another context stat. Offensive Zone Starts (OZS% or sometimes simply ZS%) shows what percentage of the player's on ice deployment to start a shift is in the offensive zone. Defensive Zone Starts (DZS%) shows what percentage of shifts start in the defensive zone. Basically, if a player gets less offensive zone deployment, we should expect that it will negatively impact his offensive production.
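As a quick illustration, zone start percentages could be computed like this in Python. The sketch assumes the common convention of ignoring neutral zone starts, so that the two percentages add up to 100. The counts are made up.

```python
def zone_start_pcts(oz_starts: int, dz_starts: int) -> tuple[float, float]:
    """Return (OZS%, DZS%), assuming neutral zone starts are left out."""
    total = oz_starts + dz_starts
    if total == 0:
        raise ValueError("no offensive or defensive zone starts recorded")
    ozs = 100.0 * oz_starts / total
    return ozs, 100.0 - ozs


# Hypothetical player who gets buried in his own end.
ozs, dzs = zone_start_pcts(oz_starts=30, dz_starts=50)
print(ozs, dzs)  # 37.5 62.5
```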
Quality of Teammates:
Again...context. QualTeam measures the quality of teammates the player is deployed with on-ice, based on their on/off ice +/-. If a player is deployed with weaker possession players, his performance will likely suffer (or conversely, he may raise the performance of those around him). Whereas, if he is consistently playing with strong teammates, his own Corsi may be stronger than it would be without those teammates.
Quality of Competition:
Same thing, but QualComp measures the quality of the players the individual faces. If he is consistently deployed against stronger possession-driving players, it makes sense that his own possession stats may suffer.
With or Without Yous - WOWYs:
These are my favorite comparative stats, using the stuff above. David Johnson's (@hockeyanalysis) site stats.hockeyanalysis.com is an outstanding resource for comparing how players perform with certain teammates on-ice, vs. without them.
GF/60 and GA/60:
Goals for/against while a player is on the ice, per 60 minutes of ice time. Even strength and non-empty net situations only.
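Per-60 rates are just a way of putting players with different ice time on the same footing. A minimal Python sketch, with hypothetical season totals:

```python
def per_60(goals: int, toi_minutes: float) -> float:
    """Scale an on-ice goal count to a rate per 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("ice time must be positive")
    return goals * 60.0 / toi_minutes


# Hypothetical even strength, non-empty net totals for one player.
toi = 800.0  # minutes on the ice
print(per_60(40, toi))  # GF/60 -> 3.0
print(per_60(30, toi))  # GA/60 -> 2.25
```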
Context Is Everything
Most of the stats we calculate are based on information that is tracked and provided to us by the NHL's official scoring reports. Sean McIndoe astutely points out one of the major issues with this in his excellent Grantland piece on the "Analytics Awakening" -
The NHL uses multiple people at every game to input data in real time, and there’s a degree of between-periods quality control. But hockey turns out to be an enormously difficult game to track. Unlike baseball or football with their frequent breaks, hockey can go long stretches without a pause in the action. In the time it takes a tracker to look down and press a button on an iPad, something else could happen and get missed. And a lot of what’s being tracked is subjective, leading to significant rink bias that skews the results even further.
Though the data is input in "real-time," it is not actually real-time in the sense that it is manually tracked. In a really fast game. So stuff gets missed. I'm looking at you, hits, giveaways, and takeaways. Also, much of what is tracked is what I consider to be "result-based" stats. For example: Sedin takes a shot, shot is recorded. I want to know more about how he got into the zone, who set up the shot, etc. We simply don't have the resources, outside of manually tracking every game.
The fact is, we are at the tip of the proverbial iceberg in statistically analyzing the sport. While sports like soccer and basketball benefit from technologically advanced tracking and motion-capture systems, such as SportVU, hockey is still in many respects operating in the dark ages.
True analysis still requires that the analyst watches the game and does some form of analysis beyond the numbers. In order to further make sense of them. Or qualify/disqualify them. This is the only way to develop greater context. But it is important to know that while the numbers are objective, they can sometimes be given to misinterpretation, error, or confirmation bias. There is subjectivity. And there will be subjectivity in my own analysis of what the numbers convey.
But the point is to have every resource at your disposal, if you want it. So that is what we will be trying to provide this season.
Advanced Stats Resources (in no particular order):
nhl.com - the old stand-by, which has a growing range of player and team stats
war-on-ice.com - Wonderful and growing set of statistics and resources, including live game tracking and the Hextally shot tracking system
stats.hockeyanalysis.com - the preeminent advanced stats site on the web. Includes WOWYs and several proprietary stats
puckalytics.com - new project by David Johnson of hockeyanalysis.com
hockeystats.ca - live game tracking
naturalstattrick.com - game tracking
progressivehockey.com - new stats site, featuring several goaltending stats
behindthenet.ca - one of the original stat sites, and still a valuable resource
hockey-reference.com - a growing database of traditional and advanced stats
nicetimeonice.com - TOI tracker and other resources
somekindofninja.com - player usage tracker and shot tracker
sportingcharts.com - a variety of charts and graphs, and shot heat maps
This Sounds Like a Massive Undertaking. Why Bother With Any Of This?
Ha. Yea. Weeeeee! Every manually tracked game will likely take me several additional hours of rewinding and fast-forwarding to get through. Sometimes, I'll do it in close to real-time. Those instances, you may see a game stat-pack up within hours of the game ending. Other times, I'll have things to do or simply want to, you know, enjoy the game. If so, it may take me a day or two or three to put together. Or I may wait until the end of the week and put several games into one package. It will be a fluid process.
The reason I am doing all this is because (1) It is interesting to me, and (2) what we currently have at our disposal is woefully incomplete. I think it's worthwhile. I think it's informative. Given my background playing, coaching and thinking about the game...I am just really into it all.
This is a hobby of mine. I enjoy it. I know that some of the voices out there have a keen eye on getting scooped up by a mainstream media outlet or an NHL team. I think that if people are doing this stuff for the wrong reasons, they are just as likely to make colossal errors in judgment as they are to uncover the next big thing.
The true purpose of tracking, calculating, analyzing and tinkering is to, hopefully, learn more about what is going on. If this summer is any indication, teams agree that there is some utility in it and are using it as another resource to help inform decision-making. If we can get a leg up on understanding something, why not inform ourselves? Hell, we might end up knowing as much or more than what the organization does. And that's a pretty neat thing. At least I think so.
Want more information on a particular topic of interest? See something you still don't understand? Need another stat defined which doesn't appear here? Contact Nick Mercadante at @nmercad on twitter or by email at nick.mercadante [at] gmail [dot] com.