The shocking upset of the first week of the English Premier League season came in the very first match. Manchester United, who I had suggested were headed for a bounce-back season, lost 2-1 at home to Swansea City. Manager Louis van Gaal has responded to his side’s loss with characteristic bluster, saying the players’ confidence was smashed and he would need to build them back up. There were undoubted flaws in United’s performance, as an over-reliance on long passing allowed Swansea to anticipate and cut off many attacking moves. But by chances created, Manchester United clearly had the better of the match.
This map shows the location of every shot in the match, with the size of the marker relative to the “expected goals” value of the shot. The two best chances in the match, the two largest squares, represent the goals by Wayne Rooney and Gylfi Sigurdsson, both of whom took their shots from the close and central area of the 18-yard box. The danger zone.
But what exactly is the danger zone? A closer shot is better, and a more central shot is better, but how much better? I am rebuilding my expected goals model, and I want to offer a fully open expected goals model that anyone can use. The first step is figuring out how to quantify shot location.
After much work, I have settled on an exponential decay model. Exponential decay is common in natural processes. Radioactive material, for instance, decays at a rate proportional to the amount of material that remains. It rapidly becomes much less radioactive, but it does not fully decay for a very, very long time. The same is true of shot quality: shots very close to goal have a high probability of going in, shots a little further away have a much smaller chance, and even shots a long way from goal have some small chance of scoring. The chance that a shot will be scored decreases quickly at first as you move away from goal, but then much more slowly when you are further out. The degree to which a shot from 40 yards is worse than a shot from 20 yards is much less than the degree to which a shot from six yards is worse than a shot from three yards.
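To make the shape of the curve concrete, here is a minimal sketch of an exponential decay function for shot quality. The starting value `p0` and the `decay_rate` are made-up numbers chosen only for illustration, not fitted parameters, and the function name is hypothetical.

```python
import math

def scoring_probability(distance_yards, p0=0.5, decay_rate=0.15):
    """Exponential decay: p(d) = p0 * exp(-decay_rate * d).

    p0 and decay_rate are illustrative assumptions, not fitted values.
    """
    return p0 * math.exp(-decay_rate * distance_yards)

# How much scoring chance is lost by moving back from 3 to 6 yards,
# compared with moving back from 20 to 40 yards.
drop_near = scoring_probability(3) - scoring_probability(6)
drop_far = scoring_probability(20) - scoring_probability(40)

print(f"3 -> 6 yards:   {drop_near:.3f}")
print(f"20 -> 40 yards: {drop_far:.3f}")
```

With any positive decay rate, the loss over the short move near goal is larger than the loss over the long move far from goal, and the probability never quite reaches zero.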
To estimate exact shot quality, I took a sample of all shots from English Premier League games in the last two seasons, and I kept only those that were neither headed nor assisted by a cross or a throughball. These other factors will be included in my expected goals model, as I will explain further in time. But for now, this gives a reasonably homogeneous sample of about 13,000 shots. I broke these shots down into buckets based on their distance and angle from goal, and fit the results to an exponential decay curve. These were the results.
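The bucket-and-fit step can be sketched roughly as follows. This is an illustration under stated assumptions, not the fitting procedure used for the chart: shots are assumed to arrive as hypothetical `(distance, scored)` pairs, the bucket width is arbitrary, and the curve is fit by ordinary least squares on the logarithm of the conversion rate, which is a simple stand-in for a proper nonlinear fit.

```python
import math
from collections import defaultdict

def bucket_conversion_rates(shots, bucket_width=2.0):
    """Group (distance, scored) pairs into distance buckets.

    Returns a list of (bucket_midpoint, conversion_rate, shot_count).
    """
    totals = defaultdict(lambda: [0, 0])  # bucket index -> [goals, shots]
    for distance, scored in shots:
        idx = int(distance // bucket_width)
        totals[idx][0] += int(scored)
        totals[idx][1] += 1
    return [((idx + 0.5) * bucket_width, goals / n, n)
            for idx, (goals, n) in sorted(totals.items())]

def fit_exponential_decay(points):
    """Fit rate = a * exp(-b * d) by least squares on log(rate).

    Buckets with a zero conversion rate are skipped, since log(0)
    is undefined. Returns (a, b).
    """
    usable = [(d, math.log(rate)) for d, rate, _ in points if rate > 0]
    n = len(usable)
    mean_x = sum(x for x, _ in usable) / n
    mean_y = sum(y for _, y in usable) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in usable)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in usable)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Made-up bucket averages lying exactly on a known curve, to check the fit.
demo_points = [(d, 0.4 * math.exp(-0.12 * d), 100) for d in range(1, 30, 2)]
a, b = fit_exponential_decay(demo_points)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Fitting on the log scale weights the buckets differently from a direct nonlinear fit and ignores how many shots sit in each bucket, so on real data the two approaches would give somewhat different parameters.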
An R² of over 0.9 is very strong: the curve accounts for more than 90 percent of the variation in the bucketed results. Visually, the dots match the line well. I think this is clearly a good model, and I will use it as the basis for my expected goals.
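For readers who want to check a fit of their own, R² can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up observed and predicted conversion rates purely to show the calculation; the function name is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Illustrative numbers only: bucket conversion rates and a curve's predictions.
observed_rates = [0.35, 0.22, 0.15, 0.08, 0.05, 0.03]
predicted_rates = [0.33, 0.24, 0.14, 0.09, 0.05, 0.03]
print(f"R^2 = {r_squared(observed_rates, predicted_rates):.3f}")
```

A value of 1 means the curve passes through every point, while a value of 0 means it does no better than predicting the average rate for every bucket.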
The last thing to explain here is “adjusted distance.” Both Sigurdsson’s and Rooney’s goals were high-expectation chances not just because they were close to goal, but also because they were in the central area of the box with more of the goal mouth available. To adjust distance for angle, I simply take the angle from goal at which the shot is struck as a proportion of 90 degrees. I adjust the distance relative to the angle of the shot, so that shots from the direct center of the box are not adjusted at all, while shots from a wide angle are given a higher adjusted distance.
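One way such an adjustment could look is sketched below. The exact formula is not given above, so everything here is an assumption for illustration: the coordinate system (goal-line midpoint at the origin, `x` across the pitch, `y` straight out from goal), the function names, and in particular the choice to multiply distance by one plus the angle's share of 90 degrees.

```python
import math

def shot_angle_degrees(x, y):
    """Angle between the shot position and the line running straight out
    from the center of the goal. 0 is dead central, 90 is on the goal line.

    Assumes the goal-line midpoint is at (0, 0), with x across the pitch
    and y out from the goal line.
    """
    return math.degrees(math.atan2(abs(x), y))

def adjusted_distance(x, y):
    """Hypothetical adjustment: inflate distance by the angle's share of 90 degrees.

    Central shots (angle 0) are unchanged; wider shots get a larger
    adjusted distance. The multiplier form is an assumption.
    """
    distance = math.hypot(x, y)
    angle_fraction = shot_angle_degrees(x, y) / 90.0
    return distance * (1.0 + angle_fraction)

print(adjusted_distance(0.0, 12.0))   # central shot from 12 yards
print(adjusted_distance(8.0, 9.0))    # similar raw distance, wider angle
```

Under this assumed form, a shot from the very center keeps its raw distance and a shot from right on the goal line has its distance doubled. Other scalings would satisfy the same description, so the multiplier should be treated as a placeholder.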
So that is the shot location model for expected goals. I am happy to share the code with anyone who is interested, and I will explain further parts of this new system in future pieces.
All data provided by Opta unless otherwise noted.