Scraping By...

MFW I see StaleElementReferenceException again

A quick update to keep this blog fresh!

Collecting my own Tennis Event Data | Nadal-Djokovic EP-59 Data Story

Manual data collection is hard, no?

Having dabbled in sports analytics for 2+ years now, I’ve noticed that one thing we hobbyist practitioners have largely taken for granted is the availability of data itself. While accessibility of more granular data types (e.g. event/tracking) is generally decent for football thanks to certain providers, the same can’t be said for my other favourite sport, tennis. The summary of this tale is that I spent about 2 weeks tracing every single shot of the Nadal vs Djokovic match at this year’s Roland Garros, managing to keep my sanity intact whilst gaining a deeper appreciation for both players’ tactical choices and execution. After some background on what compelled me to attempt this arduous task, I’ll share several lessons I took from the experience, followed by a visual-heavy data/analytics report of the match that I charted. (I’ve also made this data public here.)

Stats Perform Pro Forum 2022

Going Pro..

A while back I had the chance to present a piece of hobby work titled “Pressing Times: Can data tell us when and how to navigate out of a counter press?” with my old friend Zhi Yuan at the 2022 Stats Perform Pro Forum held in London. For readers unfamiliar with it, the Pro Forum is a football/soccer analytics forum organised annually by the sports data and analytics company Stats Perform. (Football fans may be more familiar with their “previous” name, Opta, thanks to the signature tweeting style of their OptaJoe account. Unmistakable.)

How purple are his patches? - Quantifying goalscoring consistency in Football

Finding Mr. Reliable

Berbatov’s 2010/11* season never fails to get an asterisk next to it when discussed amongst football fans, and for good reason I suppose. While his 20 goals to share the Premier League Golden Boot with Carlos Tevez look good on paper, the remarkable fact that 11 of those goals came in 3 matches (Liverpool, Blackburn, Birmingham) is a rather glaring blot on that record. Berbatov began the season on the front foot, scoring 3 goals in 4 before his hat-trick against the enemy. However, this was meekly followed up by a barren streak of 7 matches before he compensated with 5 goals against Blackburn. After his most consistent scoring run, with 7 in his next 7, Berbatov’s United career arguably only went downhill. He managed just 2 in his last 12, as Ferguson began to favour the partnership of a resurgent Wayne Rooney just behind the increasingly prominent Chicharito. Berbatov’s blank against Man City in the FA Cup Semi Final, where he missed a series of chances, proved to be the final straw, culminating in Michael Owen (who had just 2 league goals all season) being chosen over him on United’s bench for the Champions League Final against Barcelona.

Quantifying Player Chemistry - Joint Expected Threat (JxT)

Who says those two can’t play together?

Not exactly one from the football commentary hall of fame, but devoted Man Utd fans may remember the above words from Rob Hawthorne following Carlos Tevez’s goal to make it 3-1 against Middlesbrough in October 2007 (friendly reminder that Nani scored an absolute banger for 1-0). Tevez’s placed finish followed an intelligent backheel assist from Wayne Rooney, whose situational awareness to draw 3 defenders to him was matched by Tevez’s anticipation of the play. Tevez’s arrival in the summer was met with skepticism over whether the duo could form an effective partnership due to their similar all-action and selfless style. By the end of that season, questions over their chemistry were well and truly put to bed as they, along with Cristiano Ronaldo in a fluid front three, led Man Utd to the European Double and their most successful season since the turn of the millennium.

Scatter Plots for Tennis (Break Points)

These two sure know a thing or two about them.

Scatter plots have emerged as one of the most popular methods for visualising football data recently. While not novel by any means (though I’m not sure about the first ever documented scatter plot), displaying complementary data on 2 axes really helps to bring out striking bits of information that aren’t so discernible when analysed separately, e.g. correlations or clusters of data. To cite an example, the plot below, produced by the Financial Times towards the end of the last PL season, does very well to emphasise the unprecedented levels that both Man City and Liverpool hit relative to winners and runners-up from past seasons.

Best mates in the Premier League

Who’s played the most minutes with who?

“The iconic starting XIs who rarely played together”, written by BBC Sport, was one of the more interesting retrospective football pieces I came across this year. While not exactly the inspiration for the idea behind this post, the concept reminded me of the first little side project I did. A couple of summers ago, I thought about looking at the number of minutes played together between any pair of players. As a football fan I’ve definitely heard commentators mention on numerous occasions that “so-and-so are only starting together for the Xth time this season”, and fans/pundits stress the value of having a consistent centre-back partnership. While we obviously shouldn’t expect such a simple measure to correlate well with actual team performance or chemistry, it could offer a little insight into why some teams aren’t gelling so well. At worst, we get some fun titbits of information out of it!
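For the curious, here is a minimal sketch of how the "minutes played together" measure could be computed. It is not the code behind the post, just one plausible way to do it. It assumes each match is available as a mapping from player name to the minute they came on and the minute they went off; the function name `shared_minutes`, the data layout, and the player names are all made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def shared_minutes(matches):
    """Sum the minutes each pair of teammates spent on the pitch together.

    `matches` is a list of dicts, one per match, mapping a player name to a
    (minute_on, minute_off) tuple. This layout is an assumption for the sketch.
    """
    totals = defaultdict(int)
    for stints in matches:
        # Sort names so each pair always appears in the same order.
        for a, b in combinations(sorted(stints), 2):
            on_a, off_a = stints[a]
            on_b, off_b = stints[b]
            # Overlap of the two on-pitch intervals; zero or less means none.
            overlap = min(off_a, off_b) - max(on_a, on_b)
            if overlap > 0:
                totals[(a, b)] += overlap
    return dict(totals)


# Hypothetical two-match sample, not real data.
matches = [
    {"Player A": (0, 90), "Player B": (0, 60), "Player C": (60, 90)},
    {"Player A": (0, 45), "Player B": (0, 90), "Player C": (45, 90)},
]
pairs = shared_minutes(matches)
for pair, minutes in sorted(pairs.items(), key=lambda kv: -kv[1]):
    print(pair, minutes)
```

Treating each appearance as a single on/off interval keeps things simple, though it ignores stoppage time and would need per-team grouping before being run on a full league season.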
