At the end of each week's column

Post by phonenumber »

Eighteen months ago, I wrote a blog on how I have been using Apteco to answer questions posed in the Guardian's online weekly football statistics column. At that time, I'd had 6 of my contributions used. I thought I'd revisit this topic now that I've reached 25 contributions (as well as numerous other contributed answers that didn't get used) and share some thoughts on the techniques I use regularly to solve problems.

In this blog, I will illustrate these techniques and give you answers to the questions.

The data
The Guardian's readers pose questions that they'd like answered in the 'can you help' section. These questions cover a huge range of topics, including match results, player performances and in-game events, to name a few.

I have collected data on football match results only, which means I can hope to answer only a subset of the questions posed. I update my data each summer with the results from the previous season, so that I have all English league match data for the top four divisions, from the first matches in 1888 up to the end of the most recently completed season. This is a total of just under 206,000 matches for 146 different teams.

The FastStats system is structured to allow easy analysis of results across a season or for each team. Each team competes in one or more seasons, and in each of those seasons it plays multiple matches.
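To make the team, season and match hierarchy concrete, here is a minimal sketch in plain Python. It is not how FastStats stores the data; the class names (`Team`, `Season`, `TeamMatch`) and the helper `goals_per_season` are hypothetical and exist only to illustrate the one-to-many relationships described above.

```python
from dataclasses import dataclass, field


@dataclass
class TeamMatch:
    # One match from a single team's point of view (hypothetical fields).
    opponent: str
    scored: int
    conceded: int


@dataclass
class Season:
    # A season holds many matches for one team.
    label: str
    matches: list[TeamMatch] = field(default_factory=list)


@dataclass
class Team:
    # A team holds many seasons, keyed by season label.
    name: str
    seasons: dict[str, Season] = field(default_factory=dict)

    def add_match(self, season_label: str, match: TeamMatch) -> None:
        # Create the season on first use, then attach the match to it.
        season = self.seasons.setdefault(season_label, Season(season_label))
        season.matches.append(match)


def goals_per_season(team: Team) -> dict[str, int]:
    # Aggregate at season level by walking down to the matches.
    return {
        label: sum(m.scored for m in season.matches)
        for label, season in team.seasons.items()
    }
```

With this shape, a question about a team's season totals is answered by aggregating the matches under each season, while a question about individual matches works at the lowest level directly.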

Useful derived variables / Predefined analysis

The match data set is fairly limited: it contains only the two teams, the date of the match, the division it took place in and the scoreline. In preparing the initial data set, it was simple to add variables for goals scored and conceded and for the season in which the match took place.
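As a rough illustration of that preparation step, the sketch below derives a season label and per-team goals scored and conceded from a single match record. The field names, the `season_label` and `team_rows` helpers, and the July cutoff are all assumptions made for this example, not details taken from my actual data preparation. A fixed cutoff month is only an approximation and would mislabel matches in any season that started unusually early or finished unusually late.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Match:
    # Hypothetical field names for the facts the raw data records.
    played_on: date
    division: int
    home_team: str
    away_team: str
    home_goals: int
    away_goals: int


def season_label(played_on: date, first_month: int = 7) -> str:
    """Label a match with its season, e.g. '1888-89'.

    Assumption: a season starts in `first_month` (July here) and ends in
    the following calendar year.
    """
    start = played_on.year if played_on.month >= first_month else played_on.year - 1
    return f"{start}-{(start + 1) % 100:02d}"


def team_rows(match: Match) -> list[dict]:
    """Turn one match into two per-team rows with goals scored and conceded."""
    season = season_label(match.played_on)
    return [
        {"team": match.home_team, "season": season, "venue": "home",
         "scored": match.home_goals, "conceded": match.away_goals},
        {"team": match.away_team, "season": season, "venue": "away",
         "scored": match.away_goals, "conceded": match.home_goals},
    ]
```

Splitting each match into a home row and an away row means every team has one record per match it played, which is the shape that per-team and per-season analysis needs.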

Within FastStats, I've created several derived variables to answer questions more easily. Some of these are used only in one-off answers; others are much more useful and are used very regularly.