Dataset Creation
The dataset was created in two phases.
The first phase involved looking at League One fixtures and building a dataset in Excel for matches played in the 2016/17 season. The process was manual: data was sourced from various websites and entered into an Excel spreadsheet. This spreadsheet is named "league1_2016_17_v5.xlsx" and can be found in the GitHub repository linked below.
The second phase was tied to improving my understanding of the Python language. A Python script was created (and later converted into a Jupyter notebook). With the aid of various other source files, the script, when run, creates a dataset for Championship matches in the 2016/17 season.
The script (a Jupyter/IPython notebook) can be found in my GitHub repository here.
The two datasets are then merged into one main dataset, which will be used in the modelling phase.
See my GitHub repository here for a full list of the files used in the creation of the dataset.
The features produced by this Python script are as follows:
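As a rough illustration of the merge step, the sketch below stacks two sets of match rows into one list. It is not the code used in the project: the function name `merge_datasets`, the extra `League` column, and the League One row are all hypothetical, and only a few of the columns are shown.

```python
def merge_datasets(league_one_rows, championship_rows):
    """Stack two lists of row dicts into one, tagging each row with its league."""
    merged = []
    for league, rows in (("League One", league_one_rows),
                         ("Championship", championship_rows)):
        for row in rows:
            # "League" is an assumed extra column, not one of the listed features.
            merged.append({**row, "League": league})
    # The two sources must share the same columns for the merged data to be usable.
    if len({frozenset(row) for row in merged}) > 1:
        raise ValueError("the two datasets do not have the same columns")
    return merged


# Placeholder League One row (values are illustrative only).
league_one = [{"Date": "06/08/2016", "HomeTeam": "Home A",
               "AwayTeam": "Away B", "Attendance": 0}]
# Championship row using values from the project's sample data.
championship = [{"Date": "05/08/2016", "HomeTeam": "Fulham",
                 "AwayTeam": "Newcastle", "Attendance": 23922}]

combined = merge_datasets(league_one, championship)
print(len(combined))
```

Checking that both sources share the same columns before modelling is a cheap way to catch a mismatch between the manually built spreadsheet and the script output.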
Feature List
- Date : Date of match
- HomeTeam : Home team
- AwayTeam : Away team
- Day_Eve : Is the game a day or evening match?
- Day Type : Is the game on a weekend or during the week?
- Holiday : Is the game played on a bank holiday?
- Hol Type : Same as holiday.
- Capacity : Capacity of the home team's ground
- Average Travelling Fans : Average number of travelling fans that the away team takes (based on previous season)
- Cheapest Season T : Lowest season ticket price for the home team
- Home League Position : Current position at time of game, of home team
- Away League Position : Current position at time of game, of away team
- Form Home : Current form of the home team (based on last 5 matches)
- Form Away : Current form of the away team (based on last 5 matches)
- Distance : Distance between the home side's ground and the away team
- Temperature : Temperature on day of game, Weather Event
- Lowest Home Ticket Price : Lowest ticket price for a home fan
- Lowest Away Ticket Price : Lowest ticket price for an away fan
- Home PostCode : Postcode of home team
- Away PostCode : Postcode of away team
- Attendance : Attendance for the game
- Highest Home Ticket Price : Highest home ticket price that a fan can pay
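Some of these features can be derived directly from the match date. The sketch below shows one possible way to compute Day Type using only the standard library. The function name `day_type` is hypothetical, and the encoding (1 for a midweek game, 0 for a weekend game) is an assumption inferred from the sample rows, where the Friday fixture has 1 and the Saturday fixtures have 0.

```python
from datetime import datetime


def day_type(date_text):
    """Return 0 for a weekend match and 1 for a midweek match (assumed encoding)."""
    match_date = datetime.strptime(date_text, "%d/%m/%Y")
    # weekday() is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
    return 0 if match_date.weekday() >= 5 else 1


print(day_type("05/08/2016"))  # a Friday
print(day_type("06/08/2016"))  # a Saturday
```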
The table below gives an example of the output from the dataset file:
Date | HomeTeam | AwayTeam | Day_Eve | Day Type | Hol Type | Capacity | Average Travelling Fans | Cheapest Season T | Home League Position | Away League Position | Form Home | Form Away | Distance | Temperature | Lowest Home Ticket Price | Lowest Away Ticket Price | Attendance | Highest Home Ticket Price
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
05/08/2016 | Fulham | Newcastle | E | 1 | 0 | 25700 | 3140 | 254 | 0 | 0 | 0 | 0 | 249 | 20.2 | 25 | 24 | 23922 | 45
06/08/2016 | Birmingham | Cardiff | D | 0 | 0 | 30016 | 775 | 230 | 0 | 0 | 0 | 0 | 89 | 18.8 | 15 | 22 | 19833 | 40
06/08/2016 | Blackburn | Norwich City | D | 0 | 0 | 31367 | 1661 | 279 | 0 | 0 | 0 | 0 | 175 | 17.9 | 18 | 20 | 12641 | 35
06/08/2016 | Bristol City | Wigan Athletic | D | 0 | 0 | 21497 | 1284 | 299 | 0 | 0 | 0 | 0 | 145 | 19.1 | 25 | 20 | 17635 | 41
06/08/2016 | Derby | Brighton & Hove Albion | D | 0 | 0 | 33597 | 1611 | 319 | 0 | 0 | 0 | 0 | 154 | 19.3 | 17.6 | 25 | 28749 | 33
In the next update, I will look at feature scaling and its possible application to the dataset created in this project.