During lockdown, I’ve made another effort at learning F#. This time I think I’ve had a bit more success. Processing data is something that we as developers do on a weekly or even daily basis, so it seems quite natural to practice that in F#. As a big football fan, I’ve decided to use the English Premier League results for the 2019/2020 season, as it’s a dataset I intuitively understand.
The EPL results set is available in CSV format from football-data.co.uk. Rather than parsing it all by hand, or reaching for CsvHelper and still having to write some C# code to actually use it, in F# we can use a Type Provider, specifically the CsvProvider from FSharp.Data.
Loading and parsing the data
FSharp.Data is easily added via NuGet, and using an .fsx script, we can easily reference the assembly and open the namespace:
#r "../../.nuget/packages/fsharp.data/3.3.3/lib/netstandard2.0/FSharp.Data.dll"
open FSharp.Data
open System.Collections.Generic
I didn’t have any luck in the script with referencing a more local copy of the assembly, such as one in the /bin folder, because it complained about not being able to find FSharp.Data.DesignTime.dll, but going directly to the assembly in the NuGet packages folder seems to work just fine. It is also worth noting that I’m writing this on a Mac (in VS Code), so your path syntax might vary. Note too that we open the BCL System.Collections.Generic namespace. We’ll need that later.
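On newer tooling the path juggling may be avoidable altogether. The following is a minimal sketch, assuming an F# 5 or later `dotnet fsi` (which postdates the setup described here) and network access to nuget.org. The pinned version simply mirrors the one in the path above.

```fsharp
// Assumes F# 5+ scripting (dotnet fsi from the .NET 5 SDK or later).
// The script host downloads the package and references it for us,
// so no hand-written path into the NuGet packages folder is needed.
#r "nuget: FSharp.Data, 3.3.3"

open FSharp.Data
open System.Collections.Generic
```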
Next comes the part that blows my mind. Here is how we generate a type which knows how to load and parse a CSV file of a given structure:
type Results = CsvProvider<"../../Downloads/epl1920.csv">
That’s it. It’s pretty amazing. The Results type is now also type safe, and it has had a guess at inferring the type of each column of the data. We could probably do something similar to this in C# using CsvHelper and either Castle.DynamicProxy or some magic with the new Roslyn compiler, but I think it would take quite a bit of code to create something that came close to what this can do.
Skipping over some important stuff that we’ll get to in just a short while, we can now easily load the full results set:
Results.Load("../../Downloads/epl1920.csv")
This is fairly straightforward, and does exactly what it looks like. The data loaded from the file is available in a .Rows property, which we’ll use shortly.
Parsing the data
All good so far, but now things get a little more complicated. We need to think a bit about the data, and if you look in the file… it’s got a LOT of information. Most of it is betting information for the match, but there is also quite a lot of information about the match itself. For the purposes of calculating the league table, most of the information in the file is redundant. To get just the information we need, we can define a Record to hold it. A Record in F# is somewhat analogous to a C# POCO class, but immutable by default and with structural equality out of the box.
type FullTimeResult = | Home | Away | Draw
type MatchResult = {HomeTeam : string; AwayTeam: string; HomeGoals: int; AwayGoals: int; Result: FullTimeResult}
The FullTimeResult type is a discriminated union which, used like this, works much like a C# enum, and it is easier to read than the ‘A’, ‘H’ or ‘D’ we get from the CSV file for the FTR (Full Time Result) column. I think it also reads better when it comes to the pattern matching, but we’ll get to that. With those types defined, we can get to the real meat of this and actually parse the data:
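To make the point about records concrete, here is a small sketch using the two types defined above. The sample values are illustrative only and are not read from the CSV file.

```fsharp
// Same type definitions as above.
type FullTimeResult = | Home | Away | Draw
type MatchResult = {HomeTeam : string; AwayTeam: string; HomeGoals: int; AwayGoals: int; Result: FullTimeResult}

// Illustrative sample values, not rows loaded from the CSV file.
let first = {HomeTeam = "Liverpool"; AwayTeam = "Norwich"; HomeGoals = 4; AwayGoals = 1; Result = Home}
let second = {HomeTeam = "Liverpool"; AwayTeam = "Norwich"; HomeGoals = 4; AwayGoals = 1; Result = Home}

// Records compare by value: two separately built records with the same fields are equal.
let sameResult = (first = second)

// Copy-and-update builds a new record; 'first' itself is left unchanged.
let corrected = {first with AwayGoals = 2}

printfn "Equal: %b" sameResult
printfn "Original away goals: %d, corrected: %d" first.AwayGoals corrected.AwayGoals
```

In C# a plain class would need hand-written Equals and GetHashCode overrides to get the same comparison behaviour.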
let league = Results.Load("../../Downloads/epl1920.csv")
                    .Rows
             |> Seq.map toMatchResult
             |> Seq.fold processMatchResult (Dictionary<string, LeagueRow>())
             |> Seq.sortByDescending (fun (KeyValue(_, v)) -> v.Points)
Here, we load the file as we discussed earlier, but now we forward pipe the data returned from the .Rows property to Seq.map through the toMatchResult function, which takes a Row, extracts the data we’re interested in, and returns a new MatchResult. In C# this is the same as doing .Rows.Select(row => new MatchResult {...}). Then the resulting sequence of MatchResults is piped forward through the processMatchResult function, using the scary-sounding Seq.fold, which is also passed a new instance of a BCL Dictionary, with a string key and a LeagueRow as the value. I’ve not yet mentioned the LeagueRow type… it’s not super important to proceedings; it’s just a type which holds all the data you would expect to see in a football league table. For reference it’s included below in the full script.
Amazingly, those five lines load the file, process all the data, and provide an object which contains a fairly accurate version of the English Premier League table. Obviously things are a little more involved than that.
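The same map, fold, sort shape can be tried without the CSV file at all. This is a minimal sketch on made-up in-memory data; the `rows` list, the team names and the `addGoals` function are all hypothetical stand-ins rather than anything from the real script.

```fsharp
open System.Collections.Generic

// Hypothetical stand-in for the CSV rows: (team, goals scored in one match).
let rows = [ ("Reds", 2); ("Blues", 1); ("Reds", 3); ("Blues", 0); ("Greens", 4) ]

// Folder: add one match's goals to the running totals and hand the state back.
let addGoals (totals : Dictionary<string, int>) (team : string, goals : int) =
    if totals.ContainsKey team then
        totals.[team] <- totals.[team] + goals
    else
        totals.Add(team, goals)
    totals

let table =
    rows
    |> Seq.map (fun (team, goals) -> (team.ToUpper(), goals))   // reshape each row
    |> Seq.fold addGoals (Dictionary<string, int>())            // accumulate into one state
    |> Seq.sortByDescending (fun kv -> kv.Value)                // order the entries
    |> Seq.toList

for kv in table do
    printfn "%s: %d" kv.Key kv.Value
```

For C# readers, the three stages correspond roughly to Select, Aggregate and OrderByDescending in LINQ.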
Examining the parsing in more detail
As you’ll recall, there is a lot of data in the CSV file that is irrelevant when it comes to generating the league table. We can map all the data we need into the MatchResult type, which we do by forward piping the data through Seq.map and the toMatchResult function:
let toMatchResult (row: Results.Row) =
    let fullTimeResult =
        if row.FTR = "H" then FullTimeResult.Home
        elif row.FTR = "A" then FullTimeResult.Away
        else FullTimeResult.Draw
    {
        HomeTeam = row.HomeTeam
        AwayTeam = row.AwayTeam
        HomeGoals = row.FTHG
        AwayGoals = row.FTAG
        Result = fullTimeResult
    }
This is mostly just a simple mapping from the results row into the new MatchResult type. You’ll notice we don’t need to explicitly ‘new’ anything up. Don’t forget, we’re in a functional world now: the record expression is the last expression in the function, so the MatchResult it builds is the function’s return value. We also define a local value, fullTimeResult, which works out the full time result using a simple if/elif/else expression. I think I could also have used pattern matching, but it’s simple enough that I’m not going to worry about it.
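For comparison, the pattern matching alternative mentioned above might look something like this sketch. The function name `toFullTimeResult` is hypothetical; it takes the raw FTR string rather than a whole row so that it can run on its own.

```fsharp
type FullTimeResult = | Home | Away | Draw

// Pattern-matching alternative to the if/elif/else chain.
// As in the original, anything other than "H" or "A" is treated as a draw.
let toFullTimeResult (ftr : string) =
    match ftr with
    | "H" -> Home
    | "A" -> Away
    | _ -> Draw

printfn "%A %A %A" (toFullTimeResult "H") (toFullTimeResult "A") (toFullTimeResult "D")
```

Both versions silently map unexpected values to Draw, so a stricter variant could match "D" explicitly and fail on anything else.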
Next comes the scary-sounding fold. The function looks like this:
let processMatchResult (league : Dictionary<string, LeagueRow>) result =
    match result.Result with
    | Home -> updateHomeWin(league, result)
    | Away -> updateAwayWin(league, result)
    | Draw -> updateDraw(league, result)
    league
What happens is that we tell Seq.fold to use this function to do the folding, and we give it an initial state of a new and empty Dictionary<string, LeagueRow>(). Seq.fold carries the state over to each subsequent ‘fold’ over the sequence of MatchResults it was piped. You’ll note that the final expression in the function, and therefore its return value, is the same dictionary which was passed in. This essentially forms the core of the algorithm to produce the league. The pattern matching of match <thing> with is equivalent to a C# switch statement on steroids. I am barely scratching the surface of what can be done with pattern matching in F#.
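If the way Seq.fold carries state forward still feels opaque, a tiny example with plain integers may help. The goal counts below are made up; Seq.scan is used alongside Seq.fold only because it exposes each intermediate state.

```fsharp
// Hypothetical goals scored by one team across four matches.
let goals = [2; 0; 3; 1]

// Seq.fold: start from 0 and combine the running total with each item in turn.
let total = goals |> Seq.fold (fun runningTotal g -> runningTotal + g) 0

// Seq.scan takes the same arguments but also keeps every intermediate state,
// which makes the "carried over" state visible.
let runningTotals = goals |> Seq.scan (fun runningTotal g -> runningTotal + g) 0 |> Seq.toList

printfn "Total: %d" total
printfn "States: %A" runningTotals
```

Here the state is an int; in processMatchResult it is the dictionary, but the mechanics are the same.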
The pattern match decides what kind of result we are dealing with, and delegates further processing to the relevant function. Here is the definition for updateHomeWin. The other two functions follow the same shape, except they distribute the points/goals/wins/losses/draws accordingly, so I won’t go into those in detail.
let updateHomeWin (league : Dictionary<string, LeagueRow>, result : MatchResult) =
    updateTeam(league, result.HomeTeam, 3, result.HomeGoals, result.AwayGoals, 1, 0, 0)
    updateTeam(league, result.AwayTeam, 0, result.AwayGoals, result.HomeGoals, 0, 0, 1)
Each MatchResult consists of two teams, and we have to update the league entry for both of them with the correct number of points, goals for, goals against, wins, draws and losses. The real work happens in the updateTeam function:
let updateTeam (league : Dictionary<string, LeagueRow>, team : string, points : int, forGoals : int, againstGoals : int, won : int, drawn : int, lost : int) =
    if league.ContainsKey team then
        let existing = league.[team]
        let updated = {existing with Played = existing.Played + 1; Won = existing.Won + won; Drawn = existing.Drawn + drawn; Lost = existing.Lost + lost; For = existing.For + forGoals; Against = existing.Against + againstGoals; Points = existing.Points + points}
        league.[team] <- updated
    else
        let leagueRow = {Team = team; Played = 1; Won = won; Drawn = drawn; Lost = lost; GD = 0; For = forGoals; Against = againstGoals; Points = points}
        league.Add(team, leagueRow)
This is just a simple dictionary update where we check if a team already has an entry and, if so, update it; otherwise we create it. Things of note here are that whilst F# values are immutable by default, types from System.Collections.Generic are mutable, which is how this whole thing works. I’m sure that someone much better at F# can come along and tell me how to do this with immutable F# collections. Also of note is the collection access of league.[team], with a dot before the brackets, which is different from C#. We also update the value in the dictionary by using <-.
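On the question of immutable collections, one possible approach is to fold over an F# Map instead of a Dictionary. The following is only a sketch under a few assumptions: the LeagueRow definition is reconstructed from the fields used in updateTeam (the real one is in the full script), the sample results are made up, and the functions are curried so they can be piped, which differs from the tupled style used above.

```fsharp
type FullTimeResult = | Home | Away | Draw
type MatchResult = {HomeTeam : string; AwayTeam: string; HomeGoals: int; AwayGoals: int; Result: FullTimeResult}

// Reconstructed from the fields used in updateTeam; the real definition is in the full script.
type LeagueRow = {Team : string; Played : int; Won : int; Drawn : int; Lost : int; GD : int; For : int; Against : int; Points : int}

// Returns a new Map with the team's row added or replaced; the input Map is never modified.
let updateTeam team points forGoals againstGoals won drawn lost (league : Map<string, LeagueRow>) =
    let existing =
        match Map.tryFind team league with
        | Some row -> row
        | None -> {Team = team; Played = 0; Won = 0; Drawn = 0; Lost = 0; GD = 0; For = 0; Against = 0; Points = 0}
    // As in the original, GD is left alone and worked out from For and Against when printing.
    let updated =
        {existing with
            Played = existing.Played + 1
            Won = existing.Won + won
            Drawn = existing.Drawn + drawn
            Lost = existing.Lost + lost
            For = existing.For + forGoals
            Against = existing.Against + againstGoals
            Points = existing.Points + points}
    Map.add team updated league

// Folder for Seq.fold: each match yields a new Map that becomes the next state.
let processMatchResult (league : Map<string, LeagueRow>) (result : MatchResult) =
    // (points, won, drawn, lost) for the home side, then for the away side.
    let (hp, hw, hd, hl), (ap, aw, ad, al) =
        match result.Result with
        | Home -> (3, 1, 0, 0), (0, 0, 0, 1)
        | Away -> (0, 0, 0, 1), (3, 1, 0, 0)
        | Draw -> (1, 0, 1, 0), (1, 0, 1, 0)
    league
    |> updateTeam result.HomeTeam hp result.HomeGoals result.AwayGoals hw hd hl
    |> updateTeam result.AwayTeam ap result.AwayGoals result.HomeGoals aw ad al

// Hypothetical results, not taken from the real data set.
let sample =
    [ {HomeTeam = "Reds"; AwayTeam = "Blues"; HomeGoals = 2; AwayGoals = 0; Result = Home}
      {HomeTeam = "Blues"; AwayTeam = "Reds"; HomeGoals = 1; AwayGoals = 1; Result = Draw} ]

let league = sample |> Seq.fold processMatchResult Map.empty

league
|> Map.iter (fun team row -> printfn "%s: played %d, points %d, GD %d" team row.Played row.Points (row.For - row.Against))
```

Because Map.add returns a new Map each time, the fold no longer relies on mutation; the trade-off is some extra allocation, which is unlikely to matter for a data set of this size.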
After that, we can define a simple function to print out a row from the league for us, and then iterate through the entries to get a league table:
let print league =
    printfn "Team: %s | Played: %d | Won: %d | Lost: %d | Drawn: %d | For: %d | Against: %d | GD: %d | Points: %d" league.Team league.Played league.Won league.Lost league.Drawn league.For league.Against (league.For - league.Against) league.Points
league
|> Seq.iter (fun (KeyValue(_, v)) -> print v)
KeyValue is an active pattern that matches the KeyValuePair objects we get when iterating a BCL Dictionary. Running this produces (with data correct as at the publication of this post):
Team: Liverpool | Played: 33 | Won: 29 | Lost: 2 | Drawn: 2 | For: 72 | Against: 25 | GD: 47 | Points: 89
Team: Man City | Played: 33 | Won: 21 | Lost: 9 | Drawn: 3 | For: 81 | Against: 34 | GD: 47 | Points: 66
Team: Leicester | Played: 33 | Won: 17 | Lost: 9 | Drawn: 7 | For: 63 | Against: 31 | GD: 32 | Points: 58
Team: Chelsea | Played: 33 | Won: 17 | Lost: 10 | Drawn: 6 | For: 60 | Against: 44 | GD: 16 | Points: 57
Team: Man United | Played: 33 | Won: 15 | Lost: 8 | Drawn: 10 | For: 56 | Against: 33 | GD: 23 | Points: 55
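The KeyValue active pattern used in the printing code can also be seen in isolation. This is a small sketch on a made-up dictionary of points totals, comparing it with reading the Key and Value properties directly.

```fsharp
open System.Collections.Generic

// Hypothetical points totals, just to have something to iterate over.
let points = Dictionary<string, int>()
points.Add("Reds", 4)
points.Add("Blues", 1)

// Iterating a Dictionary yields KeyValuePair values.
// Without the active pattern we read the properties explicitly...
let viaProperties = points |> Seq.map (fun kv -> sprintf "%s=%d" kv.Key kv.Value) |> Seq.toList

// ...whereas KeyValue destructures key and value directly in the lambda's parameter.
let viaPattern = points |> Seq.map (fun (KeyValue(team, pts)) -> sprintf "%s=%d" team pts) |> Seq.toList

printfn "%A" viaPattern
```

Both lambdas produce the same strings; the pattern version just avoids naming the intermediate pair.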
For completeness here is a gist of the full script: