{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Part II: Automatic Differentiation with Zygote.jl\n", "\n", "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Why I personally use Julia\n", "\n", "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "For me, this becomes particularly apparent in the context of automatic differentiation. Here, Julia's main AD system, the [`Zygote.jl` package](https://github.com/FluxML/Zygote.jl), offers a particularly flexible and straightforward way of getting gradients of essentially arbitrary pieces of code. Below, I illustrate this with a modelling example from a COVID-19 context.\n", "\n", "The notebook is an adaptation of the notebooks in [this repository](https://github.com/maren-ha/DifferentiableProgrammingForStatisticalModeling) and of slides from a presentation I gave on this topic at the Statistical Computing Workshop at Reisensburg Castle in July 2022." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Exemplary modelling challenge\n", "\n", "First, let's load and install the necessary packages as usual. The commented-out code will read from the Project.toml and Manifest.toml files and try to load the same environment as the one used by the author (i.e., the exact same version of Julia and its packages). This is good practice to ensure reproducibility, but it's not necessary for the notebook to run and might result in the user having to install different versions of a package. 
Your results should match the ones on this page; if they don't, try running the commented-out code." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "using Pkg\n", "#Pkg.activate(\"aux/\")\n", "#Pkg.add([\"GLM\", \"ProgressMeter\", \"StatsBase\", \"Zygote\"])\n", "#Pkg.instantiate() # uncomment if you want to run this notebook yourself under the same environment as the author\n", "#Pkg.status()\n", "\n", "using CSV\n", "using DataFrames\n", "using GLM\n", "using LinearAlgebra\n", "using ProgressMeter\n", "using Random\n", "using Statistics\n", "using StatsBase\n", "using StatsPlots\n", "using VegaLite\n", "using Zygote" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "At the beginning of the pandemic, our institute was asked by the Robert Koch Institute, the main health authority that monitored the COVID-19 pandemic in Germany, and the Federal Ministry of Health to build a prediction model for COVID-19 ICU demand. The data we were given came from a newly set-up intensive care registry, to which each ICU in Germany was required to report daily its numbers of COVID-19 patients in ICUs and normal wards, alongside a bunch of other information. Even though daily reporting was compulsory, the data was still very messy, with many ICUs not reporting at all, and many others reporting only sporadically.\n", "\n", "To get an idea of what the data looked like, let's look at an exemplary hospital from the registry during a period from spring to early summer 2020.\n", "\n", "Note that the values shown in `:currentcases` were imputed by last-observation-carried-forward whenever a report was missing, as indicated by a $0$ in the `:reported` column." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
10 rows × 3 columns

| | date | currentcases | reported |
|---|---|---|---|
| | Date | Float64 | Bool |
| 1 | 2020-04-16 | 0.0 | 0 |
| 2 | 2020-04-17 | 0.0 | 1 |
| 3 | 2020-04-18 | 0.0 | 1 |
| 4 | 2020-04-19 | 0.0 | 1 |
| 5 | 2020-04-20 | 2.0 | 1 |
| 6 | 2020-04-21 | 0.0 | 1 |
| 7 | 2020-04-22 | 1.0 | 1 |
| 8 | 2020-04-23 | 2.0 | 1 |
| 9 | 2020-04-24 | 2.0 | 1 |
| 10 | 2020-04-25 | 2.0 | 0 |
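
To make the imputation rule concrete, here is a minimal sketch of last-observation-carried-forward in plain Julia. It is not the code used to prepare the registry data: the helper name `locf`, the `initial` fallback for days before the first report, and the toy vectors are all assumptions made for illustration.

```julia
# Hypothetical helper (not from the original notebook):
# last-observation-carried-forward (LOCF) imputation.
# `cases[t]` is the raw value for day t, `reported[t]` says whether a report exists.
# `initial` is an assumed fallback for days before the first report.
function locf(cases::AbstractVector{<:Real}, reported::AbstractVector{Bool}; initial=0.0)
    imputed = Vector{Float64}(undef, length(cases))
    last_seen = Float64(initial)
    for t in 1:length(cases)
        if reported[t]
            last_seen = Float64(cases[t])  # a fresh report replaces the carried value
        end
        imputed[t] = last_seen             # otherwise the previous value is carried forward
    end
    return imputed
end

# Made-up toy data, not taken from the registry
raw      = [0.0, 0.0, 2.0, 0.0, 1.0, 2.0]
reported = [false, true, true, false, true, false]

imputed = locf(raw, reported)
println(imputed)
```

With this rule, a day without a report simply repeats the most recent reported value, which matches the last row of the table above: no report on 2020-04-25, so the value 2.0 from 2020-04-24 is carried forward.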
5 rows × 6 columns

| | Model | SSE | FirstQuartile | Median | ThirdQuartile | Maximum |
|---|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | meanmodel | 89.2388 | 0.0 | 0.0 | 0.002 | 9.0 |
| 2 | modmeanmodel | 84.3421 | 0.0 | 0.0 | 0.0 | 9.0 |
| 3 | zeromodel | 86.0 | 0.0 | 0.0 | 0.0 | 9.0 |
| 4 | linregmodel | 181.068 | 0.0 | 0.001 | 0.015 | 30.227 |
| 5 | incrmodel | 74.7457 | 0.0 | 0.0 | 0.001 | 6.079 |