A Brief Introduction to Infectious Disease Modelling: Simulating a Hypothetical COVID-19 Outbreak

By Enrique M. Saldarriaga


The COVID-19 pandemic has boosted the interest for mathematical models of infectious diseases. In this entry, I will briefly introduce some of these models and provide an R-code to simulate an outbreak of COVID-19.

These models synthesize multiple sources of information into equations that aim to model the evolution of a disease and make predictions. When used correctly, they can be incredibly powerful tools to explain a very chaotic and complex reality, to evaluate policy options to inform decision-making, to understand hidden mechanisms that drive an epidemic, and others.

Infectious diseases do not occur in isolation in each person.  They are transmitted through contact with a pathogen. Thus, there is a need to understand the mechanisms for a susceptible person to establish effective contact (i.e. contact that results in a transmission; sexually transmitted disease is a good example) with someone who is infected with that pathogen. On the population level, disease prevalence is considered a risk factor for the incidence: the higher the proportion of people living with a disease, the higher the likelihood that an infected person gets in contact with a susceptible person. This relationship between incidence and prevalence can be characterized using dynamic models. Here, the probability of getting infected is determined by the probability of contact with an infectious person (or animal in case of diseases transmitted by vectors, like malaria), which is given by the prevalence. A contact resulting in an infection is called a susceptible-infected effective contact.

Infectious and non-communicable disease models have substantial similarities: both can be compartmental or agent-based (microsimulation), as well as deterministic (static transition probabilities) or stochastic (transition probabilities are random draws of a specified distribution). In any case, the decision about which model to use is determined by the scope, purpose of the analysis, and many times, the target audience for results dissemination.

In the following section I will describe compartmental, deterministic, closed-cohort models. In a closed cohort model, we assume no deaths or births, but the population remains constant over time.

Model Types

The Susceptible-Infectious (SI) Model. This is the most basic infectious disease model. It is characterized by two state variables or compartments: Susceptible (S) and Infectious (I). Here we model one transition, and once all susceptibles are infected, the epidemic is over (no deaths in this model). The transition is driven by the transmission coefficient. This is a very important concept because regardless of model type, this parameter determines the rate at which people get infected. It is usually denoted by lambda, λ, and it is the product of the infectivity or probability of transmission per contact (ρ), the contact rate at a given period (c), and the prevalence of infected (I/N; where N  is the total population): λ = c * ρ * I/N. At any point in time, and for all model options, the number susceptible decreases by λ.

The Susceptible-Infectious-Recover (SIR) Model. In addition to susceptible and infected, the SIR model includes the recovered (R) compartment. R includes people that were infected and overcome the disease. The rate of transition is given by the inverse of disease duration, also known as the recovery rate (γ). Some diseases confer immunity (e.g. measles) after infection, but others do not. To capture this, a SIRS (susceptible-infected-recovered-susceptible) model would be more appropriate and allows those who don’t develop immunity to transition back to susceptible.

The Susceptible-Exposed-Infectious-Recovered, (SEIR) Model. This model adds an exposed (E) compartment. Exposed are all persons who have been infected but are not yet symptomatic, and more importantly, not yet infectious. Infectious persons are the only ones capable of spreading the disease, hence, an accurate count of them is very important. When using a SEIR model, the transition between S and E is given by lambda (λ) and the transition between E and I is given by the inverse of the latency or incubation period (σ).


COVID-19 Outbreak Example

I am going to simulate a COVID-19 outbreak using a SEIR model, depicted in the figure below. All parameters have been obtained from the MIDAS Network repository – an excellent and publicly available compilation of COVID-19 parameters.

Let’s model the transitions between compartments considering 1-timepoint increment:

By taking the partial derivative of these equations with respect to t, we obtain the changes in every compartment at any point in time:


With this in mind, let’s go to the R-code to see how to implement the simulation.

COVID-19 Example Results

We model an outbreak for 1 year, using the following parameters: c * ρ = 1.5, σ = 1/4.2, and γ = 1/20, for a population of 1 million where 1 persons were already infected. The following image describes the outbreak.


We can see a very steep increase in the number of infected, which peaks at 625,095 infections on the 37th day of the outbreak. As it is often pointed out, this rapid increase in cases can overload health systems, reducing the possibility of many people to access care.

How can we flatten the curve? One intervention to contain the COVID-19 pandemic was to increase the physical distance between people. The objective was to reduce the probability of an effective susceptible-infected contact. In modelling terms, this would directly reduce c * ρ  and therefore λ.

The following image shows the results of reducing c * ρ  to 0.6 instead of 1.5.


The peak of infection occurs later, on day 65, at a lower count as well: 550,446. This is an example of how effective behavioral changes can be to reduce the severity of an outbreak.

In this example we changed only one parameter. But one thing that amazes me about infectious disease modelling, is that (almost) every parameter driving the outbreak is susceptible to change given the right intervention. You can now use the R-code to see how variations in other parameters affect the outbreak and think about what kinds of interventions might produce such changes.

Suggested Readings

Vynnycky, E. & White, R. G. An introduction to infectious disease modelling. (Oxford University Press, 2010). BookSite

Garnett, G. An introduction to mathematical models in sexually transmitted disease epidemiology. Sex Transm Infect 78, 7–12 (2002).

Kretzschmar, M. Disease modeling for public health: added value, challenges, and institutional constraints. J Public Health Pol 41, 39–51 (2020).

Estimating elasticities from linear regressions

By Enrique Saldarriaga

This post aims to show how elasticities can be estimated directly from linear regressions.
Elasticity measures the association between two numeric variables expressed in percentages, whose interpretation is the percentage increase in one variable associated with a 1% change in another one. Elasticities have served economics for a long time, basically because they allow comparison between very different settings when changes in levels (e.g. dollars) are difficult to interpret. Take for instance a company that produces both cars and chocolate bars. If they want to know how changes in the price of both products would impact their demand, in order to determine which price to increase and which to maintain, the comparison would be a mess. A $100 change would mean nothing for the demand of cars but would destroy demand for chocolates. Similarly, a decrease in 100K people in the chocolate bars’ demand is probably just a dent, but the same amount could represent a significant portion of the car’s market. Changes in percentages are a way to standardize change and its consequences. In health economics, elasticities are becoming more common because they are able to show fair comparisons between variables with heterogenous behavior depending on the context.

The most common elasticities are price and income. The income elasticity of any variable would be the percentage change in that variable associated with a 1% change in income; the elasticity price would be the percentage change in a variable associated with a 1% change in its own price. In economics, that variable usually would be the demand of a given good. In health economics, that variable could be any number of things.

For example, let’s say we want to estimate the income elasticity of BMI. The elasticity would be expressed as:

\displaystyle \epsilon_I = \dfrac{\Delta \% BMI}{\Delta \% Income} = \dfrac{\Delta BMI}{\Delta Income}* \dfrac{Income}{BMI}

where ∆ stands for variation and shows the variation of BMI or Income from one point to another in their joint distribution. In order to make this change infinitesimal, and therefore estimate the elasticity in the whole distribution, we can express those changes with differentials: ε_I = dQ/dP*Income/BMI

Now, to obtain elasticities directly from linear regressions we should use the logarithmic forms of the variables:

\displaystyle ln(BMI) = \beta_0 + \beta_1 * ln(Income) = f(Income)

The BMI is a function of income. The function can be expressed as:

\displaystyle f(Income) = BMI = e^{ \beta_0 + \beta_1 * ln(Income)}

To find the change in BMI associated with a change in income we should derive this function by Income. Given the form of the function we use the chain rule:

\displaystyle f(x)=g(h(x)) ; f'(x)=g'(h(x))*h'(x)

Where: \displaystyle g(h) = e^h , h(Income) = \beta_0 + \beta_1 *ln(Income)

\displaystyle g'(h)= e^h ,h'(Income) = \beta_1 * \dfrac{1}{Income}


\displaystyle f'(Income) = e^{ \beta_0 + \beta_1 * ln(Income)} * \beta_1 * \dfrac{1}{Income}

\displaystyle \dfrac{\partial BMI}{\partial Income} = e^{ln(BMI)} * \beta_1 * \dfrac{1}{Income} = BMI * \beta_1 * \dfrac{1}{Income}

\displaystyle \beta_1 = \dfrac{ \partial BMI}{ \partial Income} * \dfrac{Income}{BMI}

Thus, the β_1 coefficient is the estimation of the income elasticity. The same procedure could be applied to find the price elasticity. It’s interesting to notice that if only the independent variable were in its logarithmic form, the derivation would be easier, and the coefficient would be:

\beta_1 = \dfrac{ \partial BMI}{ \partial Income} * \dfrac{Income}{1}

Then, to estimate the elasticity it would be necessary to divide the coefficient by a measure of BMI, probably the mean to account for the whole distribution, as Sisira et al. did in their paper. However, in this case would be necessary to carefully select a good average measure. This method would be useful if we prefer to interpret other covariates’ coefficients using BMI rather than ln⁡(BMI).