Generating Survival Curves from Study Data: An Application for Markov Models

By Mark Bounthavong

CHOICE Student Mark Bounthavong

In cost-effectiveness analysis (CEA), a lifetime horizon is commonly used to simulate the overall costs and health effects of a chronic disease. Mortality data for the treatments being compared are normally derived from survival (Kaplan-Meier) curves published in clinical trials. However, these Kaplan-Meier curves only provide survival data for the length of the trial, which may be anywhere from a few months to a few years.

To adapt these clinical trial data to a lifetime horizon for use in cost-effectiveness modeling, modelers must make assumptions about the shape of the curve and extrapolate beyond what was observed empirically. Luckily, extrapolation to a lifetime horizon is possible by fitting a parametric survival model (e.g., Weibull, exponential) to the trial data. Performing these projections can be challenging without the appropriate data and software, which is why I wrote a tutorial that provides a practical, step-by-step guide to estimating the parameters of a parametric model (Weibull) from a survival function for use in CEA models.

I split my tutorial into two parts, as described below.

Part 1 begins by providing a guide to:

  • Capture the coordinates of a published Kaplan-Meier curve and export the results into a *.CSV file
  • Estimate the survival function based on the coordinates from the previous step using a pre-built template
  • Generate a Weibull curve that closely resembles the survival function and whose parameters can be easily incorporated into a simple three-state Markov model
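
To give a feel for what the last step produces, here is a minimal Python sketch that fits a Weibull curve to digitized survival coordinates. It is not the Hoyle and Henley method used in the tutorial (which relies on their Excel template and R); it swaps in a simpler least-squares fit on the linearized Weibull function. It assumes the parameterization S(t) = exp(-lambda * t^gamma), so that ln(-ln S(t)) = ln(lambda) + gamma * ln(t). The function names and data points are hypothetical.

```python
import math

def fit_weibull(times, survival):
    """Least-squares fit of ln(-ln S) = ln(lam) + gamma * ln(t).

    Assumes S(t) = exp(-lam * t**gamma). Points with t <= 0 or with S
    outside (0, 1) are skipped because the log transform is undefined there.
    """
    xs, ys = [], []
    for t, s in zip(times, survival):
        if t > 0 and 0 < s < 1:
            xs.append(math.log(t))
            ys.append(math.log(-math.log(s)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                          # slope = shape parameter
    lam = math.exp(mean_y - gamma * mean_x)    # exp(intercept) = scale parameter
    return lam, gamma

def weibull_survival(t, lam, gamma):
    """Survival probability at time t under the assumed parameterization."""
    return math.exp(-lam * t ** gamma)

# Hypothetical "digitized" points (months, survival), generated here from a
# known Weibull curve so the fit can be checked against the true values.
true_lam, true_gamma = 0.05, 1.3
times = [1, 2, 4, 6, 9, 12, 18, 24]
survival = [weibull_survival(t, true_lam, true_gamma) for t in times]

lam, gamma = fit_weibull(times, survival)
print(f"lambda = {lam:.4f}, gamma = {gamma:.4f}")
```

With real digitized coordinates the points will not fall exactly on a Weibull curve, so the fitted parameters are only approximations; this simple regression also ignores the number of patients at risk, which the Hoyle and Henley approach takes into account.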

Part 2 concludes with a step-by-step guide to:

  • Incorporate the Weibull parameters into a Markov model
  • Compare the survival probability of the Markov model to the reference Kaplan-Meier curve to validate the method and catch any errors
  • Extrapolate the survival curve across a lifetime horizon
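
The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the tutorial's Excel implementation: it assumes the Weibull parameterization S(t) = exp(-lambda * t^gamma), a cycle length equal to the time unit of the fitted curve, and hypothetical parameter values. The probability of dying during a cycle, given survival to its start, is 1 - S(t)/S(t-1) = 1 - exp(lambda * ((t-1)^gamma - t^gamma)). The sketch only tracks the alive versus dead split; how the alive cohort is divided among the non-death states of a three-state model is left out.

```python
import math

def death_probability(cycle, lam, gamma):
    """P(death during `cycle` | alive at its start), for cycle = 1, 2, ...

    Derived from 1 - S(cycle) / S(cycle - 1) with S(t) = exp(-lam * t**gamma).
    """
    return 1.0 - math.exp(lam * ((cycle - 1) ** gamma - cycle ** gamma))

def alive_trace(n_cycles, lam, gamma):
    """Proportion of the cohort still alive at the end of each cycle."""
    alive = [1.0]  # the whole cohort is alive at time 0
    for cycle in range(1, n_cycles + 1):
        alive.append(alive[-1] * (1.0 - death_probability(cycle, lam, gamma)))
    return alive

# Hypothetical Weibull parameters (monthly cycles assumed).
lam, gamma = 0.05, 1.3

# Run well past a typical trial follow-up to extrapolate the curve.
trace = alive_trace(120, lam, gamma)

# Validation: the Markov trace should reproduce the Weibull survival function.
max_gap = max(
    abs(trace[t] - math.exp(-lam * t ** gamma)) for t in range(len(trace))
)
print(f"largest gap between Markov trace and Weibull curve: {max_gap:.2e}")
```

In practice, the same trace would also be plotted against the digitized Kaplan-Meier coordinates over the trial period; a visible gap there points to an error in the parameters or in the transition probability formula.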

The tutorial requires moving data between several software packages. You will need some familiarity with Excel to perform these parametric simulations. You should download and install the open-source software “Engauge Digitizer” developed by Mark Mitchell, which can be found here. You should also download and install the latest versions of R and RStudio to generate the parametric survival curve parameters.

Hoyle and Henley wrote a great paper on using data from a Kaplan-Meier curve to generate parameters for a parametric survival model, which can be found here. The tutorial makes use of their methods and supplemental file. Specifically, you will need to download their Excel Template to generate the parametric survival curve parameters.

I have created a public folder with the relevant files used in the tutorial here.

If you have any comments or notice any errors, please contact me at

