Test performance estimates without a gold standard: a short tutorial on JAGS

[Image: Bayes' theorem in neon. Photo credit: Flickr user mattbuck]

One of the more distinctive applications of Bayesian statistics is estimating unknown quantities that depend on other unknown quantities. Because Bayesian models can integrate prior knowledge, you can develop parameter estimates starting from priors that are little more than an educated guess.

This application of Bayesian statistics is commonly seen in diagnostics. When there isn’t a gold standard test that allows simple comparisons, Bayesian models are able to use data on test results to estimate the performance of these tests and the prevalence of the disease. Whether it’s a new test or a new population where the test is unproven, these analyses allow us to glimpse important aspects of diagnostic usage with only scant data.

The pioneering paper that developed these methods is titled "Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard" by Lawrence Joseph, Theresa Gyorkos, and Louis Coupal. They collected the results of two tests for the Strongyloides parasite among Cambodian immigrants to Canada in the 1970s. Since nothing was known about how common the parasite was in this group, they used an uninformative prior for its prevalence, but they were able to elicit vague priors about the two tests' performance from clinical experts. From these priors they built distributions, which they then ran through a Gibbs sampler.

A Gibbs sampler is a program that runs repeated sampling to find the parameters – in our case, test performance and prevalence – that would make the most sense in light of the data we have. Because of the way that the sampler moves from parameter estimate to parameter estimate, it devotes most of its samples to high likelihood scenarios. Therefore, the parameter estimates are essentially histograms of the number of samples that the algorithm has run for each parameter value.

JAGS, whose name stands for "Just Another Gibbs Sampler," is a commonly used Gibbs sampler. It's not the only one, but it has a convenient R interface and a lot of literature to support its use. I recently wrote a tutorial on its R interface that recreates the Joseph, Gyorkos, and Coupal analysis. You don't need any datasets to run it, as you can easily simulate the inputs of the two Strongyloides tests.

The first part deals with gathering estimates from the two different parasite tests independently. This means modeling the test results as Bernoulli draws whose success probability depends on each test's sensitivity and specificity, as well as the disease prevalence.
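To make this concrete, here is a minimal sketch of the single-test model in JAGS, run from R with the rjags package. This is not the tutorial's exact code: the Beta prior parameters and the simulated counts are illustrative placeholders, not the values elicited in the paper.

```r
library(rjags)

# Latent-class model for one imperfect test: each subject has an unobserved
# true infection status, and the probability of testing positive depends on
# sensitivity (if infected) or 1 - specificity (if not).
model_string <- "
model {
  for (i in 1:N) {
    d[i] ~ dbern(prev)                              # latent true status
    p[i] <- d[i] * sens + (1 - d[i]) * (1 - spec)   # P(test positive)
    t[i] ~ dbern(p[i])                              # observed test result
  }
  prev ~ dbeta(1, 1)    # uninformative prior on prevalence
  sens ~ dbeta(22, 5)   # vague expert prior (placeholder values)
  spec ~ dbeta(4, 2)    # vague expert prior (placeholder values)
}
"

# Simulated stand-in data: 162 subjects, 125 testing positive (placeholders)
dat <- list(t = c(rep(1, 125), rep(0, 37)), N = 162)

jm <- jags.model(textConnection(model_string), data = dat, n.chains = 3)
post <- coda.samples(jm, c("prev", "sens", "spec"), n.iter = 10000)
summary(post)
```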

The second half of the tutorial deals with learning to use the data from the two tests together. This is significantly more complex, as we need to model the joint probability of each possible combination of the two tests together. To do this, we’ll need to read in the results of the tests on each patient. However, since we’re reading in the results directly, we can’t assign a distribution to them. Rather, we’ll learn to create a likelihood that is directly observed from the data and to ensure that our new likelihood affects the model.
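The tutorial builds this joint likelihood directly on the observed data; an equivalent way to see the structure is to collapse the results into the four possible combinations and give the cell counts a multinomial likelihood. Here is a sketch of that route, assuming the two tests are conditionally independent given true disease status; the counts and the uniform priors are placeholders for the paper's data and elicited priors.

```r
# Joint model for two imperfect tests: the four cells of the 2x2 table of
# results (+/+, +/-, -/+, -/-) get a multinomial likelihood whose
# probabilities mix the diseased and non-diseased classes.
model_string2 <- "
model {
  y[1:4] ~ dmulti(p[1:4], N)
  p[1] <- prev * sens1 * sens2             + (1 - prev) * (1 - spec1) * (1 - spec2)
  p[2] <- prev * sens1 * (1 - sens2)       + (1 - prev) * (1 - spec1) * spec2
  p[3] <- prev * (1 - sens1) * sens2       + (1 - prev) * spec1 * (1 - spec2)
  p[4] <- prev * (1 - sens1) * (1 - sens2) + (1 - prev) * spec1 * spec2
  prev  ~ dbeta(1, 1)
  sens1 ~ dbeta(1, 1)   # replace these uniforms with elicited priors;
  spec1 ~ dbeta(1, 1)   # with only 3 degrees of freedom in the data,
  sens2 ~ dbeta(1, 1)   # the 5 parameters are weakly identified without them
  spec2 ~ dbeta(1, 1)
}
"

dat2 <- list(y = c(38, 87, 2, 35), N = 162)  # hypothetical 2x2 cell counts
jm2 <- jags.model(textConnection(model_string2), data = dat2, n.chains = 3)
post2 <- coda.samples(jm2, c("prev", "sens1", "spec1", "sens2", "spec2"),
                      n.iter = 20000)
```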

To learn more and see the full details, go check out the tutorial on my GitHub page and feel free to ask me any questions that come to mind!

Exit interview: CHOICE alumnus Solomon Lubinga

Editor's note: This is the second in an ongoing series of interviews with students graduating from the CHOICE Institute, in which we ask about their grad school and dissertation experiences.

Solomon Lubinga is a pharmacist and an applied health economist. After graduating with his PhD from the Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute in 2017, he became a senior fellow at the CHOICE Institute at the University of Washington, working with Dr. Josh Carlson in collaboration with the Institute for Clinical and Economic Review (ICER). He is interested in decision modelling, value of information/implementation research, and the econometric and health policy applications of discrete choice models.

For more details on Solomon's work, check out his personal webpage at http://www.jonslo.com

  • What was your dissertation about?

In my dissertation, I drew on well-known decision theories from economics and social psychology to study the incentives that drive the uptake of medical male circumcision (MMC) for HIV prevention in Uganda. My hypothesis was that a model combining factors from both decision theories would not only more accurately predict MMC decisions, but also be a very powerful tool for predicting the potential impacts of different MMC demand-creation strategies.

  • How did you arrive at that topic? When did you know that this is what you wanted to study?

I became interested in the intersection of economics and psychology early on in the PhD program. I suppose this was because of my own proclivity to act irrationally even though I considered myself a well-informed person. This led me to ask why individuals in lower-income countries generally do not value preventive health interventions. This specific topic built on a prior contingent valuation study estimating willingness to pay (WTP) and willingness to accept payment (WTAP) for safe MMC among men in high-HIV-risk fishing communities in Uganda. The results of that analysis indicated low demand (WTP) and high potential incentive value (WTAP) for MMC, suggesting that a high WTAP (a de facto increase in MMC price) may result in an unfavorable incremental cost-effectiveness or benefit-to-cost ratio for MMC. I was therefore interested in studying the relative roles of economic and psychological incentives in demand for MMC.

  • What was your daily schedule like when you were working on your dissertation?

I never had a set schedule while I worked on my dissertation. I was also a teaching assistant (TA) for the online health economics course offered by CHOICE. I spent a lot of time in Uganda collecting my data. I would spend the day in the field (8:00am – 5:00pm) and the evenings (7:00pm – 11:00pm) performing my TA duties. It turned out that this was convenient (but challenging) because of the time difference between Uganda and the west coast. I also travelled to the UK twice for a choice modelling course, which was a great help with my dissertation. When I was in Seattle, I generally combined work on my dissertation with my teaching assistant responsibilities at the UW, with no set schedule. I simply gave what was more urgent the priority.

  • If you are willing to share, what was the timeline for your dissertation? And what determined that timeline?

I submitted my short proposal in April 2015. I defended my dissertation in August 2017. Two major factors determined my timeline. First, the death of a close family member motivated me to take some personal time. Second, although I was fortunate to receive funding for my data collection activities, it took almost 8 months (between December 2015 and October 2016) to receive international clearance for the data collection activities.

  • How did you fund your dissertation?

As I mentioned, I was fortunate to receive funding for my data collection activities through a grant awarded to my dissertation advisor.

  • What will come next for you (or has come next for you)? What have you learned about finding work after school?

I am interested in academic positions in universities in the US, or other quasi-academic institutions (e.g., research institutes or global organizations that conduct academic-style research). As an international student, the main lesson I have learned is “to synchronize the completion of your studies with the job market cycle, especially if you are interested in academic positions in the US”.

Ecological Studies of Marijuana

If you live in a state that has legalized recreational marijuana, or one that is considering it, you may have seen one of the following billboards:

[Billboard image available via http://www.wltx.com/article/news/local/verify/verify-do-states-that-legalize-marijuana-have-25-fewer-opioid-deaths/482535530]
[Billboard image credit Steven Lemons, available via https://frontpageconfidential.com/weedsmaps-billboards-marijuana-arizona/]

The simple black background and white lettering make them pop, but the statements themselves are even more captivating. The content covers contentious, hot-button topics: the opioid epidemic, health spending, and marijuana legalization. But what gets left out is the context: for most readers, these statements imply causality, despite there being limited evidence for these relationships to date.

To the casual observer, these are impressive, exciting statements. A beneficial effect of a historically outlawed and much-maligned substance is indeed fascinating! A more cautious observer might wonder about the source of these claims, and, indeed, the fine print appears to contain references! The more cautious observer might now be appeased.

But really, we should all pause here, for two reasons:

Firstly, these billboards are advertising. I will not get into further discussion of advertising, political or otherwise; however, I will note that "Weedmaps," the billboard producer, is poised to be your go-to search engine and rating site for marijuana strains and producers.

Secondly, causality is complex and elusive. The two studies cited on these particular billboards (Bachhuber et al. 2014; Bradford & Bradford 2017) are ecological in design, meaning they use aggregated data (in this case, states) as the unit of analysis. The variables in these analyses are features of the states, including the main variable of interest, implementation of medical cannabis laws (note that "medical" is missing from both billboards). The research design is appropriate for questions about the average effects of medical cannabis laws on an outcome of interest (more on this later). But these findings are subject to residual confounding at the state level. In addition, they are subject to the ecological fallacy in their interpretation, and as we all know, interpretation is what matters most.

Both studies include other state-level variables that might explain the change in their outcomes over time, such as the implementation of statewide Prescription Drug Monitoring Programs (PDMPs). PDMPs were implemented in many states over the period studied and, based on similar analytic designs, may be largely responsible for improvements in opioid outcomes. Both studies account for PDMPs, and the first also considers several other opioid laws and policies that effectively restrict availability. These authors also performed several nice robustness checks. For example, a secondary model was used to adjust for state-level linear time trends in the outcome (i.e., including a random slope for state). The authors note this technique may account for changes in concepts that are difficult to measure, such as attitudes, and for other time-varying confounders. The study also employed negative controls: death rates from other conditions supposedly not associated with cannabis (e.g., heart disease and septicemia), which the authors would expect to remain unaffected by legalization.
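For the mechanically minded, here is a small R sketch of the kind of two-way fixed-effects model these studies estimate, with and without state-specific linear time trends. The panel below is simulated noise purely for illustration, and the variable names (e.g., mml for a medical cannabis law indicator) are hypothetical.

```r
set.seed(42)
# Simulated state-year panel (illustration only; no real data)
df <- expand.grid(state = factor(1:50), year = 2000:2014)
df$mml <- as.integer(df$year >= 2010 & as.integer(df$state) <= 13)  # toy law rollout
df$log_deaths <- rnorm(nrow(df))  # stand-in outcome: log opioid death rate

# Baseline ecological design: state and year fixed effects
m1 <- lm(log_deaths ~ mml + state + factor(year), data = df)

# Robustness check: add state-specific linear time trends (state x year slopes)
m2 <- lm(log_deaths ~ mml + state + factor(year) + state:year, data = df)

coef(summary(m1))["mml", ]  # compare with coef(summary(m2))["mml", ]
```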

Despite these checks, it is unlikely that these analyses accounted for all potential confounding variables, especially those that change over time. And this is almost always the case, as it's virtually impossible to observe, let alone control for, all sources of confounding. Adjusting for linear trends produced results that were only marginally statistically significant. With states as the unit of analysis, including a large number of explanatory variables quickly becomes a high-dimensional problem: there are only 50 states with a few years of data each, but potentially many more variables than that. The question then becomes whether this residual confounding is enough to change our interpretation of these studies.

Interpretation of these studies (especially in the media) may suffer from the ecological fallacy, a logical fallacy in which inference made about a group is assumed to translate to an individual's behavior or risk. From these findings, we cannot make any inference about individuals' patterns of opioid and cannabis use (i.e., substitution) or individuals' underlying risk of negative opioid outcomes. In other words, we cannot link marijuana legality to the use patterns of individuals.

So where do we go from here?

The past decade has been something of an ecological-study renaissance. This is not a bad thing. Such studies are useful for hypothesis generation, and population-level risk factors are very relevant in public health and medicine. Population-level risk factors may be important effect modifiers or causes of exposure to individual-level risk factors. Differences in state laws can make for great "natural experiments," where groups of people are "randomized" to an exposure by a natural process and a pre-post assessment can be made.

But most importantly, it comes down to inference. Inference from these studies might inform marijuana policy but should not inform interventions on individuals. Lots of discussion has been generated by these studies, and there is a great deal of room for misinterpretation (sample headline: "How marijuana is saving lives in Colorado").

On the bright side, the scientific community recognizes this problem, and it is likely that additional studies of the individual- and population-level effects will be undertaken.  A recent well-designed study from RAND (Powell et al. 2018) replicated Bachhuber et al., finding that adding more state-level variables and additional years of data to the model nullifies the effect of medical marijuana laws on opioid overdose mortality. Moreover, the authors identified that a more meaningful effect on opioid outcomes is achieved through protected and operational dispensaries, i.e. access, where the largest effect was seen during a time period of relatively lax regulation of dispensaries in California, Washington, and Colorado.

How to ensure that other new investigations will be high quality and unbiased is another question. Regardless, the tide for marijuana research appears to be turning. As more and more studies are published, it is imperative that researchers are clear about the limitations of their analyses, especially when their results might end up on a billboard.

Do prescription opioids cause unemployment or does unemployment cause prescription opioid abuse?

U.S. Employment and Opioids: Is There a Connection?

Janet Currie, Jonas Y. Jin, Molly Schnell. NBER Working Paper 24440. March 2018

http://www.nber.org/papers/w24440

[Photo: UW CHOICE student Samantha Clark]

Opioid abuse is one of the main public health challenges facing the US today. In 2016, the CDC reported that opioid-related overdose deaths had tripled from 1999 to 2014 and that drug overdoses, largely driven by this increase in opioid-related deaths, are now the leading cause of death among Americans under 50. A key component in developing effective policies to combat the opioid epidemic is understanding the mechanisms, both environmental and physiological, through which adherent opioid use transitions into abuse. A new National Bureau of Economic Research (NBER) working paper offers insight into one of these potential mechanisms: the link between employment and legitimate opioid prescription rates in working age adults (18-64). This study is unique in that it addresses temporality concerns in the relationship between these two variables (endogeneity due to reverse causality, in this case) by modeling the association in both directions and including instrumental variables (IV). My colleague Kangho provides a great primer on IV methods and interpretation here. Additionally, the use of detailed panel data from 2006-2014 and the inclusion of county-level fixed effects allowed the researchers to perform a more robust and detailed analysis than what had been done previously.

The authors first provide a thorough summary of the existing literature regarding the association between employment and legal opioid use. They note that while some researchers have found a link between unemployment and the use of opioids (see also here), others conclude that changes in economic conditions explain little of the variation in opioid overdoses and deaths. This conflicting evidence and the limitations of previous studies (use of cross-sectional data, etc.) make this analysis a timely addition to the literature.

The data that the authors used came primarily from two sources: employment information pulled from the government Quarterly Workforce Indicators (QWI) database and opioid prescription rates extracted from the QuintilesIMS dataset. Both sources included data disaggregated by county, gender, age group, and quarter, with industry also included in the QWI data. Prior to any analyses, the employment and opioid prescription variables were transformed into per capita measurements using population data from the 2010 U.S. Census. The data were then split into “low education” and “high education” groups using education level measurements from the 2000 U.S. Census. The outcome/predictor variables were also logged, presumably to account for non-linearity since OLS regression was used.

The authors ran numerous models, including standard OLS, OLS with IV, and OLS with IV and county-level fixed effects (all models contained fixed effects for year and quarter). Each of these models was run bidirectionally, with opioid prescriptions per capita and the employment-to-population ratio alternately serving as the dependent and independent variables. Assuming a delayed effect of the predictor on the outcome, the authors lagged the independent variable in each regression. The incorporation of county-level fixed effects reduced omitted variable bias by controlling for heterogeneity across counties and for observed and unobserved time-invariant factors.

In the analysis with the employment-to-population ratio as the independent, or predictor, variable, a Bartik-style shift-share instrument was incorporated. Bartik-style shift-share instruments are commonly used in labor economics to generate an estimate of local labor demand that accounts for local industry composition but is based on national-level changes in industry. Use of this instrument enabled the authors to better isolate employment changes due to shifts in labor supply (which is what the authors hypothesize would be affected by increasing opioid prescriptions) from any demand-side fluctuations. This shift-share variable is an ideal IV for this analysis because it is highly correlated with employment but has no direct link to opioid prescription rates.
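As a toy illustration (mine, not the authors' code) of how a Bartik-style instrument is built: interact each county's baseline industry employment shares with national employment growth by industry, so the predicted local change reflects only national shifts in labor demand. All names and numbers below are hypothetical.

```r
library(dplyr)

# Baseline industry employment shares for two hypothetical counties
shares <- data.frame(
  county   = rep(c("A", "B"), each = 2),
  industry = rep(c("mfg", "svc"), times = 2),
  share0   = c(0.7, 0.3, 0.2, 0.8)
)

# National employment growth by industry over the same period
national <- data.frame(
  industry = c("mfg", "svc"),
  growth   = c(-0.05, 0.03)
)

# Instrument: share-weighted national growth, i.e. the employment change a
# county would see if its industries simply followed national trends
bartik <- shares %>%
  inner_join(national, by = "industry") %>%
  group_by(county) %>%
  summarise(instrument = sum(share0 * growth))

bartik  # county A (manufacturing-heavy) gets a negative demand shock
```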

Similarly, opioid prescriptions per capita among people aged 65 and older of the same gender served as the IV in the model with opioid prescription rates as the independent, or predictor, variable. This allowed the authors to isolate the effect of opioid prescription rates on employment from local prescriber behavior, which likely has a large impact on the number of opioid prescriptions in a county. Underlying the choice of this IV was the assumption that the locations where elderly and working-age individuals get their opioid prescriptions are highly correlated. Additionally, it's doubtful that there is any direct relationship between opioid prescriptions in the elderly and employment in working-age adults.

The authors present results from each stage of the model-building process (OLS, OLS with just county-level fixed effects or IV, OLS with both county-level fixed effects and IV), as well as descriptive statistics for the two outcome variables of interest. The detailed nature of the data enabled the authors to analyze the relationship between opioid prescriptions per capita and the employment-to-population ratio by gender, age group, and county education level. Because both the dependent and independent variables were logged, the results correspond to elasticities: a 1% change in the independent variable is associated with a β1% change in the dependent variable, where β1 is the estimated coefficient. Additional analyses controlling for the potential confounder of percent insured in both models did not affect the main findings.
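Here is a quick simulated example (mine, not the authors') of why coefficients from a log-log model read as elasticities:

```r
# Simulate y with a true elasticity of 0.5 with respect to x, then recover it
set.seed(1)
x <- runif(1000, 1, 100)
y <- x^0.5 * exp(rnorm(1000, sd = 0.1))

fit <- lm(log(y) ~ log(x))
coef(fit)["log(x)"]  # ~0.5: a 1% increase in x goes with a ~0.5% increase in y
```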

Results from the primary models of interest (those incorporating both IV and county-level fixed effects) diverged in terms of significance of findings. For the regression assessing the impact of legal opioid prescriptions on employment-to-population ratios, results for women indicate that a 10% increase in opioid prescriptions per capita would lead to an increase in employment of 0.38% in high-education counties and 0.52% in low-education counties, with no corresponding relationship in men. The authors interpret this positive relationship as suggesting that legally prescribed opioids may be allowing women suffering from chronic pain to remain in the workforce longer. In the analysis examining the effect of employment-to-population ratios on opioid prescriptions per capita, the evidence was less consistent across model specifications and suggests that there isn't a clear causal link between employment level and opioid prescriptions within counties.

The authors come to the overall conclusion that there isn't a definitive relationship between opioid prescriptions and employment, and that any causal relationship may not be bidirectional. They also note that opioid abuse in specific geographic areas could be more related to factors like longer-term economic disruptions and prescribing behavior. Despite the lack of strong results, this study has important implications from a policy perspective, since the observed association between legal opioid use and employment indicates that policy interventions focused on the workplace might be effective.

The inconclusive study findings may also be a product of the opioid-use variable being restricted to legal prescriptions. Focusing instead on illegal opioid use is an interesting (although difficult) area for future research, as illicit use is largely confined to abusers, whose ability to maintain employment is more affected by opioid use. This relationship would likely be stronger than the one estimated from legal opioid prescriptions, since that measure captures functional users as well as abusers.

Understanding the potential risks and opportunities with naloxone

According to the Centers for Disease Control and Prevention (CDC) Annual Surveillance Report of Drug-Related Risks and Outcomes, opioid-related overdose mortality increased from 2.9 per 100,000 in 1999 to 10.4 per 100,000 in 2015. Several strategies have been implemented to address this opioid crisis, including federal regulation of the drug supply through Prescription Drug Monitoring Programs (PDMPs), opioid overdose education and naloxone distribution programs, and Good Samaritan laws to prevent bystanders from being arrested for possessing illicit drugs. Despite these efforts, opioid-overdose mortality continues to rise.


A key strategy for helping patients and their families and friends reverse opioid overdose is naloxone, an opioid reversal agent. However, debate about its use outside of emergency medical services handicaps its ability to make a greater impact on opioid-related mortality. As part of the United States (U.S.) Department of Veterans Affairs (VA), I've observed the struggles and rewards of getting naloxone into the hands of patients and their families and friends; educating providers about opioid overdose risk, recognition, and response; and promoting a culture of patient-centered care.

As a PhD candidate in the Comparative Health Outcomes, Policy, & Economics (CHOICE) Institute working on behavior change regarding naloxone prescribing, I have been exposed to a number of research studies on naloxone safety and efficacy. Although there is ample evidence that naloxone is effective and safe in reversing opioid overdose, several limitations exist. In a recent paper in the Annals of Internal Medicine, Chou and colleagues identified several knowledge gaps about naloxone use by emergency medical services, such as the best route of administration, titration to respiration versus consciousness, repeat dosing, and transportation after an opioid overdose event. These gaps do not, however, indicate that naloxone is ineffective. In fact, they highlight how limited our understanding remains of the opioid overdose epidemic that plagues the United States.

My colleague Elizabeth M. Oliva, program director of the U.S. Veterans Health Administration Opioid Overdose Education and Naloxone Distribution (OEND) Program, and I co-authored an accompanying editorial on the findings from Chou and colleagues. In addition to identifying the limitations of Chou and colleagues' paper, we reminded readers that naloxone is still a necessary and critical strategy for preventing opioid overdose mortality, one that should incorporate patients' caregivers and laypersons. Specifically, we write that "[F]uture investigations should examine whether naloxone delivery by [caregivers and laypersons] may have outcomes that are similar to, if not better than, waiting for EMS to arrive 'in the nick of time.'" The role of caregivers and laypersons in preventing opioid overdose remains controversial. Some states still do not have naloxone distribution programs, and providers continue to harbor stigma associated with naloxone and illicit drugs, which combine to aggravate the opioid crisis.

The potential for moral hazard among patients at risk for opioid-related overdose continually fuels the stigma around naloxone use. Moral hazard is the phenomenon in which people adopt reckless or risky behavior knowing that they are shielded from the consequences of their actions. Hence, providers may be reluctant to prescribe naloxone, thinking that their patients, uninhibited by the consequences of opioid-related overdose, will adopt riskier behavior with opioids and illicit drugs.

Debate about the moral hazard issues raised by state laws on naloxone access continues to be fueled by conflicting evidence. Doleac and Mukherjee recently released an unpublished study that implicated naloxone as a potential cause of increased opioid-related events, misuse, and social harm. Their findings indicate that state naloxone distribution laws and Good Samaritan laws create a moral hazard associated with naloxone. In other words, policies that liberalize naloxone distribution induce risky behavior, resulting in increased opioid-related events. Their findings are in direct conflict with other reports. In an unpublished National Bureau of Economic Research paper, Rees et al. reported no association between naloxone distribution policies and opioid-related events despite using similar methods. Moreover, a recently accepted manuscript by McClellan and colleagues reported that states with naloxone access laws and Good Samaritan laws saw significantly reduced incidence of opioid overdose mortality.

These conflicting findings have sparked debate about the role of naloxone in the opioid crisis and about the distribution of papers unvetted by peer review. Critics of the report by Doleac and Mukherjee have pointed out that the treatment variable (passage of state laws associated with naloxone distribution) has several limitations. Frank, Humphreys, and Pollack argued that state laws regarding naloxone have different goals or intentions (e.g., providing naloxone to anyone, immunity laws associated with naloxone use), may not have immediate effects, and do not capture other policy-level effects such as Medicaid expansion, federal grants to increase naloxone purchases, and increased mental health services for substance use disorder. These critics conclude that naloxone laws alone have little to no effect on naloxone use and that further discussion is required to understand this phenomenon.

As the debate over using naloxone to prevent opioid overdose mortality continues, it is clear that treatment of opioid use disorder has become both a priority and a burden for the U.S. healthcare system. One thing is certain: naloxone is effective at saving lives threatened by opioid overdose and is an essential strategy for addressing the opioid crisis. Withholding this life-saving medication is antithetical to the overall goal of public health.

Is there still value in the p-value?

Doing science is expensive, so a study that reveals significant results yet cannot be replicated by other investigators represents a lost opportunity to invest those resources elsewhere. At the same time, the pressure on researchers to publish is immense.

These are the tensions that underlie the current debate about how to resolve issues surrounding the use of the p-value and the infamous significance threshold of 0.05. The p-value was adopted in the early 20th century to indicate the probability of obtaining results at least as extreme as those observed if chance variation alone were at work, and the 0.05 threshold has been with it since the beginning, allowing researchers to declare as significant any effect that crosses it.

This threshold was selected for convenience at a time when p-values were difficult to compute. Modern scientific tools have made the calculation so easy, however, that it is hard to defend 0.05 as anything but arbitrary. A group of statisticians and researchers is trying to rehabilitate the p-value, at least for the time being, so that we can improve the reliability of results with minimal disruption to the scientific production system. They hope to do this by changing the threshold for statistical significance to 0.005.

In a new editorial in JAMA, Stanford researcher John Ioannidis, a famous critic of bias and irreproducibility in research, has come out in favor of this approach. His argument is pragmatic. In it, he acknowledges that misunderstandings of the p-value are common: many people believe that a result is worth acting on if it is supported by a significant p-value, without regard for the size of the effect or the uncertainty surrounding it.

Rather than reeducating everyone who ever needs to interpret scientific research, then, it is preferable to change our treatment of the threshold signaling statistical significance. Ioannidis also points to the success of genome-wide association studies, which improved in reproducibility after moving to a statistical significance threshold of p < 5 x 10^-8.

As Ioannidis admits, this is an imperfect solution. The proposal has set off substantial debate within the American Statistical Association. Bayesians, for example, see it as perpetuating the same flawed practices that got us into the reproducibility crisis in the first place. In an unpublished but widely circulated article from 2017 entitled Abandon Statistical Significance [pdf warning], Blakely McShane, Andrew Gelman, and others point to several problems with lowering the significance threshold that make it unsuitable for medical research.

First, they point out that the whole idea of the null hypothesis is poorly suited to medical research. Virtually anything ingested by or done to the body has downstream effects on other processes, almost certainly including the ones that any given trial hopes to measure. Therefore, using the null hypothesis as a straw man takes away the focus on what a meaningful effect size might be and how certain we are about the effect size we calculate for a given treatment.

They also argue that the reporting of a single p-value hides important decisions made in the analytic process itself, including all the different ways that the data could have been analyzed. They propose reporting all analyses attempted, in an attempt to capture the “researcher degrees of freedom” – the choices made by the analyst that affect how the results are calculated and interpreted.

Beyond these methodological issues, lowering the significance threshold could increase the costs of clinical trials. If our allowance for Type I error is reduced by an order of magnitude, the required sample size grows by roughly 70%, holding all other parameters equal. In a regulatory environment where it costs over a billion dollars to bring a drug to market, this need for increased recruitment could drive up costs (which would need to be passed on to the consumer) and delay the health benefits of market release for good drugs. It is unclear whether these potential cost increases would be offset by the savings from researchers producing more reliable, reproducible studies earlier in the development process.
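Here is a back-of-the-envelope check of that figure, assuming a simple two-sided z-test at 80% power (my assumptions, not the editorial's):

```r
# Ratio of required sample sizes when alpha changes, holding power fixed.
# For a two-sided z-test, n is proportional to (z_{1-alpha/2} + z_{power})^2.
n_ratio <- function(alpha_new, alpha_old = 0.05, power = 0.80) {
  ((qnorm(1 - alpha_new / 2) + qnorm(power)) /
   (qnorm(1 - alpha_old / 2) + qnorm(power)))^2
}

n_ratio(0.005)  # ~1.70: about 70% more subjects needed at alpha = 0.005
```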

It also remains to be seen whether the lower threshold's increased sample size requirement might dissuade pharmaceutical companies from bringing products to market that have a low marginal benefit. After all, you need a larger sample size to detect smaller effects, and that would only be amplified under the new significance threshold. Overall, the newly proposed significance threshold interacts with value considerations in ways that are hard to predict but potentially worth watching.

Generating Survival Curves from Study Data: An Application for Markov Models

By Mark Bounthavong

[Photo: CHOICE student Mark Bounthavong]

In cost-effectiveness analysis (CEA), a lifetime horizon is commonly used to simulate the overall costs and health effects of a chronic disease. Mortality data comparing therapeutic treatments are normally derived from survival curves or Kaplan-Meier curves published in clinical trials. However, these Kaplan-Meier curves may only provide survival data up to a few months or a few years, reflecting the length of the trial.

In order to adapt these clinical trial data to a lifetime horizon for use in cost-effectiveness modeling, modelers must make assumptions about the curve and extrapolate beyond what was observed empirically. Luckily, extrapolation to a lifetime horizon is possible using a series of methods based on parametric survival models (e.g., Weibull, exponential). Performing these projections can be challenging without the appropriate data and software, which is why I wrote a tutorial that provides a practical, step-by-step guide to estimating a parametric model (Weibull) from a published survival function for use in CEA models.

I split my tutorial into two parts, as described below.

Part 1 begins by providing a guide to:

  • Capture the coordinates of a published Kaplan-Meier curve and export the results into a *.CSV file
  • Estimate the survival function based on the coordinates from the previous step using a pre-built template
  • Generate a Weibull curve that closely resembles the survival function and whose parameters can be easily incorporated into a simple three-state Markov model

Part 2 concludes with a step-by-step guide to:

  • Incorporate the Weibull parameters into a Markov model
  • Compare the survival probability of the Markov model to the reference Kaplan-Meier curve to validate the method and catch any errors
  • Extrapolate the survival curve across a lifetime horizon
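To give a taste of where the tutorial ends up, here is a compact sketch in R that uses the flexsurv package instead of the tutorial's Excel template. It assumes you have already turned the digitized curve into reconstructed individual-level data in a data frame km_data with time and status columns (a hypothetical stand-in):

```r
library(flexsurv)  # also attaches the survival package, providing Surv()

# Fit a Weibull model to the reconstructed data
fit <- flexsurvreg(Surv(time, status) ~ 1, data = km_data, dist = "weibull")
shape <- fit$res["shape", "est"]
scale <- fit$res["scale", "est"]

# Per-cycle transition probability for the Markov model, derived from the
# Weibull survival function S(t) = exp(-(t / scale)^shape):
#   p(t, u) = 1 - S(t + u) / S(t)
trans_prob <- function(t, u, shape, scale) {
  1 - exp((t / scale)^shape - ((t + u) / scale)^shape)
}

# Monthly death probabilities over a 60-cycle (5-year) horizon
p_death <- trans_prob(t = 0:59, u = 1, shape = shape, scale = scale)
```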

The tutorial requires moving data across a couple of different software programs. You will need some familiarity with Excel to perform these parametric simulations. You should download and install the open-source program "Engauge Digitizer" developed by Mark Mitchell, which can be found here. You should also download and install the latest versions of R and RStudio to generate the parametric survival curve parameters.

Hoyle and Henley wrote a great paper on using data from a Kaplan-Meier curve to generate parameters for a parametric survival model, which can be found here. The tutorial makes use of their methods and supplemental file. Specifically, you will need to download their Excel Template to generate the parametric survival curve parameters.

I have created a public folder with the relevant files used in the tutorial here.

If you have any comments or notice any errors, please contact me at mbounth@uw.edu