ISPOR’s Special Task Force on US Value Assessment Frameworks: A summary of dissenting opinions from four stakeholder groups

By Elizabeth Brouwer



The International Society for Pharmacoeconomics and Outcomes Research (ISPOR) recently published an issue of its journal, Value in Health (VIH), featuring reports on value assessment frameworks. The issue marks the culmination of a Spring 2016 initiative “to inform the shift toward a value-driven health care system by promoting the development and dissemination of high-quality, unbiased value assessment frameworks, by considering key methodological issues in defining and applying value frameworks to health care resource allocation decisions” (VIH Editor’s note). The task force summarized and published its findings in a 7-part series, touching on the most important facets of value assessment. Several faculty of the CHOICE Institute at the University of Washington authored portions of the report, including Louis Garrison, Anirban Basu and Scott Ramsey.

In the spirit of open dialogue, the journal also published commentaries representing the perspectives of four stakeholder groups: payers (in this case, private insurance groups), patient advocates, academia, and the pharmaceutical industry. While supportive of value assessment in theory, each commentary critiqued aspects of the task force’s report, highlighting the contentious nature of value assessment in the US health care sector.

Three common themes emerged, however, among the dissenting opinions:

  1. Commenters saw CEA as a flawed tool, on which the task force placed too much emphasis

All commentaries except the academic perspective bemoaned the task force’s reliance on cost-effectiveness analysis (CEA). Payers, represented in an interview with two private insurance company CEOs, claimed that they have no choice about whether to cover most new drugs. If it’s useful at all, then, CEA informs the ways that payers distinguish between drugs of the same class. The insurers went on to claim that they are more interested in the way that CEA can highlight high-value uses for new drugs, as most are expected to be expensive regardless.

Patient advocates also saw CEA as a limited tool and were opposed to any value framework overly dependent on the cost-per-QALY paradigm. The commentary equated CEAs to clinical trials: while informative, they imperfectly reflect how a drug will fare in the real world. Industry representatives, largely representing the PhRMA Foundation, agreed that the perspective provided by CEAs is too narrow and shouldn’t be the cornerstone for value assessment, at least in the context of coverage and reimbursement decisions.

  2. Commenters disagreed with how the task force measured benefits (the QALY)

All four commentaries noted the limitations of the quality-adjusted life-year (QALY). The patient advocates and the insurance CEOs both claimed that the QALY did not reflect their definition of health benefits. The insurance representatives reminded us that their businesses don’t give weight to societal value because it is not in their business model. Similarly, the patient advocate said the QALY did not reflect patient preferences, where value is more broadly defined. The QALY, for example, does not adequately capture the influence of health care on functionality, ability to work, or family life. The patient advocate noted that while the task force identified these flaws and their methodological difficulties, it stopped short of recommending or taking any action to address them.

Industry advocates wrote that what makes the QALY useful, namely its ability to make comparisons across most health care conditions and settings, is also what makes it ill-suited for use in a complex health care system. Individual parts of the care continuum cannot be considered in isolation. They also noted that the QALY discriminates against vulnerable populations and does not reflect their customers’ preferences.

Mark Sculpher, Professor at the University of York representing health economic theory and academia, defended the QALY to an extent, noting that the measure is the most suitable available unit for measuring health. He acknowledged the QALY’s limitations in capturing all the benefits of health care, however, and noted that decision makers and not economists should be the ones defining benefit.

 

  3. Commenters noticed a disconnect between the reports and social/political realities

Commenters seemed disappointed that the task force did not go further in directing the practical application of value assessment frameworks within the US health care sector. The academic representative wrote that, while economic underpinnings are important, ultimately value frameworks need to be useful to, and reflect the values of, the decision makers. He argued that decision-makers’ buy-in is invaluable, as they hold the power to implement and execute resource allocation. Economics can provide a foundation for this but should not be the source of judgement relating to value if the US is going to take up value assessment frameworks to inform decisions.

Patient advocates and industry representatives went further in their criticism, saying the task force seemed disconnected from the existing health care climate. The patient advocate author felt the task force ignored the social and political realities in which health care decisions are made. Industry representatives pointed out that current policy, written in the Patient Protection and Affordable Care Act (PPACA), prohibited a QALY-based CEA because most decision makers in the US believe it inappropriate for use in health care decision making. Both groups wondered why the task force continued to rely on CEA methodology when it had been prohibited by the public sector.

 

The United States will continue to grapple with value assessment as it seeks to balance innovation with budgetary constraints. The ISPOR task force ultimately succeeded in its mission, which was never to specify a definitive and consensual value assessment framework, but instead to consider “key methodological issues in defining and applying value frameworks to health care resource allocation decisions.”

The commentaries also succeeded in their purpose: highlighting the ongoing tensions in creating value assessment frameworks that stakeholders can use. There is a need to improve the tools that value health care to ensure broader uptake, along with a need to accept flawed tools until we have better alternatives. The commentaries also underscore a chicken-and-egg phenomenon within health care policy. Value assessment frameworks need to align with the goals of decision-makers, but decision-makers also need value frameworks to help set goals.

Ultimately, Mark Sculpher may have summarized it best in his commentary. Value assessment frameworks seek to model the value of health care technology and services. But as Box’s adage reminds us: although all models are wrong, some are useful. How to make value assessment frameworks most useful moving forward remains a lively, complex conversation.

Reminders About Propensity Scores

Propensity score (PS)-based models are everywhere these days. While these methods are useful for controlling for measured confounders in observational data and for reducing dimensionality in big datasets, it is imperative that analysts use good judgement when applying and interpreting PS analyses. This is the topic of my recent methods article in ISPOR’s Value and Outcomes Spotlight.

I became interested in PS methods during my Master’s thesis work on statin drug use and heart structure and function, which has just been published in Pharmacoepidemiology and Drug Safety. To estimate long-term associations between these two variables, I used the Multi-Ethnic Study of Atherosclerosis (MESA), an observational cohort of approximately 6000 individuals with rich covariates, subclinical measures of cardiovascular disease, and clinical outcomes over 10+ years of follow-up. We initially used traditional multivariable linear regression to estimate the association between statin initiation and progression of left ventricular mass over time but found that using PS methods allowed for better control of confounding by measured covariates. After we generated PS for the probability of starting a statin, we used matching procedures to match initiators and non-initiators, and estimated an average treatment effect in the treated. Estimates from both traditional regressions and PS-matching procedures found a small, dose-dependent protective effect of statins against left ventricular structural dysfunction. This finding of very modest association contrasts with findings from much smaller, short-term studies.
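To make the matching step concrete, here is a minimal Python sketch of an average treatment effect in the treated (ATT) computed by 1:1 nearest-neighbor matching on an already-estimated propensity score. It is an illustration only, not the code behind the published analysis: the function name att_nearest_neighbor and the toy numbers are made up, and matching with replacement and without a caliper is an assumption chosen for simplicity.

```python
def att_nearest_neighbor(treated, controls):
    """ATT via 1:1 nearest-neighbor matching on the propensity score,
    with replacement (a control may be matched more than once).

    treated, controls: lists of (propensity_score, outcome) tuples.
    """
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    diffs = []
    for ps_t, y_t in treated:
        # Pick the control whose propensity score is closest to this treated unit.
        ps_c, y_c = min(controls, key=lambda c: abs(c[0] - ps_t))
        diffs.append(y_t - y_c)
    # Average the matched-pair outcome differences over the treated units.
    return sum(diffs) / len(diffs)


# Toy data, invented for illustration (not MESA values).
treated = [(0.62, 148.0), (0.45, 151.0), (0.30, 140.0)]
controls = [(0.60, 150.0), (0.44, 154.0), (0.28, 141.0), (0.10, 135.0)]
att = att_nearest_neighbor(treated, controls)
print(att)
```

Each treated unit contributes one matched difference, so the result describes the effect among those who started treatment rather than in the whole cohort.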

I did my original analyses using Stata, where there are a few packages for PS including psmatch2 and teffects. My analysis used psmatch2, which is generally considered inferior to teffects because it does not provide proper standard errors. I got around this limitation, however, by bootstrapping confidence intervals, which were all conservative compared with teffects confidence intervals.
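For readers unfamiliar with the bootstrap workaround, the idea can be sketched in a few lines of Python. This is a generic percentile bootstrap under stated assumptions, not a reproduction of the Stata workflow: bootstrap_ci and mean_difference are hypothetical names, the data are invented, and the stand-in estimator is a simple difference in means. In a real PS analysis the estimator passed in would ideally re-fit the propensity model and redo the matching inside every resample.

```python
import random


def bootstrap_ci(treated, controls, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval.

    Resamples treated and control units with replacement, re-runs the
    estimator on each resample, and returns the empirical alpha/2 and
    1 - alpha/2 quantiles of the resulting estimates.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    estimates = []
    for _ in range(n_boot):
        t_star = [rng.choice(treated) for _ in treated]
        c_star = [rng.choice(controls) for _ in controls]
        estimates.append(estimator(t_star, c_star))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def mean_difference(treated, controls):
    # Stand-in estimator; a real analysis would run the full matching step here.
    return sum(treated) / len(treated) - sum(controls) / len(controls)


# Invented outcome values, for illustration only.
treated = [148.0, 151.0, 140.0, 146.0, 152.0, 143.0]
controls = [150.0, 154.0, 141.0, 149.0, 155.0, 147.0, 151.0]
lower, upper = bootstrap_ci(treated, controls, mean_difference)
print(lower, upper)
```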

Figure 1: Propensity score overlap among 835 statin initiators and 1559 non-initiators in the Multi-Ethnic Study of Atherosclerosis (MESA)

Recently, I gathered the gumption to redo some of the aforementioned analysis in R. Coding in R is a newly acquired skill of mine, and I wanted to harness some of R’s functionality to build nicer figures. I found this R tutorial from Simon Ejdemyr on propensity score methods in R to be particularly useful.

Rebuilding my propensity scores with a logistic model that included approximately 30 covariates and 2389 participant observations, I first wanted to check the region of common support. The region of common support is the overlap between the distributions of PS for the exposed versus unexposed, which indicates the comparability of the two groups. Sometimes, despite fitting the model with every variable you can, PS overlap can be quite bad and matching can’t be done. But I was able to get acceptable overlap on values of PS for statin initiators and non-initiators (see Figure 1).

Using the R package MatchIt to do nearest neighbor matching with replacement, my matched dataset was reduced to 1670 observations, with every statin initiator matched. I also checked covariate balance conditional on PS in the statin initiator and non-initiator groups. Examples are in Figure 2. In these plots, the LOWESS smoother is effectively calculating a mean of the covariate level at each propensity score. I expect the means for statin initiators and non-initiators to be similar, so the smooths should be close. At the ends of the age distribution, I see some separation, which is likely to be normal tail behavior. Formal statistical tests can also be used to test covariate balance in the newly matched groups.
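The two checks described above, overlap and balance, can also be summarized numerically. The Python sketch below is an illustration under assumptions rather than a port of the R code: common_support uses a simple min-max rule, and standardized_mean_difference is a commonly used balance diagnostic (not a formal test) that was not necessarily part of the original analysis. Both function names and all numbers are invented for the example.

```python
from statistics import mean, variance


def common_support(ps_treated, ps_control):
    """Range of propensity scores covered by both groups (min-max rule)."""
    lower = max(min(ps_treated), min(ps_control))
    upper = min(max(ps_treated), max(ps_control))
    if lower > upper:
        raise ValueError("no overlap between the two score distributions")
    return lower, upper


def standardized_mean_difference(x_treated, x_control):
    """Difference in covariate means divided by the pooled standard deviation."""
    pooled_sd = ((variance(x_treated) + variance(x_control)) / 2) ** 0.5
    return (mean(x_treated) - mean(x_control)) / pooled_sd


# Invented propensity scores for initiators and non-initiators.
ps_treated = [0.35, 0.50, 0.62, 0.80]
ps_control = [0.05, 0.20, 0.41, 0.66]
lower, upper = common_support(ps_treated, ps_control)

# Invented ages in a matched sample.
age_treated = [61.0, 64.0, 67.0, 70.0]
age_control = [60.0, 64.0, 66.0, 70.0]
smd = standardized_mean_difference(age_treated, age_control)
print(lower, upper, round(smd, 3))
```

A standardized difference near zero suggests the covariate is balanced between groups; an often-cited rule of thumb treats absolute values below about 0.1 as acceptable, though that threshold is a convention rather than a hard rule.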

Figure 2: LOWESS smooth of covariate balance for systolic blood pressure (left) and age (right) across statin initiators and non-initiator groups (matched data)

Please see my website for additional info about my work.