Theresa Mariero Olasveengen, Robin J. Prescott, Jo Kramer-Johansen
In this issue of Resuscitation, Yannopoulos and colleagues report results from a post hoc analysis of their previously published NIH PRIMED-trial. As we are growing accustomed to in cardiac arrest research, the NIH PRIMED-trial was a large randomized controlled clinical trial unable to prove any survival benefit from a promising intervention. Indeed, the small and non-significant difference favoured the sham device. Recent years have seen many efforts to demonstrate improved survival through randomized controlled trials of treatment options such as mechanical chest compression devices, drugs, and hypothermia (as opposed to temperature management). Many of these treatments were widely used prior to these trials, and practice remains largely unaffected by neutral study results. Enormous resources have been allocated to perform these trials, but rather than question the validity of the interventions tested, fundamental questions about study design and methodology are being raised. How can we design studies with enough statistical power to provide definitive answers? Do we need to be more selective in patient inclusion, to avoid “thinning out” our results with cardiac arrest victims who are beyond resuscitation and with patients who will respond to the initial treatment however it is performed, and to ensure a targeted selection of patients who can be exposed to, and thus possibly respond to, our intervention? Should patients who receive suboptimal CPR be excluded from randomized trials to ensure we are measuring the true potential of any intervention, or do we want the interventions to be proven effective in pragmatic trials where quality of CPR is less controlled? And what is quality of CPR? Does every aspect of measured CPR performance affect survival or the effect of an intervention, or might different aspects of CPR performance even affect different interventions in different ways?
As the authors of the current paper illustrate, the general problem of studying the effect of an intervention in a heterogeneous patient population is more complicated than simply “thinning out” the results with patients beyond resuscitation. A piece of the “why-all-cardiac-arrest-trials-are-neutral” puzzle might be the presence of subgroups within the study populations with significantly different risks and benefits from the tested interventions, masking any potential positive and/or negative effects within the entire study population. Such heterogeneity in treatment effect, or “effect modification”, occurs when effects vary in size and/or direction between identifiable subgroups. The most intuitive example is perhaps a targeted treatment for an underlying cause. It seems reasonable that thrombolytic therapy or percutaneous revascularization of an acutely occluded coronary artery could prove beneficial during cardiac arrest for a selected subgroup. However, if either of these treatments is tested in an unselected population, risks and unwanted side-effects among patients without an acutely occluded coronary artery might cancel out the positive effects seen in the patients who could be expected to benefit.
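The cancelling-out described above is simple arithmetic, and a minimal sketch may make it concrete. All numbers below are hypothetical and chosen purely for illustration; they are not taken from the NIH PRIMED-trial or any other study, and the subgroup names and function names are likewise assumptions. The sketch shows a pooled survival difference as the population-weighted average of subgroup differences, so that a benefit in one subgroup and a harm in another can sum to a neutral overall result.

```python
# Hypothetical illustration of effect modification.
# Each entry: (share of trial population, survival with treatment, survival with control).
# These values are invented for the example and do not come from any trial.
SUBGROUPS = {
    "occluded_artery": (0.30, 0.20, 0.13),  # assumed to benefit from treatment
    "no_occlusion": (0.70, 0.07, 0.10),     # assumed to be harmed by side-effects
}


def subgroup_differences(subgroups):
    """Absolute survival difference (treatment minus control) per subgroup."""
    return {
        name: treated - control
        for name, (_share, treated, control) in subgroups.items()
    }


def overall_difference(subgroups):
    """Pooled survival difference: subgroup differences weighted by population share."""
    return sum(
        share * (treated - control)
        for share, treated, control in subgroups.values()
    )


if __name__ == "__main__":
    for name, diff in subgroup_differences(SUBGROUPS).items():
        print(f"{name}: {diff:+.3f}")
    print(f"overall: {overall_difference(SUBGROUPS):+.3f}")
```

With these assumed inputs the benefit in the smaller subgroup and the harm in the larger one offset each other, and the pooled difference is zero up to floating-point rounding, even though neither subgroup is unaffected by the treatment.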
Yannopoulos and colleagues suggest that such effect modification might explain the lack of effect in their original randomized controlled trial. By selecting the stratum of patients who received high-quality CPR, and who thereby arguably had a lower risk of a bad outcome, they find a statistically significant benefit from the impedance threshold device (ITD). As the overall results were slightly in the opposite direction, the other side of the coin is that the stratum with poor CPR quality had a worse outcome with the ITD. The clinical implications of these results are not obvious. At a minimum, these analyses demonstrate the importance of measuring CPR quality during any cardiac arrest trial. And although high-quality CPR has been increasingly emphasized in recent guidelines, interpreting associations between CPR quality parameters and clinical outcome is tricky. One example is the landmark paper showing that increased chest compression fraction is independently predictive of survival. When reviewing the data, it seems that outcome improves steadily until chest compressions are delivered 80% of the time. Once this “threshold” is reached, survival drops. Rather than concluding that too much CPR is harmful, a common interpretation is that patients who are easy to resuscitate have shorter resuscitation efforts, characterized by the frequent interruptions commonly seen during the initial resuscitation phase, when paramedics might be busy performing the initial defibrillation, securing the airway, establishing intravenous access, optimizing the patient position, etc. The patients most likely to survive would then have poorer measured CPR quality than patients with prolonged and futile CPR efforts, in whom average values of CPR quality would be better. Similarly, patients with non-shockable rhythms and poor prognosis would be expected to have better average CPR quality than VF patients with shorter resuscitations and pauses to defibrillate.
Providing one standard for adequate CPR quality might not be enough.
The intent-to-treat principle in a well-conducted randomized controlled trial is the gold standard for avoiding bias. We might wish to randomize only patients who receive optimal CPR, but to date we have no way of rigorously ensuring optimal CPR quality across different EMS systems and between individual patients. This is highlighted by the fact that even in what we would expect to be highly motivated and well-functioning study sites, more than half of the included patients received substandard CPR. Out-of-hospital cardiac arrest trials are, by their nature, almost inevitably pragmatic trials, evaluating the benefit a non-research EMS system could expect to replicate by implementing the same interventions. Consequently, variable CPR quality and a variable effect of the ITD are to be expected.
Any treatment option that might increase survival among a subgroup with lower risk of death at the expense of another subgroup with higher risk of death would clearly be unethical. However, the idea that risk stratification guides treatment choices is not new. In resuscitation, such stratification is used to guide the extremes: when to abstain from or terminate resuscitation, and when to initiate expensive and resource-demanding extracorporeal life support. All the shades in between are largely unexplored. In the future we might have prediction models indicating a greater risk of cardiac instability or cerebral insult that might warrant specific treatments. But these treatments need to be tested a priori. Exploring possible effect modifiers in resuscitation research might be the key to deriving the pathophysiological hypotheses and empirical data we need to design better trials.
There are many stones left to turn, both in terms of study design and more targeted interventions. But to ensure we are operating in the best interest of our patients, our science needs to be based on hypotheses tested a priori, not on post hoc analyses. In other words, the results from Yannopoulos and colleagues are not actionable and should not guide current practice. To argue that they should is ethically dubious. Having said that, post hoc analyses are invaluable in providing ideas for alternative mechanisms and study designs, and should therefore be read with great enthusiasm to inform future research.
Olasveengen, Prescott and Kramer-Johansen have no conflicts of interest.