Metrics in Clinical Medicine and Clinical Trials Should Look Beyond the P Value

June 30, 2022

Maurie Markman, MD

From evaluating clinical outcomes to determining payment for services, measurements are important in health care. Today, it would be difficult to envision a scenario in which this would not be the case. Experience is highly relevant in the clinical domain, from a surgeon’s decision whether to remove a patient’s large vascular tumorous mass to a medical oncologist’s recommendation that a patient try 1 more antineoplastic regimen. Even so, the clinical value of these critical but subjective individual determinations must also be judged against objective results. For instance, did the patients described above experience excessive measurable adverse effects (AEs) relative to the time they spent out of the hospital? How long did they ultimately survive?

One can rigorously debate the specific role the patient/family vs the health care system/payor should have in the assessment of “value.” In the opinion of this commentator, it is difficult to argue against the hypothesis that such an evaluation should be rationally based, at least in part, on an examination of objectively measured outcomes.

However, when moving beyond the individual, a discussion of the role of metrics becomes even more complex and potentially problematic. How does a claimed “statistically superior or inferior” result from a clinical trial of a diagnostic or therapeutic approach in carefully selected patients with cancer relate to an individual seen in clinical practice? Factors such as age, ethnic background, known or as-yet-unrecognized relevant comorbidities, physiologic measurements, life experiences, nutrition, family support, psychological profile, and willingness to understand and accept risk may all influence how relevant a trial result is to an individual patient, or to the providers advising that patient.

It also must be acknowledged that observed metrics in a clinical trial setting can be intentionally or unintentionally manipulated to achieve a superior outcome. This commonly occurs through the exclusion of particular groups of individuals from clinical trials using criteria such as age, prior therapy, comorbidities, and medication use, among others.

Study sponsors and regulators can be expected to justify such exclusions as an effort to protect participants from serious AEs. However, these decisions also allow a treatment strategy to be approved by a regulatory agency and/or reported in the peer-reviewed literature before it is objectively known whether the regimen is safe for individuals with clinical features that were not permitted in the study population.

This problem is common in oncology studies, which routinely exclude, or subtly discourage the enrollment of, individuals with, for example, a history of cardiac events or mild-to-moderate renal dysfunction, or those who require a variety of medications for chronic nononcologic illnesses.

Further, despite their recognized value and necessity in oncology, randomized trials are the most susceptible to the concerns highlighted above. This is partly because they require a patient population that is as homogeneous as possible in order to isolate the effects of the approach being evaluated, both on end points such as time to disease progression or overall survival and on its AE profile. Imbalances between the randomly assigned groups can complicate the objective interpretation of a study end point.

The rationale for this design is understood, but it must not be forgotten that the ultimate aim of such trials should be to evaluate the potential benefits and risks of a therapy in patients who reflect the real-world setting, where the strategy will be used once it leaves the well-contained universe of a clinical trial. How does the community of patients with cancer benefit if large populations of individuals are denied access to potentially effective strategies? Should providers be left to make educated guesses about efficacy, safety, and appropriate dosing because essential trial-based data are lacking?

A final question concerns the legitimacy of treating randomized trials as the gold standard for objectively determining clinical relevance on the basis of what has been, quite arbitrarily, labeled statistical significance. For example, there has been considerable discussion in the medical literature about the poor understanding, misinterpretation, and inadequate reporting of P values in such trials.¹

As one commentator pointedly noted: “Fundamentally, statistical inference using P values involves mathematical attempts to facilitate the development of explanatory theory in the context of random error. However, P values provide only a particular mathematical description of a specific data set and not a comprehensive scientific explanation of cause-and-effect relationships in a target population.”²

A second commentator was even more direct in criticizing common misconceptions regarding this statistical test: “A P value of 0.05 does not mean that there is a 95% chance that a given hypothesis is correct. Instead, it signifies that if the null hypothesis is true, and all other assumptions made are valid, there is a 5% chance of obtaining a result at least as extreme as the one observed. And a P value cannot indicate the importance of a finding; for instance, a drug can have a statistically significant effect on a patient’s blood glucose levels without having a therapeutic effect.”³
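The distinction drawn in this quotation can be made concrete with a small simulation. The following is a minimal Python sketch, not taken from the cited sources: it assumes a pooled two-proportion z-test as the significance test, and the event rates, sample sizes, and function names (`two_proportion_p`, `share_significant_under_null`) are hypothetical. The first part simulates trials in which the null hypothesis is true; the second holds a small difference fixed while the sample size grows.

```python
import math
import random


def two_proportion_p(events_a, n_a, events_b, n_b):
    """Two-sided P value from a pooled two-proportion z-test (normal approximation)."""
    rate_a, rate_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def share_significant_under_null(event_rate=0.30, n_per_arm=200, n_trials=2000, seed=1):
    """Simulate trials in which the null hypothesis is true (both arms share the
    same hypothetical event rate) and return the fraction that reach P < .05."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_trials):
        events_a = sum(rng.random() < event_rate for _ in range(n_per_arm))
        events_b = sum(rng.random() < event_rate for _ in range(n_per_arm))
        if two_proportion_p(events_a, n_per_arm, events_b, n_per_arm) < 0.05:
            significant += 1
    return significant / n_trials


# When the null hypothesis is true, roughly 5% of trials still reach P < .05:
# the 5% describes how often data this extreme arise by chance alone.
print(f"Share of null trials with P < .05: {share_significant_under_null():.3f}")

# A fixed, arguably trivial difference (30.0% vs 30.5% event rates) moves from
# nonsignificant to overwhelmingly significant purely because n grows.
for n in (1_000, 100_000, 1_000_000):
    p = two_proportion_p(n * 300 // 1000, n, n * 305 // 1000, n)
    print(f"n per arm = {n:>9,}: P = {p:.2g}")
```

Roughly 5% of the null trials reach P < .05, which is all that the threshold promises, while the 0.5-percentage-point difference becomes highly significant only once each arm has hundreds of thousands of patients. Whether such a difference matters to patients is a clinical question the P value cannot answer.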

The danger of relying on a P value to determine statistical significance, and then treating that number as a definition of clinical benefit, was highlighted in a recent report evaluating the results of randomized controlled trials (RCTs) of various interventions for COVID-19.⁴ The investigators found that “a relatively small number of events (a median of 4) would be required to change the results of COVID-19 RCTs from statistically significant to not significant,” calling into question the ability of any single study to reliably establish both statistical significance and clinical utility.
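The quantity described in this report, the number of events that would have to change to move a result across the significance threshold, is commonly called the fragility index. Below is a minimal Python sketch of one common way to compute it, assuming Fisher’s exact test at α = .05 and changing outcomes only in the arm with fewer events. The cited COVID-19 analysis may have used different conventions, and the function names and trial counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]
    (rows = arms, columns = events / non-events). Sums the probabilities of all
    tables with the same margins that are no more likely than the observed one;
    integer arithmetic keeps that comparison exact."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x):
        # Hypergeometric numerator for a table whose top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(row1 + row2, col1)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Number of patients in the arm with fewer events whose outcome would have to
    change from non-event to event before the result is no longer significant.
    Returns None when the original result is not significant."""
    if fisher_exact_two_sided(events_a, n_a - events_a, events_b, n_b - events_b) >= alpha:
        return None
    if events_a > events_b:  # always modify the arm with fewer events
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a
    flips = 0
    while events_a < n_a:
        events_a += 1
        flips += 1
        p = fisher_exact_two_sided(events_a, n_a - events_a, events_b, n_b - events_b)
        if p >= alpha:
            return flips
    return None


# Hypothetical trial: 5/100 events in one arm vs 18/100 in the other.
print(fisher_exact_two_sided(5, 95, 18, 82), fragility_index(5, 100, 18, 100))
```

A small index means that the apparent statistical significance rests on the outcomes of only a handful of patients.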

The potential absurdity of using P values to define clinical relevance in oncology is illustrated by a relatively recent article in a high-impact medical journal. This retrospective review compared 258 patients who received complementary medicine (0.0136% of the total population) with 1,901,557 patients in the control group to arrive at its claim of statistical significance, with impressive P values (P < .001). The authors wrote, “patients who received CM [complementary medicine] were more likely to refuse additional CCT [conventional cancer treatment] and had a higher risk of death.”⁴

There is much to criticize in this report, including the authors’ definition of complementary medicine. Apparently, the paper’s reviewers and the journal editor were impressed by this P value and overlooked the profoundly small and highly biased sample and patient population, as well as the highly debatable conclusions.
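One way to see why a very large control group does little to offset a small comparison group is to compute the standard error of a difference in proportions. The Python sketch below is illustrative arithmetic only, not the analysis the cited report performed. It assumes a simple comparison of two proportions with a hypothetical 50% event rate in both groups, and the function name `se_diff_proportions` is hypothetical.

```python
import math


def se_diff_proportions(p1, n1, p2, n2):
    """Unpooled standard error of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


N_SMALL = 258   # size of the complementary medicine group quoted in the text
P_EVENT = 0.5   # hypothetical event proportion in both groups (worst-case variance)

# Lower bound on the SE: what remains even if the control group were infinitely large.
floor = math.sqrt(P_EVENT * (1 - P_EVENT) / N_SMALL)

for n_control in (258, 2_580, 25_800, 1_901_557):
    se = se_diff_proportions(P_EVENT, N_SMALL, P_EVENT, n_control)
    print(f"controls = {n_control:>9,}: SE = {se:.4f} ({se / floor:.4f} x floor)")
```

With equal-sized groups of 258, the standard error is about 1.41 times the floor; with 1,901,557 controls, it is within 0.01% of the floor. The precision of the comparison is therefore set almost entirely by the 258 patients who received complementary medicine, and no number of controls can correct for how those 258 patients were selected.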

Whether it was Mark Twain or Benjamin Disraeli who said it first, there remains much truth to the expression, “There are three kinds of lies: lies, damned lies, and statistics.”

References

1. Chavalarias D, Wallach JD, Li AHT, Ioannidis JPA. Evolution of reporting P values in the biomedical literature, 1990-2015. JAMA. 2016;315(11):1141-1148. doi:10.1001/jama.2016.1952
2. Kyriacou DN. The enduring evolution of the P value. JAMA. 2016;315(11):1113-1115. doi:10.1001/jama.2016.2152
3. Baker M. Statisticians issue warning on P values. Nature. 2016;531(7593):151. doi:10.1038/nature.2016.19503
4. Johnson SB, Park HS, Gross CP, Yu JB. Complementary medicine, refusal of conventional cancer therapy, and survival among patients with curable cancers. JAMA Oncol. 2018;4(10):1375-1381. doi:10.1001/jamaoncol.2018.2487