Maurie Markman, MD
The term statistically significant is almost certainly beautiful music to the ears of clinical investigators and pharma/biotech companies. This trial-related event likely means the opportunity to publish in a high-impact medical journal and, in the case of an industry-sponsored effort, may lead to regulatory approval or an increase in sales. And why challenge this scenario?
After all, although the world of statistics is a rather foreign place to most clinicians, we all know what the word significant implies. Webster’s New Collegiate Dictionary defines the term as “having meaning.” And who would argue that having meaning is not a good thing?
However, concern develops when one inquires how the most common test of significance, the P value, is used in clinical investigative efforts and whether at times this is more harmful than helpful within the domain of cancer medicine. The P value is a reasonable mathematical test that originated from a simple desire to show the difference between an observed experimental outcome and what might have been expected by chance alone. One might rationally suggest that today this simple test has been hijacked by those who desire a single absolute answer to the question of significance and clinical benefit: if P <.05, there is a statistically significant difference in outcomes between 2 or more arms of a randomized trial, and if P >.05, the difference is insignificant. How much easier than this can it be?
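The original logic of the P value can be made concrete with a small simulation. The sketch below is purely illustrative (the response rates and arm sizes are invented, not taken from any trial); it uses a permutation test to ask how often a between-arm difference at least as large as the one observed would arise by chance alone:

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical response data (1 = response) in two trial arms --
# illustrative numbers only, not from any real study.
arm_a = [1] * 12 + [0] * 18   # 40% response rate
arm_b = [1] * 6 + [0] * 24    # 20% response rate

observed = mean(arm_a) - mean(arm_b)

# Permutation test: repeatedly shuffle the pooled outcomes and count how
# often a difference at least as extreme occurs under chance alone.
pooled = arm_a + arm_b
n_perm = 10_000
n_extreme = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = mean(pooled[:30]) - mean(pooled[30:])
    if abs(diff) >= abs(observed):
        n_extreme += 1

p_value = n_extreme / n_perm
print(f"observed difference = {observed:.2f}, P (two-sided) ~ {p_value:.3f}")
```

This is all a P value was ever designed to report: the probability, under chance alone, of seeing a difference this large. Nothing in the calculation declares .05 a natural boundary between meaning and meaninglessness.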
To highlight just 1 aspect of this disquieting, overly simplistic reasoning regarding statistical testing, consider a report where 522 consulting biostatisticians were asked (390 responded) whether they had received what they considered to be “inappropriate requests to modify/falsify/underreport a statistical analysis to favorably enhance the study outcome.”1
Remarkably, 20% of the respondents reported such concerning requests.
How influential—one might even suggest magical—is this P <.05 number in clinical cancer research? Consider, for a moment, the efforts undertaken in a recent report of a phase III randomized trial that compared the cytotoxic drug combination of carboplatin and paclitaxel with these same 2 agents plus bevacizumab (Avastin) as second-line treatment of ovarian cancer.2 The P value for overall survival (OS) in the group receiving bevacizumab was not statistically significant (P = .056), but by identifying “incorrect treatment-free interval stratification data for 45 patients,” the investigators found it was possible to adjust the P value downward to P = .0447, thus creating a “statistically significant” survival outcome. Remarkably, this paper concluded that this so-called sensitivity analysis “indicates that this [experimental drug combination] might be an important addition to the therapeutic armamentarium in these patients.”
The fundamental question to be asked is: Would the beneficial value of this therapeutic intervention be objectively less certain if P = .056? Why not simply report the survival data (both progression-free and overall) and median landmark outcomes (1 year, 2 year, etc) and let clinicians and patients decide for themselves whether the results are clinically relevant?
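To see how little evidentiary weight separates the two reported P values, one can convert each to its approximate two-sided z-score. This is a back-of-the-envelope sketch (a normal approximation, not the trial's actual test statistic) using only the Python standard library:

```python
from statistics import NormalDist


def two_sided_z(p: float) -> float:
    """Approximate z-score corresponding to a two-sided P value."""
    return NormalDist().inv_cdf(1 - p / 2)


z_not_sig = two_sided_z(0.056)    # the reported "nonsignificant" result
z_sig = two_sided_z(0.0447)       # the restratified "significant" result

print(f"z for P = .056 : {z_not_sig:.2f}")
print(f"z for P = .0447: {z_sig:.2f}")
print(f"difference     : {z_sig - z_not_sig:.2f}")
```

The two results differ by roughly a tenth of a standard error—essentially the same strength of evidence—yet one is labeled a failure and the other a success.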
Why would one assume that simply because it is possible to manipulate data to alter a P >.05 result to P <.05, somehow truth is now revealed? A final critical question is this: Does a P value of <.05 or >.05 meaningfully define the potential utility of a strategy for an individual patient or group of patients?
In this discussion, one should acknowledge that members of the academic statistician community have raised their own objections to the so-called P <.05 gold standard, with some suggesting the definition of statistical significance should be made even more stringent (eg, P <.005).3 However, it is relevant to inquire whether making it even more difficult for a novel cancer therapeutic strategy to achieve the lofty goal of statistical significance helps patients with cancer or simply makes the math more impressive.
The basic issue here is the critical distinction between statistical significance and clinical relevance or value. Simply because a given outcome has achieved the statistical gold standard does not guarantee it has clinical value. To appreciate the lack of direct linkage, one need search no further than the remarkable conclusion from a phase III randomized trial that compared erlotinib (Tarceva) plus gemcitabine versus gemcitabine alone in the management of metastatic pancreatic cancer.4 In this study, there was a statistically significant (P = .038) improvement in OS associated with adding erlotinib, but this translated to a median survival difference of only 10 days (6.24 vs 5.91 months) between the study populations.
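The arithmetic behind that 10-day figure is worth spelling out (a trivial check, using an average month length of 365.25/12 days):

```python
# Median OS from the cited pancreatic cancer trial: 6.24 vs 5.91 months.
days_per_month = 365.25 / 12                  # ~30.44 days per month
diff_days = (6.24 - 5.91) * days_per_month
print(f"median survival difference ~ {diff_days:.0f} days")
```

A "significant" P value of .038 coexists here with a survival gain most patients would consider negligible—the clearest possible demonstration that statistical significance and clinical value are not the same thing.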