New Criteria Needed to Evaluate Immunotherapy Responses

Jeffrey Weber, MD, PhD

Most patients who appear to progress after they begin receiving anticancer immunotherapy will never exhibit a response to therapy. They will continue to progress, just as quickly as they would with no treatment and just as predicted by the Response Evaluation Criteria in Solid Tumors (RECIST) that have long guided clinical trial evaluations and treatment decisions. The trajectory is different, however, for a small minority of patients.

These patients seem to progress significantly after they begin treatment and sometimes keep on progressing for more than a month before they finally respond to the medication. The percentage of patients who follow this pattern varies among medications, tumor types, and other factors, but the average is probably somewhere between 4% and 5%, according to Jeffrey S. Weber, MD, PhD, deputy director of the Laura and Isaac Perlmutter Cancer Center, codirector of the Melanoma Program, and head of Experimental Therapeutics at New York University Langone Medical Center.

Delayed response, in other words, is hardly common, but it is common enough for many leading investigators to suspect that new immunotherapy trials would be better judged—and patient treatment would be better guided—by a newer rubric called the immune-related Response Criteria (irRC), which accounts for the phenomenon, rather than by RECIST standards that do not.

The FDA has partially accommodated this viewpoint. It allows the use of irRC for deciding how long to keep trial patients on a particular treatment, but it requires extra safety precautions that would not be needed if investigators used RECIST to time treatment decisions. The FDA also allows the use of irRC as a secondary endpoint in many immunotherapy trials.

The agency still gives greater weight, however, to results measured by RECIST version 1.1, and it plans to keep on favoring RECIST unless future research can prove irRC to be a superior proxy for extended survival.

In theory, the issue should matter little to clinical oncologists. Formal response evaluation criteria exist to guide research trials, not everyday practice. Indeed, standard patients rarely undergo enough testing for the timely application of either irRC or RECIST to treatment decisions. That said, there is plenty of anecdotal evidence that what doctors believe about trial standards can influence how they treat patients. Oncologists who intuitively support the use of irRC seem more likely to keep their own patients on immunotherapy after some apparent progression than oncologists who side with RECIST (or who do not even know of the challenge to RECIST).

“Definitive evidence in favor of the predictive power of either RECIST or irRC would certainly make life easier for investigators, who must currently work with both standards, but I’m not sure it would have any impact whatsoever on either new drug approvals or new label indications. The FDA is already allowing us to use irRC enough that most late responders do show up somewhere in the literature as responders,” said Weber, who thinks irRC is almost certainly a more accurate survival proxy but also that its superiority is limited by the relatively small number of patients who seem to progress before responding.

The Start of the irRC

“I’m more concerned about individual practitioners stopping treatment too early,” Weber added. “The assumption that any lesion growth or new lesion signals the failure of one treatment and the need for another has been around for decades. It is a fundamental assumption of RECIST, and that assumption appeared fully justified until immunotherapy came along. I suspect that most practitioners still abide by it for the most part and that it will take some time for the ideas that underlie irRC to be broadly accepted.”

RECIST and irRC differ in many particulars, but the most important conceptual differences probably lie in their methods for measuring and their timetables for conceding disease progression. Any new lesion, or significant growth in 5 or fewer carefully tracked “target lesions,” constitutes progression under RECIST. The irRC, on the other hand, is more concerned with total tumor burden. It calls on investigators to track up to 15 lesions, and if the shrinkage of some lesions equals or exceeds growth elsewhere, even if that growth involves totally new lesions, irRC reports stable disease (SD) or responsive disease.

Even when aggregate tumor burden does grow, irRC does not conclude that a patient has progressed unless the initial findings are confirmed by a second assessment performed at least 4 weeks later. RECIST requires no confirmation to diagnose progression. A single test showing a single new lesion is enough.
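The difference between the two decision rules can be sketched in a few lines of Python. This is a simplified illustration, not either published standard: the `Assessment` class, the function names, and the single abstract "burden" number are hypothetical, growth is compared against baseline only, and the 20% and 25% default thresholds are taken from the published criteria rather than from this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assessment:
    week: int                       # weeks since baseline scan
    target_burden: float            # summed measurement of tracked lesions
    new_lesion_burden: float = 0.0  # summed measurement of new lesions


def recist_like_progression(baseline: Assessment, followup: Assessment,
                            growth_threshold: float = 0.20) -> bool:
    """Single-scan rule: any new lesion, or enough target-lesion growth."""
    if followup.new_lesion_burden > 0:
        return True
    growth = (followup.target_burden - baseline.target_burden) / baseline.target_burden
    return growth >= growth_threshold


def irrc_like_progression(baseline: Assessment, first: Assessment,
                          confirm: Optional[Assessment],
                          growth_threshold: float = 0.25,
                          min_gap_weeks: int = 4) -> bool:
    """Total-burden rule: growth must be confirmed at least 4 weeks later."""
    def total(a: Assessment) -> float:
        # New lesions are folded into total burden instead of
        # triggering progression on their own.
        return a.target_burden + a.new_lesion_burden

    def grew(a: Assessment) -> bool:
        return (total(a) - total(baseline)) / total(baseline) >= growth_threshold

    if not grew(first):
        return False
    if confirm is None or confirm.week - first.week < min_gap_weeks:
        return False  # unconfirmed growth is not yet called progression
    return grew(confirm)


# Hypothetical patient: tracked lesions shrink while a new lesion appears.
baseline = Assessment(week=0, target_burden=100.0)
mixed = Assessment(week=8, target_burden=80.0, new_lesion_burden=15.0)
print(recist_like_progression(baseline, mixed))      # new lesion present
print(irrc_like_progression(baseline, mixed, None))  # total burden fell
```

In the hypothetical patient above, the new lesion alone triggers progression under the RECIST-like rule, while the irRC-like rule sees total burden fall from 100 to 95 and does not. The real criteria add details omitted here, such as comparison against the lowest recorded burden and different measurement units.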

RECIST was first published in 2000 and updated to version 1.1 in 2009. Investigators who wish to implement the criteria begin with baseline measurement of relatively large tumor lesions and lymph nodes, notation of smaller lesions and lymph nodes, and the subsequent calculation of total disease burden. Investigators then select up to 5 target lesions that seem representative of the patient’s total lesion population (and that are in a good place for subsequent measurements). Patients then begin treatment, continue it for a preselected interval (generally 6 to 8 weeks in phase II trials), and then undergo a second round of disease evaluation.¹ RECIST standards currently define the possible results of those re-examinations using the terms listed in Table 1.

Table 1. RECIST Criteria¹

The speed and finality with which RECIST deemed treatments to have failed reflected the way cancer treatments had typically worked in clinical trials, right up until Bristol-Myers Squibb began testing ipilimumab (Yervoy), a cytotoxic T-lymphocyte antigen-4 (CTLA-4) inhibitor. Patients who were going to respond to any given treatment responded shortly after treatment began. Any progression, whether or not it came after a response, was typically thought to signal permanent treatment failure.

Oddly, the realization that things might be different with ipilimumab in particular, and with immunotherapy in general, arose from an insult. One eminent investigator taunted another for misreading a brain scan and documenting the development of a nonexistent brain tumor. The second investigator retorted that the first must be too blind to notice the patient’s very obvious brain tumor.

Rachel Humphrey, MD, who was then in charge of immunotherapy development at Bristol-Myers Squibb, said little while the argument raged in front of her. However, she found it difficult to believe that either of the very capable investigators would totally misread an image, so she asked her team to investigate further, and they found something totally unexpected: Neither investigator had erred. A brain tumor that had developed during an early-stage trial of ipilimumab, a tumor that was very visible on one scan, had simply vanished by the time the patient was scanned again.

Was that single patient a total fluke? Humphrey decided to find out.

“We gathered all our research data on ipilimumab to look for signs of responses coming after progression. We also asked the primary trial investigators whether they’d seen anyone get worse before getting better,” said Humphrey, who is now chief medical officer at CytomX Therapeutics in South San Francisco, California.

“They responded with a small-but-significant number of very vivid anecdotes, the most impressive of which featured the patient they dubbed Liver Guy to protect his privacy,” said Humphrey. “He started on ipilimumab at Memorial Sloan Kettering with no liver lesions, but his first follow-up scan showed lesions all over his liver.

“His doctor wanted to discontinue treatment, but Liver Guy begged to continue on the grounds that, lesions or no lesions, he felt much improved. The liver lesions eventually disappeared, as did all the cancer, and he lived cancer-free for nearly a decade after his first treatment.”

Humphrey and her team believed that, taken in aggregate, the data and the anecdotes conclusively demonstrated that a significant minority of immunotherapy responders started off by progressing in ways never envisioned by traditional RECIST standards. They decided, therefore, to create new response criteria that would be based on older standards from the World Health Organization and engineered for immunotherapy.

Shortly thereafter, Humphrey assembled the primary investigators from her company’s ongoing ipilimumab trials, all among the world’s leading immunotherapy investigators, and she presented both the data demonstrating the need for different response criteria and the very first version of what became the irRC.

Humphrey felt she had made a slam-dunk case, but when she had finished, she looked around and saw something surprising: a decidedly hostile audience.

“They acknowledged that they had all heard cases of confusing responses but not enough to warrant a new set of criteria. They accused me and my team of trying to inflate the benefit of the drug using novel criteria,” she said. Yet the research community eventually came around.

In 2004 and 2005, approximately 200 oncologists, immunotherapists, and regulators got together to discuss their experience with immunotherapy treatments. That group recommended that a new set of response criteria be designed and retroactively tested via existing trial data in hopes that they would prove more predictive of actual outcomes than do RECIST data.

“Guidelines for the Evaluation of Immune Therapy Activity in Solid Tumors: Immune-Related Response Criteria” appeared in the December 2009 edition of Clinical Cancer Research.² The current version of RECIST had appeared just a few months earlier, but the new paper explained why RECIST did not work well in the evaluation of the ipilimumab trial results.

The paper stated, “Ipilimumab monotherapy resulted in 4 distinct response patterns: (a) shrinkage in baseline lesions, without new lesions; (b) durable SD (in some patients followed by a slow, steady decline in total tumor burden); (c) response after an increase in total tumor burden; and (d) response in the presence of new lesions. All patterns were associated with favorable survival,” but only the first 2 would be judged as favorable outcomes by RECIST, even though RECIST is designed to be a proxy for survival.²

Table 2. Immune-Related Response Criteria²

The Pseudoprogression Phenomenon

The paper went on to introduce a new set of response criteria (Table 2) that would have been a far better proxy for the actual survival of patients treated with ipilimumab. It did not, however, argue that these new criteria, the irRC, would necessarily be better for evaluating all immunotherapy trials. It merely recommended that investigators prospectively test the new criteria in future trials.

Prospective analysis conducted since then has generally found irRC to be superior to RECIST as a response proxy for patients with cancer in immunotherapy trials, and the biggest reason is usually the inability of RECIST to account for the people who seemed to progress before responding.

Consensus opinion now holds that such patients are not actually progressing after they begin treatment or even when responding in a delayed fashion. They are instead responding in an unusual way that is now called pseudoprogression before they respond in more conventional ways. Thus, RECIST is still considered correct in its assumption that patients cannot respond after progression, but incorrect in its methods of detecting true progression.

A much-cited analysis of a phase Ib trial of pembrolizumab (Keytruda) found that 24 of 327 patients with melanoma with at least 28 weeks of imaging experienced either early or delayed pseudoprogression.³ Among 592 patients who survived ≥12 weeks, 84 (14%) experienced progressive disease (PD) per RECIST but not per irRC. The 2-year overall survival rates were 77.6% in patients with non-PD per both criteria (n = 331), 37.5% in patients with PD per RECIST v1.1 only (n = 84), and 17.3% in patients with PD per both criteria (n = 177).
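The proportions in that analysis can be rechecked with simple arithmetic. The short Python sketch below uses only the patient counts quoted above; the variable names are illustrative and do not come from the study.

```python
# Patient counts as quoted in the article (Hodi et al, reference 3).
cohort = {
    "non_pd_both_criteria": 331,
    "pd_recist_only": 84,
    "pd_both_criteria": 177,
}

total = sum(cohort.values())                        # patients surviving >= 12 weeks
recist_only_share = cohort["pd_recist_only"] / total

# Pseudoprogression among patients with at least 28 weeks of imaging.
pseudoprogression_share = 24 / 327

print(f"Patients surviving >= 12 weeks: {total}")
print(f"PD per RECIST but not irRC: {recist_only_share:.1%}")
print(f"Early or delayed pseudoprogression: {pseudoprogression_share:.1%}")
```

The three subgroups sum to the 592 patients cited, and the shares work out to roughly 14% and 7%, respectively.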

“Based on survival analysis, conventional RECIST might underestimate the benefit of pembrolizumab in approximately 15% of patients,” F. Stephen Hodi, MD, of Dana-Farber Cancer Institute, and his co-authors wrote in the Journal of Clinical Oncology.³ “Modified criteria that permit treatment beyond initial progression per RECIST v1.1 might prevent premature cessation of treatment.”

Findings like that have led the FDA to allow investigators to keep trial patients on immunotherapies longer than RECIST criteria would advocate, but the agency demands they take extra precautions with such patients.

“To limit the potential risks associated with continuing a treatment beyond initial documentation of disease progression, trials using such modifications to the response criteria will incorporate safety considerations in the protocol to identify patients suitable for this approach (eg, absence of signs or symptoms of clinically significant disease progression, stable performance status, and absence of rapid disease progression at critical anatomical sites requiring urgent medical intervention),” an FDA spokesperson wrote in response to e-mailed questions.

The FDA is theoretically willing to fully embrace irRC or any other set of response criteria that prove themselves better than existing criteria, but agency officials declined to provide specific answers about what evidence they would need to consider irRC fully validated.

“There is certainly great interest for us in the use of efficacy endpoints that can provide earlier evidence of clinical activity. The FDA and others are examining different ways to measure clinical activity and efficacy for potential future use,” the spokesperson wrote. “The FDA is working with stakeholders on standardization (eg, iRECIST) and prospective validation of novel response criteria that account for atypical response patterns with the potential to more fully capture the clinical benefit of immuno-oncology products.”

As the FDA continues to work, mostly behind the scenes, with investigators and pharmaceutical companies, the real question for many oncologists is when to discontinue immunotherapy in real-world patients. There is, unfortunately, no incontrovertible evidence, no study that randomized patients with different cancers and different immunotherapies to RECIST or irRC treatment guidance and then compared survival.

Weber, the NYU clinician and investigator, advocates a course that combines traditional measures of progression, the patient’s self-perceived well-being, and a number of plausible alternatives. “You’re probably not going to have as many tests and scans for regular patients as you have for trial patients, but you take what you have and then you talk to patients about how they feel,” Weber said. “If the scans look uniformly terrible, then the treatment probably isn’t working and the patient will probably feel terrible.

“Pseudoprogression tends to show up as growth in some lesions and shrinkage elsewhere. Growth everywhere is almost always real progression. If you don’t have any decent alternative treatments, and the patient reports feeling great, then you might continue anyway, just on the very remote chance you have another Liver Guy. But usually it seems best to keep at it with mixed signs and discontinue if you see steady progression.”

References

1. Eisenhauer EA, Therasse P, Bogaerts J, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer. 2009;45(2):228-247. doi: 10.1016/j.ejca.2008.10.026.
2. Wolchok JD, Hoos A, O’Day S, et al. Guidelines for the evaluation of immune therapy activity in solid tumors: immune-related response criteria. Clin Cancer Res. 2009;15(23):7412-7420. doi: 10.1158/1078-0432.CCR-09-1624.
3. Hodi FS, Hwu W-J, Kefford R, et al. Evaluation of immune-related response criteria and RECIST v1.1 in patients with advanced melanoma treated with pembrolizumab. J Clin Oncol. 2016;34(13):1510-1517. doi: 10.1200/JCO.2015.64.0391.