March 27, 2026

Dr Hirsch on Validation of an AI-Assisted PD-L1 Scoring Algorithm Against Blueprint Study Data in Lung Cancer

Fred R. Hirsch, MD, PhD, discusses research validating an AI-assisted PD-L1 scoring algorithm against original Blueprint study data in lung cancer.

“The AI platform is at least [as good] as manual [assessment], in certain cases, better, and when I say better or equal, that [pertains to] the score against the consensus of pathologists.”

Fred R. Hirsch, MD, PhD, professor of medicine, hematology, and medical oncology, and of pathology, molecular and cell-based medicine at Icahn School of Medicine at Mount Sinai, discussed research presented at the 2026 European Lung Cancer Congress that validated an AI-assisted PD-L1 scoring algorithm against original Blueprint study data.

Investigators evaluated the performance of an AI-driven platform for assessing PD-L1 expression in tumor samples, comparing its accuracy with that of expert pathologists. The study builds upon prior work from the Blueprint Phase 2 project, which established a benchmark for PD-L1 immunohistochemistry (IHC) interpretation through consensus scoring by 24 experienced pathologists, Hirsch explained.

In this comparison, the previously generated dataset from Blueprint Phase 2 served as the reference standard, Hirsch said. Each case underwent rigorous manual review, with consensus determinations representing a highly curated gold standard. The AI platform was then applied to the same material, enabling a direct head-to-head evaluation between computational and human interpretation.

Findings from the study demonstrated that the AI platform achieved performance that was at least equivalent to manual assessment and, in certain scenarios, exceeded it. Importantly, equal or better performance was defined by agreement with the consensus scores established by the panel of pathologists, Hirsch noted. In other words, the AI system was compared not with individual readers but with a highly refined collective judgment, making the results particularly compelling.

Notably, the AI platform showed the most pronounced improvement in cases involving the SP142 assay, a diagnostic test commonly associated with the use of atezolizumab (Tecentriq). This assay has historically posed challenges in interpretation due to variability in staining patterns and scoring complexity, Hirsch stated. The enhanced agreement observed with AI in this subset suggests that computational tools may be especially valuable in standardizing difficult or subjective assessments.

Overall, investigators concluded that AI-driven PD-L1 scoring is not only feasible but may also offer incremental benefits over traditional manual evaluation. By matching or slightly exceeding manual assessment in agreement with expert consensus, the platform shows potential for improving reproducibility and consistency in biomarker assessment. These findings underscore a broader shift toward integrating AI into pathology workflows, particularly in areas where diagnostic variability can impact treatment decisions, Hirsch concluded.

