Adding AI Analysis to Targeted Transcriptome Represents Potential Aid for Differential Diagnosis

Kyle Doherty

March 1, 2023

Oncology Live®

Vol. 24/No. 4

Pages: 51

Andrew Pecora, MD

As artificial intelligence (AI) in cancer care continues to be better understood and refined, the number of potential clinical applications is rapidly expanding. An area of interest is combining the efficiency and accuracy of AI with already established clinical techniques to better understand the underlying biology of cancer in order to make quick and effective treatment decisions.

In a study published in The American Journal of Pathology, Zhang et al aimed to assess AI integration with targeted transcriptome for the differential diagnosis of hematologic cancers and solid tumors. They concluded that the approach was useful for the classification and ultimate diagnosis of a variety of cancer types.¹

“Next-generation sequencing [NGS] is able to interrogate not only the DNA of a malignancy, but also its transcription through RNA, giving us insights into how to best treat patients,” Andrew Pecora, MD, chief innovations officer and vice president of Cancer Services at Regional Cancer Care Associates in Hackensack, New Jersey, and a coauthor of the study, said in an interview with OncologyLive®. “Parallel to that, you can’t treat a patient properly unless you know exactly what they have. Many times, pathology is not conclusive as to what is the underlying true diagnosis. We wanted to see whether we could mimic the accuracy of [using] DNA and RNA transcription to make a proper diagnosis. You can’t do that manually because the amount of data you get from a NGS report is enormous—it’s hundreds of millions, if not billions, of pieces of data. A person can never look through that, but machine learning and AI can. We have married information technology at the highest level with genomics at the highest level.”

Between November 2018 and November 2021, investigators collected 5450 bone marrow and formalin-fixed, paraffin-embedded samples, which included hematologic neoplasms (n = 2606) and solid tumors (n = 2038). Leukemia, myelodysplasia, and normal tissue samples were collected from fresh bone marrow, and lymphomas and solid tumors were based on formalin-fixed, paraffin-embedded cancer samples. Morphologic analysis, flow cytometry, immunohistochemistry, and molecular profiling of the DNA and RNA were used to confirm the tumor diagnoses within 72 hours of collection. NGS was performed via a targeted 1408-gene panel. The most common sample diagnoses included lung cancer (n = 794), diffuse large B-cell lymphoma (n = 746), and myelodysplastic syndrome (MDS; n = 316). For comparative purposes, 782 samples of normal bone marrow and 24 normal lymph node samples were collected. In total, 20 hematologic neoplasm and 24 solid tumor subtypes were identified.

Following NGS, data were fed into an AI machine learning algorithm, which used RNA expression data to discern the differences between diagnostic classes. Investigators applied the geometric mean naïve Bayes method, which is designed to address the numeric underflow that occurs in high-dimensional problems.
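The underflow problem can be illustrated with a minimal sketch. It assumes one common reading of a geometric mean naïve Bayes score: the product of per-gene likelihoods is replaced by its geometric mean, computed as the average of the log-likelihoods. The Gaussian likelihood, equal class priors, function names, and toy values below are all assumptions made for illustration and are not taken from the study.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a normal distribution (assumed per-gene likelihood)."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def naive_product(x, means, stds):
    """Plain naive Bayes likelihood: a product of many small numbers."""
    p = 1.0
    for xi, m, s in zip(x, means, stds):
        p *= math.exp(gaussian_log_pdf(xi, m, s))
    return p


def geometric_mean_log_likelihood(x, means, stds):
    """Log of the geometric mean of per-gene likelihoods (mean of the logs)."""
    logs = [gaussian_log_pdf(xi, m, s) for xi, m, s in zip(x, means, stds)]
    return sum(logs) / len(logs)


def classify(x, class_params):
    """Pick the class with the highest score, assuming equal class priors."""
    scores = {
        label: geometric_mean_log_likelihood(x, means, stds)
        for label, (means, stds) in class_params.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data: 1408 features to mirror the panel size in the article.
n_features = 1408
sample = [0.0] * n_features
class_params = {
    "class_a": ([0.0] * n_features, [1.0] * n_features),
    "class_b": ([3.0] * n_features, [1.0] * n_features),
}

# The raw product collapses to 0.0 for both classes, so they cannot be ranked.
raw_a = naive_product(sample, *class_params["class_a"])
raw_b = naive_product(sample, *class_params["class_b"])

# The averaged log-likelihood stays in a usable numeric range.
best_label, scores = classify(sample, class_params)
print(raw_a, raw_b, best_label, scores)
```

With 1408 features, the raw products are far smaller than the smallest representable floating-point number, whereas the averaged log-likelihoods still separate the two hypothetical classes.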

Findings from the study showed that targeted transcriptome in combination with the AI algorithm displayed high accuracy, sensitivity, and specificity in most cases when differentiating between any 2 disease classes. The highest area under the curve (AUC) was observed when the algorithm was asked to distinguish between sarcoma vs gastrointestinal stromal tumor (1.00; 95% CI, 0.997-1.00), pancreas vs esophageal (0.999; 95% CI, 0.990-1.00), and breast vs colorectal (0.997; 95% CI, 0.991-1.00). For a large majority of comparisons, the AUC was above 0.90.

Overall, the sensitivity and specificity of the algorithm were also very high, with most comparisons showing a sensitivity above 90% and a specificity of at least 90%. The highest sensitivity ratings were seen between breast vs ovarian (100%), sarcoma vs ovarian (99.2%), and marginal vs chronic lymphocytic leukemia (98.7%). The highest specificity figures occurred when comparing sarcoma vs gastrointestinal stromal tumor (100%), Hodgkin vs lymph node (100%), and pancreas vs esophageal (98.9%).

Conversely, the lowest AUC was observed between normal vs MDS, at 0.831 (95% CI, 0.801-0.861). The sensitivity and specificity ratings for this comparison were also low compared with the rest of the study, at 78.1% and 75.3%, respectively. Study authors noted that significant overlap between these 2 specific classes was likely the cause of the decreased reliability.
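For readers less familiar with these metrics, the following sketch shows how sensitivity, specificity, and AUC are conventionally computed for a 2-class comparison. The counts and scores are hypothetical placeholders, not data from the study, and the pairwise AUC formula is the standard rank-based definition rather than the authors' stated procedure.

```python
def sensitivity(tp, fn):
    """Share of truly positive cases that the classifier flags as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of truly negative cases that the classifier flags as negative."""
    return tn / (tn + fp)


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, chosen only to make the arithmetic visible.
sens = sensitivity(tp=90, fn=10)
spec = specificity(tn=80, fp=20)
auc = pairwise_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(f"sensitivity={sens:.1%} specificity={spec:.1%} AUC={auc:.3f}")
```

An AUC near 1.00 means the 2 classes are almost perfectly separable by score, while heavy overlap between classes, as reported for normal vs MDS, pulls the value down.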

To determine the efficacy of the approach at distinguishing among multiple classes to achieve a differential diagnosis, investigators used all 1408 biomarkers without selection to determine a ranked score to predict the likelihood of a given diagnostic class. More than 3000 cases were used to train the machine learning algorithm and the approach was subsequently evaluated using 1415 cases.

Differential diagnosis efficacy varied among disease classes, with most disease classifications being accurately diagnosed on the first attempt at least 51% of the time. In acute lymphoblastic leukemia (ALL), all 26 cases were accurately given a first-choice diagnosis with a positive predictive value (PPV) of 84%; among 17 chronic myeloid leukemia cases, none were correctly diagnosed the first time. Other disease classifications receiving the most accurate first-choice diagnoses were colorectal cancer (n = 83/101; PPV, 79%), brain cancer (n = 12/16; PPV, 75%), lung cancer (n = 177/201; PPV, 73%), and diffuse large B-cell lymphoma (n = 127/149; PPV, 73%).
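The counts above can be turned into first-choice rates with simple arithmetic. The sketch below assumes that each "n = x/y" figure denotes correctly classified cases over total cases for that class in the evaluation set; the PPV values are copied from the text as reported, not recomputed, because the number of false positives per class is not given.

```python
# (correct first-choice calls, total cases, reported PPV) as quoted in the article.
# PPV is None where the article does not report one.
reported = {
    "acute lymphoblastic leukemia": (26, 26, 0.84),
    "chronic myeloid leukemia": (0, 17, None),
    "colorectal cancer": (83, 101, 0.79),
    "brain cancer": (12, 16, 0.75),
    "lung cancer": (177, 201, 0.73),
    "diffuse large B-cell lymphoma": (127, 149, 0.73),
}


def first_choice_rate(correct, total):
    """Fraction of cases in a class whose top-ranked diagnosis was correct."""
    return correct / total


rates = {
    name: first_choice_rate(correct, total)
    for name, (correct, total, _ppv) in reported.items()
}

for name, (correct, total, ppv) in reported.items():
    ppv_text = "not reported" if ppv is None else f"{ppv:.0%}"
    print(f"{name}: {correct}/{total} = {rates[name]:.1%} first choice; PPV {ppv_text}")
```

The two numbers answer different questions: the first-choice rate asks how often cases of a given class were called correctly, whereas PPV asks how often a call of that class turned out to be right. This is why all 26 ALL cases can be classified correctly while the PPV remains below 100%.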

“The key fundamental finding is that you can use DNA analysis and RNA transcription to make diagnoses,” Pecora said. “That was not known before, to the extent that we were able to show it. It means that for all of cancer, and possibly in the future other diseases, the ability to make a much more precise diagnosis [that is] much more information content rich...will not only give you a diagnosis, but insights into why the person has a disease, and how best to treat them.”

To further evaluate the diagnostic accuracy of the approach, investigators generalized the diagnostic classes into 5 groups: lymphoid, myeloid, carcinoma (including brain tumors), sarcoma, and normal. Overall, the algorithm was much more effective in correctly placing a given sample into 1 of these general diagnostic categories on the first attempt; 84% of cases were correctly given a first-choice diagnosis.

More specifically, carcinoma (n = 427), lymphoid (n = 427), and myeloid (n = 295) cases were given a correct first-choice diagnosis at a rate of 94%, 91%, and 87%, respectively. Sensitivity ratings for these groups were 81% (95% CI, 77%-84%), 77% (95% CI, 72%-81%), and 44% (95% CI, 38%-49%), respectively. The specificity figures were 95% (95% CI, 92%-96%), 88% (95% CI, 86%-90%), and 77% (95% CI, 75%-80%), respectively.

Discussing their findings, study authors wrote that their approach is not intended to replace clinical decision-making by pathologists and clinicians but could be used to aid decision-making and add objectivity, efficiency, and reproducibility to the process. Cited limitations of the research included the exclusion of precise molecular mutations and chromosomal abnormalities from the algorithms; the authors added that integrating these abnormalities would probably significantly improve prediction accuracy. Other limitations included too few cases in some instances, some samples being in the early stages of disease, and variation in tumor fraction.

Looking ahead, encouraging findings from research such as this could incentivize entities such as diagnostic and biotechnology companies to invest in the space, Pecora said. He added that this could lead to more efficient development of better therapeutics with better outcomes, as well as better tools to aid in proper diagnosis and in understanding the genetic reasons a patient has a given disease with unique characteristics.

“We’ve talked a lot over the past decade about 3 fields of science coming together: biologic sciences, material sciences, and computational science,” Pecora said. “When these fields ultimately come together, we’re talking about changing the whole paradigm of how we approach and treat human disease. In fact, if you take it to its natural extension, you might be on the precipice of changing the human experience. [It’s] no different than over a century ago when we first discovered antibiotics. This is that big of a deal.”
