Machine Learning Develops Into a Powerful Tool in Cancer Care

OncologyLive, Vol. 23/No. 13

The expansion of technology includes machine learning, which is quickly emerging as a dynamic aid for clinicians in every aspect of cancer care, from diagnosis to treatment decisions.

John F. McDonald, PhD

Technology has rapidly expanded to touch nearly every aspect of health care. This expansion includes machine learning technology, which is quickly emerging as a dynamic aid for clinicians in every aspect of cancer care, from diagnosis to treatment decisions.

Machine learning is a branch of artificial intelligence made up of algorithms that can make predictions based on exposure to data. The capabilities of machine learning models are continually being fine-tuned and improved as computing power and the amount of available data continue to grow rapidly.1

“One way to make predictions, not just in science but anywhere, is through correlations,” John F. McDonald, PhD, said in an interview with OncologyLive®. “[Machine learning technology] can look for correlations between a molecular biomarker and whether a patient has cancer or not. The same thing applies to [determining] what the optimal therapy is for an individual patient. You can examine the genomic profiles of the tumors of patients who have responded well to chemotherapy vs the genomic profiles of individuals who have not, looking for correlations. Machine learning is looking for correlations embedded within very huge data sets.”

McDonald is a professor in the Parker H. Petit Institute for Bioengineering and Bioscience at the Georgia Institute of Technology in Atlanta.

McDonald went on to explain that machine learning models work by analyzing features, which are the individual components of interest that are predetermined by the computer scientist who created the model. Examples of features used in oncologic machine learning models include metabolic profiles, genes, and bases of DNA, McDonald explained.

Several computational approaches can be used in a machine learning-based model to aid in making predictions and treatment decisions in cancer care, including linear models, decision trees, and ensemble methods.2

Linear models determine risk using a weighted linear combination of patient features. Decision trees divide the feature space into subgroups with similar outcome predictions, and the ensemble approach generates predictions by combining several individual trees.
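As a rough illustration of the three approaches, the following Python sketch scores one hypothetical patient with a weighted linear model, a single-split decision rule, and a small ensemble that takes a majority vote over several such rules. The feature names, weights, and thresholds are invented for illustration and do not come from any study cited here; real models learn them from data.

```python
import math

# Hypothetical patient features (names and values are illustrative only).
patient = {"age": 68.0, "tumor_size_cm": 3.2, "biomarker_level": 1.8}

def linear_risk(features, weights, bias):
    """Weighted linear combination of features, squashed to a 0-1 risk."""
    score = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))

def make_stump(feature, threshold):
    """A one-split decision tree: predicts 1 (high risk) above the threshold."""
    def stump(features):
        return 1 if features[feature] > threshold else 0
    return stump

def ensemble_predict(trees, features):
    """Majority vote over several individual trees."""
    votes = sum(tree(features) for tree in trees)
    return 1 if votes * 2 > len(trees) else 0

# Assumed weights; a trained model would estimate these from data.
weights = {"age": 0.02, "tumor_size_cm": 0.5, "biomarker_level": 0.8}
risk = linear_risk(patient, weights, bias=-4.0)

# Three assumed one-split trees, each looking at a different feature.
trees = [
    make_stump("age", 65.0),
    make_stump("tumor_size_cm", 4.0),
    make_stump("biomarker_level", 1.5),
]
label = ensemble_predict(trees, patient)
print(f"linear risk = {risk:.3f}, ensemble label = {label}")
```

The majority vote is only one way to combine trees; other ensemble methods, such as boosting, weight or sequence the individual trees differently.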

Applying Machine Learning Techniques to Community Practice

Machine learning will never completely replace the insights and experience of a human clinician, but it is increasingly becoming a valuable asset for clinicians in several areas of oncology. The approach has been tested in many aspects of cancer care and has produced impressive results that warrant further investigation and adoption by clinicians.

McDonald was part of a team of investigators who evaluated a machine learning-based model known as ELAFT, an ensemble-based machine-learning algorithm. The system combined 4 algorithm types to produce a single prediction.3

In the study, investigators sought to predict which of 7 chemotherapeutic agents would be the most effective therapy for a given cancer type. The predictive model was applied to 15 cancer types using data from 499 independent cell lines. The chemotherapies included were carboplatin, cisplatin, docetaxel, doxorubicin, gefitinib, gemcitabine, and paclitaxel.3

The model was evaluated based on accuracy, precision, and recall. Accuracy was defined as the fraction of cancer cell lines for which the algorithm's predicted response matched the actual response to the agent observed in other studies. Precision was the number of cell lines the model correctly predicted as sensitive to a given agent divided by all cell lines it predicted as sensitive. Recall was the proportion of cell lines the model correctly predicted as sensitive to an agent among all the cell lines that were actually sensitive.
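The three metrics can be computed from counts of true and false positives and negatives. The sketch below uses the standard definitions with a toy set of 8 hypothetical cell lines; the labels are invented for illustration and are not the study's data.

```python
def classification_metrics(predicted, actual):
    """Return (accuracy, precision, recall), treating True as 'sensitive'."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # correctly called sensitive
    fp = sum(1 for p, a in pairs if p and not a)      # wrongly called sensitive
    fn = sum(1 for p, a in pairs if not p and a)      # missed sensitive lines
    tn = sum(1 for p, a in pairs if not p and not a)  # correctly called resistant
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Toy labels for 8 hypothetical cell lines (True = sensitive to the agent).
predicted = [True, True, True, True, False, True, False, False]
actual = [True, True, False, False, False, True, True, False]

accuracy, precision, recall = classification_metrics(predicted, actual)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

A model can score high on recall while scoring lower on precision if it labels many cell lines as sensitive: it misses few truly sensitive lines but also produces more false positives.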

The model achieved an accuracy of 92.38% across cancer types. Recall was also very high at 98.21% and the precision figure was 85.52%. Investigators validated the results by comparing them to a clinical data set of the 7 chemotherapeutic agents, either alone or in combination, in 23 patients with ovarian cancer. The ovarian cancer data were provided by the Ovarian Cancer Institute.

“This kind of machine learning approach could be very useful in generating a decision tool to help physicians,” McDonald said. “Even though it is not going to necessarily be 100% accurate; it is only as good as the data set you put in it. At this point in time, we do not fully understand the cause-and-effect relationships in cancer. I see machine learning as an interim solution that could give us some predictive accuracy, even if we do not understand the underlying cause-and-effect relationships.”

Machine learning technology has also been used to effectively predict the metastatic status of patients with cancer. Green et al recently presented data showing that the ensemble machine learning–based predictive model XGBoost significantly outperformed an approach that relied solely on the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) metastasis diagnosis codes in terms of predicting the metastatic status of patients with breast cancer or lung cancer.

Investigators used data from 28,043 patients, pooled from certified tumor register abstraction and electronic health records of 2 large health systems. The patients were split into training (n = 22,434) and testing (n = 5609) groups. XGBoost was then ‘trained’ by analyzing approximately 750 features of the patient data in the training group, which were collected at the time of diagnosis or after.
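The reported group sizes correspond to roughly an 80/20 split of the 28,043 patients. The source does not say how patients were assigned, so the random shuffle below is an assumption, and the integer IDs stand in for real patient records.

```python
import random

def split_train_test(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder IDs; a real pipeline would split actual patient records.
patient_ids = range(28_043)
train, test = split_train_test(patient_ids)
print(len(train), len(test))
```

Holding out the testing group until the end means the model's reported precision and recall are measured on patients it never saw during training.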

For patients with breast cancer who were found to have a negative metastatic status, the predictive model achieved 97% precision and 98% recall compared with 91% and 94%, respectively, with the ICD-10-CM diagnosis codes. Among patients with lung cancer with a negative metastatic status, precision and recall were 83% and 90% with the machine learning model vs 66% and 93% with the ICD-10-CM codes (TABLE).4

For patients with breast cancer who were found to have a positive metastatic status, the machine learning model showed a significant improvement in precision and recall over the ICD-10-CM approach: 90% vs 68% and 86% vs 58%, respectively. Benefits with XGBoost were also seen among patients with lung cancer with a positive metastatic status: precision and recall were 89% and 81%, respectively, compared with 87% and 52% with the ICD-10-CM approach.

Study authors concluded that the approach could be applied to other cancer types and is scalable. In the future, the model could also be applied to smaller, time-sensitive pieces of data that could be used by clinicians to help determine if and when a patient’s metastatic status has changed.

Additionally, mortality prediction has become an area of interest for investigators looking to apply machine learning methods. Ye et al implemented, in a community practice setting, a machine learning model capable of predicting the 90-day mortality risk of patients with metastatic cancer. Clinicians then used these predictions to try to start advance care planning earlier than it would have occurred without the tool, with the ultimate aim of increasing prognostic awareness and hospice enrollment.5

The study evaluated the model in 889 patients from 5 participating practices. Seven other practices, comprising 774 patients, did not implement the machine learning tool and served as the control arm. The median age in both arms of the study was 74 years, and the median duration of metastatic disease was approximately the same, at 249 and 250 days, respectively.

Patients in the 5 participating practices were scored every 2 weeks to gauge mortality risk. The machine learning system then used these data, as well as additional data from electronic medical records and historical claims, to evaluate the need for advance care planning in each patient.

The machine learning tool was tested over an 11-month span and generated a total of 10,910 unique patient predictions, including 1663 high-risk predictions.

The weighted mean of advance care planning use was 34.4% (range, 19.4%-55.8%) in the participating practices compared with 14.0% (range, 7.4%-31.0%) in the nonparticipating practices, which was a 2.5-fold change in favor of the intervention (P = .03).
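The 2.5-fold figure follows from dividing the two reported weighted means. The sketch below reproduces that arithmetic and shows how a weighted mean is formed; the source does not state the weighting scheme, so weighting by patient count is an assumption, and the per-practice rates and counts are hypothetical.

```python
def weighted_mean(rates, weights):
    """Mean of per-practice rates, weighted here by patient count (assumed)."""
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)

# Fold change from the reported weighted means (percent of patients).
participating, nonparticipating = 34.4, 14.0
fold_change = participating / nonparticipating
print(f"fold change = {fold_change:.2f}")

# Hypothetical per-practice rates (%) and patient counts, for illustration only.
example_rates = [20.0, 50.0]
example_counts = [300, 100]
example_mean = weighted_mean(example_rates, example_counts)
print(f"example weighted mean = {example_mean:.1f}%")
```

Weighting keeps a small practice with an unusually high or low rate from moving the overall figure as much as a large practice would.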

Investigators noted that implementation of the tool is an ongoing effort and that more practices have already planned to adopt it.

They also wrote that the intervention’s impact on hospice enrollment, emergency department visits, and hospital admission will be evaluated in future studies.

Challenges Remain Despite Some Robust Findings

McDonald noted that machine learning in cancer care is still very much in its infancy and significant barriers remain to the widespread adoption of the approach. One of the major challenges is the availability of reliable data needed to make accurate predictions. Data must be accurately entered into electronic health records and these records must have the capability to be easily shared on a large scale to ensure maximum accuracy and efficacy of machine learning approaches. Although small data sets can be useful, machine learning is more effective in finding correlations in large sets.1,2

Additionally, clinicians cannot assume that predictions obtained with machine learning are always 100% accurate. At times, the models can identify patterns and subgroups within data where there is no clear predictive outcome. Predictions should always be reviewed and double-checked for clinical validity.1,2

The amount of information required in electronic health records to make machine learning effective could also become burdensome for clinicians. Delegation of administrative responsibilities will be key to prevent burnout in this area.1

Machine learning may also present a substantial learning curve for some clinicians, although the technology is becoming increasingly integrated and intuitive. Even so, the benefits of machine learning approaches will likely outweigh the shortcomings, and the technology will continue to emerge as a powerful complementary tool in cancer care.

“[Machine learning] is going to get better as we get more and more data,” McDonald concluded. “As it gets better, clinicians will start adopting it and utilizing it. As it becomes clear that [machine learning] is very accurate, it could become a decision tool [that is used] right off the bat. But that is going to take a while. That is not going to happen right away.”


  1. Nagy M, Radakovich N, Nazha A. Machine learning in oncology: what should clinicians know? JCO Clinical Cancer Informatics. 2020;4:799-810. doi:10.1200/CCI.20.00049
  2. Bertsimas D, Wiberg H. Machine learning in oncology: methods, applications, and challenges. JCO Clinical Cancer Informatics. 2020;4:885-894. doi:10.1200/CCI.20.00072
  3. Lanka J, Housley SN, Benigno BB, McDonald JF. ELAFT: an ensemble-based machine-learning algorithm that predicts anti-cancer drug responses with high accuracy. J Onc Res. 2021;4(1):1-11. doi:10.31829-2637-6148.jor2021-4(1)-111
  4. Green F, Huang HT, Lerman M, et al. Using machine learning on real-world data to predict metastatic status. J Clin Oncol. 2022;40(suppl 16):1550. doi:10.1200/JCO.2022.40.16_suppl.1550
  5. Ye P, Butler BM, Vo DM, et al. The initial outcome of deploying a mortality prediction tool at community oncology practices. J Clin Oncol. 2022;40(suppl 16):1521. doi:10.1200/JCO.2022.40.16_suppl.1521