IBM's Watson Achieves High Concordance in Tumor Board Test


The artificial intelligence computer program Watson for Oncology (WFO) achieved a high degree of concordance with tumor board recommendations in a double-blinded validation study in Bengaluru, India, according to results presented at the 2016 San Antonio Breast Cancer Symposium (SABCS).


In the study of cases involving 638 patients with breast cancer treated at Manipal Comprehensive Cancer Center, 90% of WFO’s recommendations for standard treatment (REC) or consideration (FC) were concordant with the recommendations of the tumor board. A group of 12-15 oncologists met weekly to review cases, entered data into the WFO system, and then analyzed the degree of concordance between WFO’s recommendations and those of the tumor board, as well as the time it took the oncologists to generate their recommendations.

The degree of concordance varied according to the type of breast cancer, lead study author SP Somashekhar, MBBS, MS, MCH, FRCS, said in his presentation at SABCS. WFO recommendations were concordant nearly 80% of the time in nonmetastatic disease, but only 45% of the time in metastatic cases. In cases of triple-negative breast cancer, WFO agreed with physicians 68% of the time, but in HER2/neu-negative cases, WFO’s recommendations matched the physicians’ recommendations only 35% of the time.

In cases of discordance between WFO’s recommendations and those of the tumor board, tumor board decisions were changed 63% of the time (n = 100) following review.

The study’s authors concluded that the broader divergence between WFO’s recommendations and those of the tumor board could be attributed to the greater number of treatment options available for patients with HER2/neu-negative breast cancer.

“Including HER2/neu cases opens up many more treatments and variables for consideration,” Somashekhar, chairman of the Manipal Comprehensive Cancer Center, explained. “This increases demands on human thinking capacity. More complicated cases led to more divergent opinions on the recommended treatment.”

Physicians took longer than WFO to weigh the available treatment options and reach a recommendation, although the doctors were able to work faster as they gained familiarity with cases. Somashekhar said it took doctors an average of 20 minutes initially. As they improved, the mean time dropped to about 12 minutes. By comparison, WFO achieved a median time of 40 seconds to capture and analyze data and give a treatment recommendation.

The study authors said that although WFO recommendations often led the tumor board to reconsider their decisions, the computer program remains a support tool for physicians and cannot replace the “human touch” needed to act upon the many factors of patient engagement that go beyond data analysis.

“We are dealing with human beings, and the context and preferences of each individual patient, the patient-physician relationship, and the human touch and empathy are very important,” Somashekhar said. “It’s always going to be the decision of the treating oncologist and patient to determine what is truly the best option for the patient.”

One reason physicians do not have to worry about Watson replacing them is that they can perform better at 1-on-1 assessments. For example, in metastatic disease, Watson tended to recommend conservatively based on the best available evidence, whereas physicians were more likely to select an aggressive chemotherapy regimen to achieve a high level of response, Somashekhar said. This explains some of the discordance in recommendations, he added.

In the study, WFO analyzed >100 patient attributes for breast cancer and provided a ranking of treatment options according to REC, FC, and “not recommended” (N-REC). The recommendations were backed by data from recent trials, and oncologists were able to click on options listed by Watson to find out more about the recommendations and the reasons for them. The cases were at most 3 years old.
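To make the concordance figures reported above concrete, the following is a minimal sketch of how such a rate could be computed. It assumes, since the presentation summary does not spell out the scoring rule, that a case counts as concordant when the tumor board's chosen treatment falls in WFO's REC or FC category and as discordant otherwise. The function name, the labels as strings, and the sample data are hypothetical and are not study results.

```python
from collections import Counter

# Assumption: REC and FC both count as agreement with the tumor board;
# any other label (for example N-REC) counts as disagreement.
CONCORDANT_CATEGORIES = {"REC", "FC"}


def concordance_rate(wfo_categories):
    """Return the fraction of cases scored as concordant.

    wfo_categories holds one label per case: the WFO category
    ("REC", "FC", or "N-REC") of the treatment the tumor board chose.
    """
    if not wfo_categories:
        raise ValueError("at least one case is required")
    counts = Counter(wfo_categories)
    concordant = sum(counts[c] for c in CONCORDANT_CATEGORIES)
    return concordant / len(wfo_categories)


# Made-up toy data for illustration only.
cases = ["REC", "FC", "N-REC", "REC", "REC", "FC", "REC", "N-REC", "REC", "REC"]
print(f"{concordance_rate(cases):.0%}")  # prints "80%"
```

The same function could be applied to a subgroup, such as metastatic cases only, by filtering the list of cases before calling it.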

Somashekhar said the study was not designed to evaluate why differences in recommendations occurred, the inferiority or superiority of recommendations, or the impact of WFO on workflow. He said WFO, developed by IBM, is a promising tool that warrants consideration in a variety of other clinical settings and study designs.

Doctors at Memorial Sloan Kettering Cancer Center (MSK) helped to program WFO to enable it to make recommendations on cancer treatment. The system extracts and assesses large amounts of structured and unstructured data from medical records through natural language processing and machine learning. In addition to breast tumors, it is also capable of making recommendations for lung and colorectal cancers.

WFO’s concordance with MSK oncologists’ opinions has been tested in 2 previous studies, showing agreement 90% of the time in one and 50% of the time in the other. Doctors in Thailand have been using the system for more than a year, and IBM announced this past summer that it was expanding the program to China, where it was expected to be of high value to doctors in rural centers who don’t have access to the resources available to doctors in centralized clinics.

Watson, which has been in use in the Manipal hospital system for 6 months, has proven valuable in controlling cancer clinic costs, because it helps to eliminate bias and errors. “This is something that would ensure that we arrive at the right decision first,” Somashekhar said.

Somashekhar SP. Double blinded validation study to assess performance of IBM artificial intelligence platform Watson for oncology in comparison with Manipal multidisciplinary tumor board—first study of 638 breast cancer cases. Presented at: San Antonio Breast Cancer Symposium, Friday, Dec. 9, 2016; San Antonio, TX. Abstract S6-07
