The Promise of Big Data Gets Closer to Fulfillment

Oncology Business News®, October 2014, Volume 3, Issue 5



It seems like big data is everywhere these days. It’s being touted as a solution for physicians and practices who would like to participate in Accountable Care Organizations (ACOs) and accountable care-like contracts with payers. But what is it and how can it help practicing oncologists and hematologists?

Big data is a term used to describe both structured and unstructured data that can be obtained from multiple sources: patient records, insurer databases, physician offices, hospitals, Medicare/Medicaid records, and even personal sensors and smartphones. These data can help physicians make better-informed decisions about quality, treatment protocols, and reimbursement, and can even improve the efficiency of their own practices. With advances in information technology and computing, population-based research can take advantage of these linked databases, which can lead to new discoveries and insights.1

Big data promises “to deliver better research at the population level for comparative effectiveness and outcomes research, and to better understand issues concerning disparities in care, access, and burden of disease,” said Anne-Marie Meyer, PhD, an assistant professor in the department of epidemiology at the Gillings School of Global Public Health at the University of North Carolina. Meyer is the faculty director of North Carolina’s Integrated Cancer Information and Surveillance System (ICISS).

ICISS assembles, links, and harmonizes big data to facilitate high-impact, cancer-focused research that spans all facets of the cancer continuum by relating data sets, systems, and methods. One advantage big data may have over other forms of data collection, such as clinical trials, relates to access. Meyer points out that patients enrolled in a clinical trial already have access to care. “These are patients who are already interfacing with the health care system in a very high-quality way,” said Meyer.

With big data, however, researchers can “look at populations who are not represented in clinical trials and who may not be represented in cohort studies or other types of clinical or epidemiologic design studies,” she emphasized.

Peter Yu, MD, FACP, FASCO, a practicing hematologist/oncologist at the Palo Alto Medical Foundation in California and president of the American Society of Clinical Oncology (ASCO), concurs.

“We know that only 3% of cancer patients go on clinical trials, and hence those data are not representative of the entire cancer population,” he said. “For example, clinical trials tend to draw a younger patient population with fewer comorbid medical problems. By studying how closely the findings of clinical trials are replicated in real-world patients, we will be better able to determine the true value of treatments and understand the best way to adapt those treatments, potentially leading to better overall results.”


In today’s healthcare reform environment, practicing oncologists and hematologists are being asked to document that they are delivering quality care to their patients.

“In order to provide and demonstrate value-based care, oncologists need to understand and have access to data that are not currently available,” said Bobby Green, MD, vice president of oncology for Flatiron Health, a health care technology company based in New York City.

The problem is that these data exist in different forms, are often unstructured, and are stored in different places along the treatment continuum. Some reside in a practice’s electronic medical record or billing system, some in a hospital system, and some with the payers.

“All this information is stored in silos,” said Green. “It’s in all these different areas and sometimes a lot of it sits in the unstructured notes of the physician, making it difficult to collect and analyze.” Green said that the silos have no inherent incentive to share. He added that strict adherence to patient privacy laws is essential when sharing data, but that it also creates barriers. “That’s the landscape that we have to navigate. So it’s much easier to share what you bought on Amazon than any part of a patient’s medical records, and appropriately so,” he added. “But we have to find a way to aggregate the data and, at the same time, respect patient privacy.”

Collecting and analyzing big data is an overwhelming challenge. But if it can be harnessed, big data can be a powerful tool on many levels: clinical, operational, and financial.

“For clinicians, if they are going to be making clinical decisions, they will want to know the source of the data, if it’s complete, if it’s trustworthy, and if it’s accurate,” said Sean Hogan, vice president of health care at IBM.

For the benefits of big data to be realized, information needs to flow freely, so that the patient sees his data and his physician can see that information too. “But we want to make sure that the data are secure and privacy is not violated,” said Hogan. “We are just on the edge of allowing systems to talk or connect to one another,” he added.

Improving Practice Efficiencies

Big data clearly has the potential to benefit patient care, but it can also benefit the individual practice by improving efficiency from an operational standpoint, said Green. “There is a lot of consolidation of small practices into larger hospital systems,” he said. “We want to enable practices to use data to provide business intelligence and insight into how they are functioning from a financial and operational standpoint. We view big data and analytics as a tool that helps practices thrive and survive.”

The clinical information collected can also be used to improve best practices, as well as practice operations, said Hogan. An infusion center might have streams of data coming in through patient monitors. Those data can be analyzed as they stream in, without first being stored in a database. In effect, the provider can review the data “on the fly, as it comes in,” Hogan said. A physician could be alerted to any complications immediately, and a therapeutic plan of action could be quickly formulated and put into effect.
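The Python sketch below illustrates what reviewing data “on the fly” can mean in general terms. It checks each monitor reading as it arrives and emits an alert immediately, without storing the reading first. It does not describe any system mentioned in this article. The `Reading` record, the `watch_stream` function and the alert limits are all hypothetical placeholders chosen for the example. The limits are not clinical thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator


@dataclass
class Reading:
    """One value from a bedside monitor (hypothetical record layout)."""
    patient_id: str
    metric: str
    value: float


# Maps a metric name to a predicate that returns True for an abnormal value.
AlertRules = Dict[str, Callable[[float], bool]]

# Placeholder limits for illustration only; these are NOT clinical thresholds.
HYPOTHETICAL_RULES: AlertRules = {
    "heart_rate": lambda v: v > 120 or v < 50,
    "spo2": lambda v: v < 90,
}


def watch_stream(readings: Iterable[Reading], rules: AlertRules) -> Iterator[str]:
    """Check each reading the moment it arrives and yield an alert right away.

    Nothing is written to a database first; metrics without a rule are skipped.
    """
    for r in readings:
        is_abnormal = rules.get(r.metric)
        if is_abnormal is not None and is_abnormal(r.value):
            yield f"ALERT patient={r.patient_id} {r.metric}={r.value}"


if __name__ == "__main__":
    # A short simulated feed; in practice this could be an unbounded iterator.
    feed = [
        Reading("p1", "heart_rate", 88),
        Reading("p2", "spo2", 86),
        Reading("p1", "heart_rate", 131),
    ]
    for alert in watch_stream(feed, HYPOTHETICAL_RULES):
        print(alert)
```

`watch_stream` is a generator, so it behaves the same whether the feed is a short list or a live source that never ends. Each alert is produced as soon as its reading is seen.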

“The data could be used from an operational standpoint also,” said Hogan. “The data could provide insight into how to adequately staff the center,” he said. “So this is an operational question that uses big data to come up with an answer.”

Health systems, insurers, and private practices are being asked to manage populations at risk. “So the question is, ‘Where is the best place to deploy your limited resources?’” Hogan asked. In this instance, big data can shed some light and provide guidance and business intelligence.

4 Examples

A mix of organizations and businesses are proposing platforms that can aggregate this data, provide insights to help clinicians, and uncover patterns to improve care.

A brief overview of 4 examples follows: the American Society of Clinical Oncology’s CancerLinQ initiative, Cancer Outcomes Tracking and Analysis (COTA), Flatiron Health’s OncoAnalytics, and IBM’s Watson.

CancerLinQ

Three features characterize big data, said Yu, the ASCO president: volume, variety, and velocity. Before the emergence of big data, the mechanisms used to collect information were static; think of registries or individual patient records.

ASCO’s platform, called CancerLinQ, is designed to “accept data from any electronic health record regardless of which vendor was used to create it,” said Yu. He cautions that it is not a registry, to which specific pieces of data, such as a patient’s histology or staging, are reported.

With CancerLinQ, “the entire health record, both structured and unstructured data, is absorbed,” he said. CancerLinQ collects all the data from all the sources in real time, rather than collecting them once a month or once a year and then compiling them.

“CancerLinQ will evolve in multiple versions that ASCO plans on updating every 3 to 6 months, with an expected initial launch in 2015,” said Yu.

Cancer Outcomes Tracking and Analysis

In the interest of full disclosure, Andrew Pecora, MD, is executive chairman and founder of Cancer Outcomes Tracking and Analysis (COTA) and also editor-in-chief of Oncology Business Management. COTA is also a real-time, cloud-based platform that provides point-of-service clinical and financial information to the oncologist, after sorting patients by clinical and molecular phenotyping. Once patients are sorted and categorized, COTA can analyze the data for a variety of relevant and actionable clinical and cost outcomes.

As a platform built by oncologists and hematologists, COTA will enable clinicians to group patient characteristics with much greater specificity so the appropriate care can be delivered to each patient and oncologists can enter into value-based reimbursement models.

Flatiron Health’s OncoAnalytics

Flatiron Health’s founders come from a technology background rather than a medical one, and they approached the problem of big data from that standpoint. They noticed that most cancer centers, physicians, and researchers did not have access to the most basic data and analytical tools that other industries take for granted, and they decided to do something about it.

“Our mission is to take the world’s oncology data, organize it, and make it useful for patients, doctors, life sciences, and researchers,” said Green. Flatiron is helping oncologists understand “their patient population, how patients in a given scenario are treated elsewhere, and what the outcomes are,” said Green. The goal is to leverage the data that sits in these disparate systems to make it readily available to the oncologist so that he can practice better and more value-based medicine.

“Historically, no one has approached using the data from this angle,” said Green. “Technology has focused on health from the physician documentation standpoint for billing purposes, not necessarily focusing on patient care and making the delivery of care better,” he continued.

IBM’s Watson

At its core, IBM’s Watson approaches problems using contextual language and deals with probability and uncertainty, said Hogan. Rather than offering a single definitive answer, it generates an array of possible answers and assigns a confidence level to each.

In many ways, it works much like a clinician who is searching for a diagnosis but is presented with only a partial list of symptoms. From those symptoms, the clinician has to piece together the whole picture, Hogan said.

In addition to clinical trial data, literature searches, and EMRs, Watson also gathers institution-specific knowledge from cancer centers such as Memorial Sloan Kettering or the University of Texas MD Anderson Cancer Center, said Hogan.

“Watson can pull all that information together in a point-of-care concept,” Hogan said.

Progress Made, But Concerns Remain

Proponents of big data say it is here to stay and that 2015 will be a year of significant strides in the healthcare IT arena. But there is still plenty of progress to be made, and this tool has shortcomings worth keeping in mind.

For one, tools based on big data can be gamed, and the results of big data analysis may be less robust than they appear. Consider Google Flu Trends, once a poster child for big data. In 2009, Google reported that by analyzing flu-related search queries, it could detect the spread of the flu as accurately as, and more quickly than, the Centers for Disease Control and Prevention. A few years later, though, Google Flu Trends began to falter. In the past 2 years, it has made more bad predictions than good ones. In addition, the Google search engine constantly changes, so patterns in data collected at one time do not necessarily apply to data collected at another. Another worry is the risk of too many correlations.

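Statisticians point out the risk of running many tests at once. Suppose you test 100 pairs of unrelated variables for correlation at the conventional 5% significance level. You can then expect about 5 of those correlations to appear statistically significant purely by chance. This calls for careful analysis, because the sheer size of big data sets invites very large numbers of comparisons, which can amplify these types of errors.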
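The arithmetic behind this warning can be checked directly. The Python sketch below assumes 100 independent tests at a 5% significance level, with no real relationship behind any of them. The function names are illustrative only. The sketch computes the expected number of false positives (100 × 0.05 = 5) and the probability of getting at least one (1 − 0.95^100, about 99.4%). It then confirms the first figure with a small simulation.

```python
import random


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected count of 'significant' results when no real effect exists."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Probability of one or more false positives across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_false_positives(n_tests: int, alpha: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo check of the expected count.

    When the null hypothesis is true and the test statistic is continuous,
    each p-value is uniformly distributed on [0, 1], so every test comes out
    'significant' by chance with probability alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        hits += sum(1 for _ in range(n_tests) if rng.random() < alpha)
    return hits / trials


if __name__ == "__main__":
    print(expected_false_positives(100, 0.05))        # 5.0
    print(round(prob_at_least_one(100, 0.05), 3))     # 0.994
    print(simulate_false_positives(100, 0.05, 2000))  # close to 5
```

Independence is a simplifying assumption. If the tests are correlated, the chance of at least one false positive changes. The expected count of 5 does not change, because expectations add up regardless of dependence.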

Meyer also points out that data sets have inherent strengths and limitations. “The sensitivity and specificity of every variable in every table could have a different intent initially when compared to how we are going to use it now,” said Meyer. It’s a problem of primary data collection versus secondary analysis or use, she said. “Data could be collected for one purpose, but down the line, it could be analyzed for another purpose, other than the intent that it was initially collected. It’s important to incorporate that complexity,” she said.

As progress is made in this field, questions remain. Perhaps the biggest is whether big data can deliver on all its promises. For now, the answer remains to be seen.


  1. Ward JS, Barker A. Undefined by data: a survey of big data definitions. Published September 20, 2013.