Kentucky Taps Into Data Repositories for an Attack on its Extraordinary Cancer Burden

Open access to unbiased data repositories of molecular and clinical data will be essential to accelerating future discoveries in precision medicine.

Eric B. Durbin, DrPH, MS

Director

Cancer Research Informatics Shared Resource Facility

Markey Cancer Center

University of Kentucky

Kentucky leads the nation in cancer incidence and mortality. Therefore, our state has more to gain than any other from the advances being made in precision medicine and mutation-targeted treatment in a growing variety of cancer types. Immune checkpoint inhibition therapy targeting the PD-1/PD-L1 pathway has moved well beyond melanoma and is now showing promise for treatment in triple-negative breast cancer. Targeted chemotherapy and immunotherapy rely on next-generation sequencing (NGS) to identify the genomic mutations that render tumor cells vulnerable to these approaches. Mounting evidence suggests that research will continue to reveal somatic mutations that may be targetable. Open access to unbiased data repositories of molecular and clinical data will be essential to accelerating future discoveries in precision medicine. Genomic data sharing is a cornerstone of the Cancer Moonshot initiative.

Federally sponsored genomic data repositories are proving indispensable for research. For example, The Cancer Genome Atlas (TCGA) has collected more than 11,000 cases across 33 tumor types and generated a comprehensive dataset describing the molecular changes that occur in cancer. However, regardless of its size, the TCGA is made up of an arbitrary, biased sample of cases that does not represent the tumors of any underlying population. A truly representative repository requires either a random sampling of all cases in a population or, better yet, a census of all sequenced tumors.

We are undertaking an ambitious endeavor to assimilate the genomic test data from all tumors sequenced for patients included in the Kentucky Cancer Registry (KCR). For a number of reasons, central cancer registries are natural sources of unbiased population-based genomic data. First, state laws establish them as public health authorities with the power to collect data for all patients with cancer. Registries collect a standardized comprehensive set of diagnostic, treatment, and outcome data for all patients with cancer.

In addition, data are routinely collected from all hospitals, clinics, and oncology practices that order genomic testing. Genomic data are needed by registries to adequately categorize tumors according to modern standards. Central registries work closely with institutional review boards (IRBs) and have the infrastructure in place to protect patient confidentiality, a necessity for sharing de-identified data for cancer research.

KCR was established by Kentucky state law in 1990 and currently enjoys support from the Centers for Disease Control and Prevention National Program of Cancer Registries (NPCR) and the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) Program. In May 2018, KCR was awarded a prestigious competitive renewal from SEER, representing an expected $31 million investment over a 10-year period. A stated goal of our SEER proposal was to establish population-based reporting of genomic test data for Kentucky.

To accomplish this ambitious goal, we are building upon the successes achieved in developing a statewide electronic pathology (e-path) reporting system. Implementation of standardized e-path reporting to KCR began in 2004 and has grown to include reporting from at least 54 pathology laboratories that diagnose specimens from Kentucky patients with cancer. As a result, over 10,000 e-path reports are securely transmitted to KCR and evaluated each month. KCR’s population-based e-path repository permits the registry to serve as a virtual tissue repository (VTR), which is used to identify unbiased cohorts of historical tumors for researchers. KCR obtains the formalin-fixed paraffin-embedded tissue specimens from labs for NGS testing.

Population-based electronic reporting of NGS test results stands to transform KCR’s capacity to track the molecular profiles of Kentucky patients in real time. Molecular profiles, e-path reports, and registry data are being integrated to create KCR’s Cancer Research Data Commons (CRDC). Another long-term goal is to incorporate response to treatment, disease progression, and recurrence data to provide better support for predictive modeling. These data will better inform oncologists and molecular tumor boards about treatment outcomes for patients with similar molecular profiles in the registry. Access to the data is strictly governed under appropriate IRB-approved protocols.

Rather than seeking data from individual healthcare providers, KCR is working directly with genomic sequencing laboratories. Foundation Medicine is the first provider to transmit genomic test data to KCR. Importantly, in addition to reporting gene mutation data in Extensible Markup Language (XML) format, Foundation Medicine is providing KCR with raw data in the form of binary sequence alignment map (BAM) files. The raw data are critical for research. The BAM files contain an array of additional mutation data across each tumor genome that may not be well understood today but may emerge as significant as knowledge evolves. The raw data will permit data mining of historical tumor profiles with new bioinformatics pipelines. Because of their size (about 3 terabytes per report, on average), raw data files present a significant data storage challenge for KCR. We are investing in cloud-based resources to provide adequate storage and computational processing for future needs.

KCR provides access to de-identified summary data from the CRDC through a portal developed with Tableau Software. Registered users may obtain case and mutation counts by selecting a variety of KCR and genomic report criteria. For example, a researcher or clinician can readily identify the number of non-small cell lung cancer tumors that harbor both TP53 and LRP1B mutations. KCR also offers access to record-level mutation and clinical data for IRB-approved studies through an interface developed with LabKey Server.
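As a rough illustration of the kind of summary count described above, the following Python sketch tallies cases of one cancer type that carry every gene in a requested set. It is not the portal's actual implementation: the `Case` record, the `count_cases` function, and the four sample rows are hypothetical stand-ins invented for this example, and real queries would run against de-identified registry data under the appropriate approvals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """Hypothetical de-identified record: one tumor and its mutated genes."""
    case_id: str
    cancer_type: str
    mutated_genes: frozenset


def count_cases(cases, cancer_type, required_genes):
    """Count cases of the given cancer type that carry ALL required genes."""
    required = frozenset(required_genes)
    return sum(
        1
        for case in cases
        if case.cancer_type == cancer_type and required <= case.mutated_genes
    )


# Invented sample data for illustration only; not drawn from any registry.
CASES = [
    Case("A1", "NSCLC", frozenset({"TP53", "LRP1B"})),
    Case("A2", "NSCLC", frozenset({"TP53"})),
    Case("A3", "NSCLC", frozenset({"TP53", "LRP1B", "KRAS"})),
    Case("A4", "Breast", frozenset({"TP53", "LRP1B"})),
]

if __name__ == "__main__":
    n = count_cases(CASES, "NSCLC", ["TP53", "LRP1B"])
    print(f"NSCLC cases with TP53 and LRP1B mutations: {n}")
```

Because only an aggregate count is returned, a query of this shape exposes no record-level detail, which is consistent with the distinction drawn above between the summary portal and the IRB-governed record-level interface.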

Other NGS providers working with KCR include the University of Kentucky Genomic Core Laboratory and the Oncology Research Information Exchange Network. Discussions are also underway with Caris Life Sciences, Guardant Health, and Tempus. Our plan is to systematically implement secure data feeds from all providers testing tumors for Kentucky patients. Additional omics datasets we are working to incorporate include metabolomics, pathomics (from digital pathology images), radiomics (from diagnostic radiology images), and proteomics.

These efforts have already paid dividends: New hypotheses have been generated about resistance to treatment in breast cancer cases, and there have been improvements in decision making by the molecular tumor board at the University of Kentucky’s Markey Cancer Center. Kentucky’s approach is designed to serve as a model for the SEER and NPCR registries. We believe that by collaborating with other cancer data-sharing initiatives, this open access to data will further accelerate discoveries needed to improve cancer outcomes in Kentucky and beyond.