Researchers at the Sidney Kimmel Cancer Center at Thomas Jefferson University have discovered that isomiRs (small non-coding RNA molecules produced in a cell) can accurately classify samples belonging to any of 32 different types of cancer. The research was recently published in the journal Nucleic Acids Research.
Work in recent years has revealed that microRNAs (miRNAs), small non-coding RNA molecules of ~22 nucleotides, exist as multiple variants in a cell. These variants are called ‘isomiRs’ and are produced by cells in a regimented manner. In 2014, Isidore Rigoutsos’s lab was the first to show that in healthy individuals the abundance of isomiRs is modulated by a person’s sex, population origin, and race, indicating that isomiR expression is a type of transcriptomic heterogeneity that contributes to the diversity among humans. In 2015, in follow-up work on breast cancer, the team showed that isomiR abundance is also modulated by tissue state and disease subtype.
“Our early analyses were hinting that, in addition to all the other dependencies, isomiR production had a tissue-specific aspect to it as well,” says Dr. Rigoutsos, Director of the Center for Computational Medicine at Thomas Jefferson University. “This made us think that, perhaps, we might be able to leverage that tissue-dependence to distinguish simultaneously among multiple cancers.”
Widely present in animals, plants, and some viruses, miRNAs are short regulatory RNAs. Since their original discovery in 1993, miRNAs have been shown to be important and very potent regulators of gene expression in numerous cellular contexts. Not surprisingly, many human conditions and diseases have been associated with disruptions of miRNA abundance. For many years, it was believed that each miRNA locus of the genome produced a single miRNA molecule. But as more powerful deep sequencing technologies became commonplace, many research groups started noticing that for each miRNA, multiple variants co-exist in a cell. These variants (isomiRs) generally differ in abundance from one another. At the sequence level, isomiRs of the same miRNA differ very slightly in either their 5´ end, their 3´ end, or both. In the early days following the discovery of isomiRs, most researchers continued to focus on the isoform that had been reported originally in the literature for each miRNA and dismissed all its other variants.
“It helps to think of each miRNA locus on the genome as producing a ‘cloud’ of co-existing isomiRs. Some miRNA loci tend to always produce small clouds (i.e., a small number of different isomiRs). Other miRNA loci tend to produce large clouds that comprise high numbers of co-present isomiRs. When we first compared two different tissues, we noticed that clouds associated with the same miRNA locus differed both in terms of how many and also which isomiRs they contained. We thought that this was a valuable observation that might provide an insight into human disease,” says team leader Rigoutsos, “and set out to determine whether we could leverage it across multiple cancers.”
In their new Nucleic Acids Research study, the team examined the isomiR profiles of more than 10,000 tumor samples spanning 32 different cancer types from The Cancer Genome Atlas (TCGA) repository. The team used an iterative approach, carrying out multiple rounds of training and testing a multi-label classifier. During the training portion of each round, they randomly chose 60% of the samples available for each cancer type to train the classifier. They then tested the classifier with the remaining 40% of the samples from each cancer type. This training and testing process was repeated a total of 1,000 times. The team showed that they could successfully classify the unseen test samples with an average sensitivity of 90% and a false discovery rate of 3% or less. The team also used their classifier to categorize other, non-TCGA datasets that were generated using deep sequencing or microarrays, and labeled them with comparable accuracy. Notably, the team’s approach explicitly ignores the actual abundance levels of all isomiRs. Instead, an isomiR is simply called ‘present’ if its abundance places it in the top 20% of the isomiR population; otherwise, it is called ‘absent.’ This seemingly unusual choice has the benefit of making the team’s approach potentially applicable to serum samples, where it has proven difficult to identify molecules that can be used to “normalize” abundance.
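To make the procedure described above more concrete, here is a minimal Python sketch of three of its ingredients: the present/absent call based on the top 20% of isomiRs in a sample, the per-cancer-type 60/40 split, and the sensitivity and false discovery rate for one cancer type. It is an illustration only, not the team's code. The function names, the dictionary input format, and the handling of ties and rounding are assumptions made for this example, and the classifier itself is left out because the article does not say which one was used.

```python
import random
from collections import defaultdict


def binarize_top_fraction(abundances, fraction=0.20):
    """Return the set of isomiRs called 'present' in one sample.

    `abundances` maps a (hypothetical) isomiR name to its abundance.
    An isomiR is 'present' if it ranks in the top `fraction` of the
    sample's isomiRs; everything else is 'absent'. Tie-breaking and
    rounding here are assumptions of this sketch.
    """
    ranked = sorted(abundances, key=abundances.get, reverse=True)
    n_present = max(1, int(len(ranked) * fraction))
    return set(ranked[:n_present])


def stratified_split(labels, train_fraction=0.60, rng=None):
    """Randomly split sample indices within each cancer type.

    Roughly `train_fraction` of each type goes to training and the
    remainder to testing, mirroring the 60%/40% split in the text.
    """
    rng = rng or random.Random()
    by_type = defaultdict(list)
    for index, label in enumerate(labels):
        by_type[label].append(index)

    train, test = [], []
    for indices in by_type.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return sorted(train), sorted(test)


def sensitivity_and_fdr(true_labels, predicted_labels, cancer_type):
    """Compute sensitivity and false discovery rate for one cancer type."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == cancer_type and p == cancer_type for t, p in pairs)
    fn = sum(t == cancer_type and p != cancer_type for t, p in pairs)
    fp = sum(t != cancer_type and p == cancer_type for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    fdr = fp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, fdr
```

In a full run, one would call `stratified_split` once per round (1,000 rounds in the study), train a classifier of one's choosing on the binarized training profiles, predict labels for the held-out samples, and average the per-type values returned by `sensitivity_and_fdr` across rounds. Because only the present/absent calls enter the classifier, no abundance normalization step is needed.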
A somewhat counterintuitive result that emerged from this analysis is that isomiRs with the highest ability to discriminate among different cancers have not been the best studied in the literature. Moreover, the analysis also showed that isomiRs produced by many miRNA loci with proven biological importance in multiple cancers are surprisingly poor cancer biomarkers. Taken together, these results indicate that the pan-cancer biomarker tool offers a new, unbiased approach to classifying cancer samples, which does not depend on earlier research or preconceived notions of how a given cancer arises and progresses.