Press Release

AI Model Learns Generalized 'Language' of Regulatory Genomics, Predicts Cellular Stories

Key Takeaways

  • EpiBERT, an AI model, predicts gene expression by analyzing genomic sequences and chromatin accessibility across various human cell types.
  • The model identifies regulatory elements and their influence on gene expression, creating a generalizable "grammar" for cellular processes.

Bradley Bernstein, MD, PhD

A team of investigators from Dana-Farber Cancer Institute, The Broad Institute of MIT and Harvard, Google, and Columbia University has created an artificial intelligence model that can predict which genes are expressed in any type of human cell. The model, called EpiBERT, was inspired by BERT, a deep learning model designed to understand and generate human-like language.

EpiBERT was trained in multiple phases on data from hundreds of human cell types. It was fed the human genome sequence, which is about 3 billion base pairs long, along with maps of chromatin accessibility that indicate which of these sequences are unwound from the chromosome and can be read by the cell. The model was first trained to learn the relationship between DNA sequence and chromatin accessibility across large chunks of the genome in a specific cell type. It then used these learned relationships to predict which genes were active in the corresponding cell type. It accurately identified regulatory elements (parts of the genome recognized by transcription factors) and their influence on gene expression across many cell types, building a “grammar” that is generalizable and predictive. This grammar-building process can be likened to the way a large language model, such as ChatGPT, learns to build meaningful sentences and paragraphs from many examples of text. The EpiBERT model can take in accessibility data for a never-before-seen cell type and predict its functional bases as well as its RNA expression.
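For readers who want a concrete picture of the first training phase, here is a minimal Python sketch. It assumes a BERT-style masked objective, in which parts of the accessibility signal are hidden and the model is scored on how well it fills them back in. The release does not spell out the objective, so this is an illustration only and not the team's implementation. The function names, the 15% masking fraction, the span length, and the toy data are all hypothetical.

```python
import random

def mask_accessibility(track, mask_fraction=0.15, span=4, mask_value=0.0, seed=0):
    """Hide random contiguous spans of an accessibility track.

    Returns (masked_track, is_masked), where is_masked[i] is True for hidden bins.
    Parameter names and defaults are illustrative assumptions, not EpiBERT's.
    """
    rng = random.Random(seed)
    n = len(track)
    is_masked = [False] * n
    target = int(n * mask_fraction)
    # Keep hiding spans until at least `target` bins are masked.
    while sum(is_masked) < target:
        start = rng.randrange(n)
        for i in range(start, min(start + span, n)):
            is_masked[i] = True
    masked = [mask_value if m else v for v, m in zip(track, is_masked)]
    return masked, is_masked

def masked_mse(predicted, truth, is_masked):
    """Mean squared error scored only on the hidden bins."""
    errors = [(p - t) ** 2 for p, t, m in zip(predicted, truth, is_masked) if m]
    return sum(errors) / len(errors) if errors else 0.0

# Toy accessibility signal over 40 genomic bins (made-up numbers).
track = [float(i % 5) for i in range(40)]
masked, is_masked = mask_accessibility(track)

# A "model" that reconstructs the hidden bins perfectly has zero loss.
perfect_loss = masked_mse(track, track, is_masked)
# A trivial baseline that simply echoes the masked input does worse.
baseline_loss = masked_mse(masked, track, is_masked)
```

In a real model, the prediction would come from a neural network that sees the DNA sequence alongside the masked track. Scoring only the hidden bins is what pushes such a model to learn how sequence relates to accessibility.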

Significance: Every cell in the body has the same genome sequence, so the difference between two types of cells is not which genes are in the genome, but which genes are turned on, when, and how much. Approximately 20% of the genome codes for regulatory elements that determine which genes are turned on, but very little is known about where those codes are in the genome, what their instructions look like, or how mutations affect function in a cell. EpiBERT will shed light on how genes are regulated in cells and, potentially, how a cell’s regulatory system can be mutated in ways that lead to diseases such as cancer.

Funding: The Broad Institute, the Novo Nordisk Foundation, the National Human Genome Research Institute, the Sharf Green Cancer Research Fund, the Richard and Nancy Lubin Family, and the American Cancer Society. Tensor Processing Unit (TPU) access and support were provided by Google.
