Folding@home: How Distributed Computing Is Unraveling the Mysteries of Protein Folding

In October 2000, a team of researchers at Stanford University led by Vijay Pande, PhD, debuted a distributed computing project designed to simulate the protein folding process. These simulations were developed to complement and aid formal experiments. Dubbed Folding@home (FAH), the project uses autonomous individual computer processors around the world to run complex computing algorithms that would otherwise take a single supercomputer millions of times longer. In its original format, with the help of Adam Beberg’s distributed computing advice, Dr Pande’s algorithms were integrated into a screensaver PC users could download that would run when their machine idled. To date, the project has received computational results from over 4.51 million devices and has over 400,000 active donor clients throughout the world using both desktop computers and PlayStation 3 consoles. This distributed supercomputer is the largest and most efficient of its kind and has contributed to the research of biomedical problems such as Alzheimer’s disease, Parkinson’s disease, cancer, and many cancer-related syndromes. In Dr Pande’s words, “The simulation world now can give us insights that we couldn’t get experimentally and these insights can be taken from the simulations and put directly into the real world.” More than 60 research papers have been written directly from the project’s results. To see the project’s eye-catching simulations, visit the Pande group’s YouTube presence.

Importance of Protein Folding

Proteins are essential to the body’s stability and function: they take shape as enzymes, structural elements, and antibodies. As enzymes, they are the catalysts for the body’s biochemical reactions; as structural components, they are the main building blocks of bones, blood vessels, and muscles; as antibodies, they identify and help eliminate foreign bodies. There are thousands of different proteins, and each type has its own function. Each function requires not only that a protein be composed of a particular sequence of amino acids, but also that it take a specific shape. Proteins essentially assemble themselves using a process called “folding.” One of the goals of the FAH project is to simulate the protein folding process to better understand how proteins are able to fold so quickly (some in as little as a microsecond) and so reliably. The project also helps to unravel what happens when proteins take shape incorrectly, called “misfolding.” This malformation is particularly destructive because it encourages the misfolding of other proteins, which can result in toxic aggregates. This aggregate material is thought to contribute to diseases such as Alzheimer’s disease, cystic fibrosis, and many cancers. The group’s first cancer-related finding was published in 2005 and focused on protein 53 (p53), a tumor-suppressing protein that acts as a guardian of the cell. Approximately 50% of cancers involve a mutation in p53. When p53 breaks down or folds incorrectly or too slowly, it is unable to perform its vital function, and cells with damaged DNA are neither repaired nor destroyed. This area of the project’s research aims to predict the relevant mutations.

The Need for Extraordinary Processing Power

The timescale on which proteins fold makes simulating the process on a single computer completely unrealistic. It takes about 24 hours to simulate one nanosecond of the process, but many proteins fold on a timescale of tens of microseconds (about 10,000 nanoseconds); thus, it would take 10,000 CPU days (or roughly 30 CPU years) to simulate a single folding event. Also, proteins often do not transition directly from their unfolded state into their intended folded state. Pande compares the process to parallel parking, where multiple trajectory adjustments and realignments are often necessary, lengthening the time needed to accomplish the goal.
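The arithmetic behind the CPU-time estimate can be checked with a short sketch. The simulation rate of one nanosecond per CPU day comes from the figures above; the folding time of 10,000 nanoseconds (about 10 microseconds) is an assumption, chosen because it is the value that yields the 10,000 CPU-day figure. The function names are illustrative only.

```python
# Rough CPU-time estimate for simulating one folding event on one processor.
# NS_PER_CPU_DAY comes from the article ("about 24 hours to simulate one
# nanosecond"); ASSUMED_FOLDING_TIME_NS is an assumption for illustration.

NS_PER_CPU_DAY = 1.0              # simulated nanoseconds per CPU day
ASSUMED_FOLDING_TIME_NS = 10_000  # about 10 microseconds (assumed)
DAYS_PER_YEAR = 365


def cpu_days_to_simulate(folding_time_ns, ns_per_cpu_day=NS_PER_CPU_DAY):
    """CPU days needed to simulate folding_time_ns nanoseconds of folding."""
    return folding_time_ns / ns_per_cpu_day


def cpu_years_to_simulate(folding_time_ns, ns_per_cpu_day=NS_PER_CPU_DAY):
    """The same estimate expressed in CPU years."""
    return cpu_days_to_simulate(folding_time_ns, ns_per_cpu_day) / DAYS_PER_YEAR


if __name__ == "__main__":
    days = cpu_days_to_simulate(ASSUMED_FOLDING_TIME_NS)
    years = cpu_years_to_simulate(ASSUMED_FOLDING_TIME_NS)
    print(f"{days:,.0f} CPU days, or about {years:.1f} CPU years")
```

Under these assumptions the result is a little over 27 CPU years, which the article rounds to 30.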

While it may seem counterintuitive that the simulation of a linear process could be distributed over thousands of individual computers, the project’s research methods are well suited to its resources. The group has developed multiple ways of simulating the protein folding process, including a method called Markov state models. This adaptive sampling tool simulates multiple stages of the process simultaneously, using a kinetic model built for the specific process. The team first determines the series of states in a process, then the rates of transition between those states. The clients’ machines complete these rate calculations, with each individual processor calculating a tiny portion of the folding process at a time. Each client periodically connects to the FAH server to exchange work units, the data packets used to perform the calculations. When all of the data are collected, the research team identifies which states are reasonable and initiates further calculation of those states’ rates. This emphasis on states and clustering maps naturally onto the project’s division of labor.
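To see why a states-and-rates model suits many independent machines, consider a minimal sketch of the general Markov state model idea: short trajectories, each of which could have been produced by a different donor machine, are pooled into one table of transition counts, and the counts are normalized into transition probabilities. This is a toy illustration, not the Pande group’s actual method or software; the three states, the trajectories, and all function names are hypothetical.

```python
# Toy Markov state model: pool short, independent trajectories into one
# transition matrix. States (hypothetical): 0 = unfolded, 1 = intermediate,
# 2 = folded. Each trajectory is a list of state labels at successive steps.


def count_transitions(trajectories, n_states):
    """Count observed i -> j transitions across all trajectories."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    return counts


def to_transition_matrix(counts):
    """Normalize each row of counts into transition probabilities."""
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def propagate(populations, matrix, steps):
    """Advance state populations by the given number of model steps."""
    p = list(populations)
    n = len(matrix)
    for _ in range(steps):
        p = [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return p


# Four short trajectories, as if returned by four different machines.
trajectories = [
    [0, 0, 1, 1],
    [0, 1, 1, 2],
    [1, 2, 2, 2],
    [1, 0, 0, 1],
]

counts = count_transitions(trajectories, n_states=3)
T = to_transition_matrix(counts)

if __name__ == "__main__":
    start = [1.0, 0.0, 0.0]  # everything begins unfolded
    print(propagate(start, T, steps=50))
```

No single trajectory above runs from unfolded to folded, yet the pooled model still predicts how the population reaches the folded state over many steps. That is the property that lets a long, sequential process be assembled from many short, parallel pieces.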

Petascale Computing: The Next Frontier

Petascale computing is a relatively new addition to the scientific research arsenal, and it helps overcome the timescale barrier in the study of molecular dynamics. The processing speed of this type of system is measured in FLOPS, which stands for floating-point operations per second. The FAH computing cluster is measured in PFLOPS (petaflops), meaning quadrillions (10¹⁵, or 1,000 trillion) of operations per second. In December 2009, FAH was the fastest cluster of its kind, reporting over 7.8 petaflops of processing power, with nearly one-third of that contributed by donor clients running on PlayStation 3 systems. Another large portion was contributed by donors running the project’s GPU-based clients for NVIDIA and ATI graphics cards.

The exceptional efficiency of the distributed computing system is twofold: it saves an extraordinary amount of time and also greatly reduces operating expenses. The cooling of one centralized supercomputer capable of the sheer volume of these types of calculations would be incredibly expensive, so each client’s contribution is measured not only in additional processing speed, but also in maintenance savings. Besides the satisfaction of contributing to a worthy endeavor, one of the incentives for volunteering is generated by the project’s point system, which stirs up friendly competition among teams of volunteers and directly benefits the project. Teams can monitor their progress on the project’s Website, which updates the stats every hour. Further, contributing to the project is simple and painless. While the client’s machine idles, it connects to the project’s server and grabs a bit of data to process, works on it, and uploads the results. Because the program is designed to be the lowest priority process on the client’s computer or console, any programs run by the user will have first dibs on CPU time.
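The fetch, process, and upload cycle described above can be sketched in a few lines. This is an illustration of the general pattern only, not the FAH client or its network protocol: the in-memory FakeWorkServer, the work-unit format, and the placeholder calculation are all hypothetical.

```python
# Minimal sketch of a volunteer-computing client loop (hypothetical names).
from collections import deque


class FakeWorkServer:
    """In-memory stand-in for a project server; not the real FAH protocol."""

    def __init__(self, work_units):
        self.pending = deque(work_units)  # (unit_id, payload) pairs
        self.results = {}

    def fetch(self):
        """Hand out the next work unit, or None when nothing is left."""
        return self.pending.popleft() if self.pending else None

    def upload(self, unit_id, result):
        """Store a finished result under its work-unit id."""
        self.results[unit_id] = result


def process(payload):
    """Placeholder for the real calculation: a sum of squares."""
    return sum(x * x for x in payload)


def run_client(server, machine_is_idle):
    """Fetch, process, and upload work units while the machine stays idle."""
    completed = 0
    while machine_is_idle():
        unit = server.fetch()
        if unit is None:
            break
        unit_id, payload = unit
        server.upload(unit_id, process(payload))
        completed += 1
    return completed


server = FakeWorkServer([("wu-1", [1, 2, 3]), ("wu-2", [4, 5])])
done = run_client(server, machine_is_idle=lambda: True)

if __name__ == "__main__":
    print(done, server.results)
```

A real client would also lower its own scheduling priority so the user’s programs always run first; on Unix-like systems, Python’s os.nice is one way to request that. The call is left out of the sketch so that running it does not change the priority of the surrounding process.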

Other Distributed Computing Projects

The FAH project is one of a growing number of distributed computing programs that allow users to generate meaningful results with minimal cost and effort using idle computer processing power. Unlike FAH, many of the larger distributed computing projects are powered by BOINC, the Berkeley Open Infrastructure for Network Computing, which was originally created to support the security and integrity of the SETI@home project. One such program is Rosetta@home, which also researches protein misfolding diseases, though not as exclusively as FAH. Taking yet another approach, the Help Conquer Cancer project is dedicated to enhancing the results of protein X-ray crystallography, which is used to identify diagnostic markers and determine which proteins may have a functional relationship with cancer. For more information about the project and team, visit this project’s main site.

Distributed computing technology has been applied to numerous areas of study, including art, mathematics, artificial intelligence, medicine, gaming, and physics. WorldCommunityGrid.org, a resource sponsored by IBM, offers a list of research programs such as FightAIDS@home, Discovering Dengue Drugs - Together, and Nutritious Rice for the World, which all use distributed computing technology to advance their varied areas of research. This particular network’s donors can choose to participate in just one of the programs or contribute to several simultaneously.

Next Steps for FAH

Dr Pande sees the future of the project eventually including a way for clients to store information remotely, tentatively called Storage@home. With so much data being amassed, storage capacity needs to scale with the project’s capabilities. A volunteer-based storage system would also significantly cut operational expenses. As an example, the team’s backup costs for the volume of data collected in a single year total $25,000.

If you are interested in contributing to FAH, the team suggests downloading and running their free client software. Downloads are available for Windows, Linux, and Mac users. If you own a PlayStation 3 (version 1.6 or later), FAH is included in the “Life with PlayStation” application. Work units have been set to take approximately 8 hours of processing time on the PlayStation 3, which enables the user to run the process overnight and yield a meaningful result. To learn more about the project and how to enable your idle electronic equipment to contribute, visit the project’s Website.