Computers turn up biotechnology: Anything from faster processors, like this System z10 mainframe processor from IBM (left), to better software gives scientists a closer look at nature, helping them understand it and learn how to use it.
A couple of years ago, Merck and the Moffitt Cancer Center in Tampa, Fla., took on a gigantic data task. Working together they set out to build a dynamic database of information on cancer patients. Moffitt began collecting information on normal tissue and tumors—from the center’s Tampa-area clinics and hospitals recruited for this collaboration—that then flows into one of Merck’s global data centers where researchers perform molecular, genomic and proteomic profiling on the samples. "You need the clinical and medical context of the patients associated with these samples, as well as the outcome information on how they responded to specific drugs," says Martin Leach, executive director of basic research and biomarker information technology at Merck. Connecting such a collection of data creates a complex challenge in information technology (IT).
"The challenge," according to Leach, "is to get all of that flowing from multiple hospitals and systems in a standardized way." To accomplish that Leach and his colleagues, plus partners at Moffitt, built a unique information pipeline. Data get encoded at Moffitt and securely transmitted to Merck’s clinical-data repository. "Then, in-house tools integrate this clinical information with molecular-profiling information.
That lets oncologists identify genetic signatures associated with a drug response or the lack of a response," Leach observes. As a result, this information can be used to develop more-specific cancer therapeutics and also aim them at the patients who are the most likely to benefit from particular drugs.
Already, putting IT and computation together increases the ability of biotechnology experts to explore a new realm of possibilities, from pharmaceuticals to disease modeling and beyond. In the future computational tools could even change the fundamental approach to science.
As shown in the Merck–Moffitt collaboration, much of IT’s task involves connectivity. "We must connect scientists and provide them with technology so they work seamlessly with other scientists in China or Japan or at any other site," Leach says. Moreover that connectivity covers a range of forms, including sharing software, providing audio and video conferencing and letting companies integrate silos of data to ask complex questions. Leach describes one possible task: "Give me all of the gene-expression data on specific tumors of a particular size and include information about related samples in our biobank." He adds, "IT must make this easy to ask and provide a rapid response."
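The kind of question Leach describes amounts to a join across data silos. The sketch below is only an illustration using Python's built-in sqlite3 module; the table names, column names, and sample rows are hypothetical toy data and do not reflect Merck's actual systems.

```python
import sqlite3

# In-memory database standing in for three separate "silos" (all hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tumors (sample_id TEXT PRIMARY KEY, tumor_type TEXT, size_cm REAL);
    CREATE TABLE expression (sample_id TEXT, gene TEXT, level REAL);
    CREATE TABLE biobank (sample_id TEXT, related_sample_id TEXT);
    INSERT INTO tumors VALUES ('T1', 'lung', 3.2), ('T2', 'lung', 0.8), ('T3', 'colon', 4.1);
    INSERT INTO expression VALUES ('T1', 'TP53', 7.4), ('T2', 'TP53', 2.1), ('T3', 'TP53', 5.0);
    INSERT INTO biobank VALUES ('T1', 'N1'), ('T3', 'N3');
""")

def expression_for_tumors(conn, tumor_type, min_size_cm):
    """Gene-expression rows for tumors of a given type and minimum size,
    with any related biobank sample attached (None if there is none)."""
    query = """
        SELECT t.sample_id, e.gene, e.level, b.related_sample_id
        FROM tumors AS t
        JOIN expression AS e ON e.sample_id = t.sample_id
        LEFT JOIN biobank AS b ON b.sample_id = t.sample_id
        WHERE t.tumor_type = ? AND t.size_cm >= ?
        ORDER BY t.sample_id
    """
    return conn.execute(query, (tumor_type, min_size_cm)).fetchall()

print(expression_for_tumors(conn, "lung", 2.0))
```

The LEFT JOIN keeps tumors that have no related biobank sample, so a missing link in one silo does not silently drop a result from another.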
Despite the improvements in IT, hurdles remain. "Our needs are still growing," says Ajay Royyuru, head of IBM Research’s Computational Biology Center in Yorktown Heights, N.Y. "We have not reached a point where we have a maturity in needs and solutions." He sees three places where computing must improve to be more useful in biotechnology.
The first is data. "We are generating more data than yesterday, and we will generate more data tomorrow." To handle that, Royyuru wants smart data storage and analytics that are distributed and standardized, and that transform raw data into new knowledge. "We are nowhere near maturity here," he says.
At Merck Leach also sees the need for handling more data, and doing it now. He says that Merck’s IT system manages 1.5 to 2 petabytes, which is 1.5 to 2 million gigabytes of data, and it’s growing at a rate of 30 to 40 percent a year. "Next-generation sequencers already generate 4 terabytes in one run," Leach says. "That’s how explosive the data growth is."
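A back-of-the-envelope calculation puts those growth figures in perspective. The sketch below assumes steady compound growth at the quoted 30 to 40 percent a year and the decimal convention used above (1 petabyte = 1 million gigabytes); the function names are illustrative only.

```python
import math

GB_PER_PB = 1_000_000  # decimal convention, matching the figures in the text

def doubling_time_years(annual_growth):
    """Years for a data store to double under steady compound growth."""
    return math.log(2) / math.log(1 + annual_growth)

def projected_petabytes(start_pb, annual_growth, years):
    """Size after a number of years, assuming the growth rate stays constant."""
    return start_pb * (1 + annual_growth) ** years

for rate in (0.30, 0.40):
    print(f"{rate:.0%} a year: doubles in about {doubling_time_years(rate):.1f} years")

# How many 4-terabyte sequencing runs fit in 1.5 petabytes (1 PB = 1,000 TB)?
print(1.5 * 1000 / 4, "runs")
```

Under these assumptions the repository doubles in a little over two to about two and a half years, and a 1.5-petabyte store equals only a few hundred 4-terabyte sequencing runs.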
Beyond dealing with the growth in data, Royyuru says that the scale of computing also matters. "The magnitude of computing is holding us back. With more power, we’d make more progress." He adds, "Tomorrow’s needs in biotechnology will require 1,000 or 10,000 times today’s peak computing."
Finally Royyuru believes that scientists need to know more about how to use computing to better understand biology. "How can I model disease or the response to a drug?" he asks. "That area is not limited by computing or IT, but by what information we have about the biological system." (See sidebar "Simulating Pig Pandemics.")
The need for more powerful computing seems about as old as computing itself. Whether scientists obtained the latest abacus or the day’s fastest supercomputer, they always wanted the same thing—more.
In some cases a company gets more computing power by building it. That’s been Merck’s strategy. In fact, one of Merck’s machines, an IBM supercomputer, even ended up on the list of the top-500 computers in the world—ranked at 458 on the June 2008 list.
Instead of building one gigantically powerful machine, some researchers opt to link lots of smaller ones into clusters or grids. As an example IBM created the World Community Grid. This grid relies on people around the world contributing unused compute cycles from the personal computers on their desks. As of early February 2009 the World Community Grid consisted of nearly 1.2 million computers connected together. Anyone from a public or not-for-profit organization can apply for time on this grid. Some of the ongoing projects include models for increasing rice yields and quality, searches for new drugs for dengue hemorrhagic fever, and studies of protein folding. "This is not costing anyone a whole lot," says Royyuru. "It’s taking what is out there and capitalizing on it, making sure that it gets used."
Reaching out even farther, scientists can also select cloud computing, which is a range of computer resources—including processing power and storage—and applications that can be rented. As an example Amazon provides its Elastic Compute Cloud, or EC2. When asked how much computing power or storage a customer can use with this cloud, Peter De Santis, EC2’s general manager, says, "It’s not really limited." He adds, "A customer can get access to supercomputing power. For example, using 10,000 cores is not absurd."
With cloud computing a user can pay for the amount of power or storage needed, and vary that over time, all without buying any hardware. Using the cloud does demand some skill, although De Santis says that any IT person can figure out this technology, and it should become easier to use as it evolves.
One thing that is clear is the growing number of cloud users. For instance Eli Lilly and Company now rents on-demand servers and storage from Amazon’s cloud. In addition scientists from Harvard Medical School’s Laboratory for Personalized Medicine use EC2 and Amazon Simple Storage Service, or S3, to create models and run simulations. Portability adds to the attraction. Amazon’s cloud services can be used anywhere. Scientists at the Max Planck Institute in Munich, Germany, already use it, and De Santis says that there has been interest from around the world, even in developing countries. "All you need to use it is an Internet connection," he says.
To place even more opportunities in its cloud, Amazon added public data sets that can be analyzed with Amazon’s cloud-computing tools. Anyone who wants to upload such a data set for public viewing can do so at no cost.
Amazon is not alone in the cloud. Other companies—such as IBM and Microsoft—also offer cloud-computing resources.
Even with supercomputers and clouds of cores, IT scientists want more, much more. Some things on an IT expert’s wish list sound simple enough, like better search capabilities. "We need some robust tool that will search across databases and repositories," Leach says. "There is still a gap in general search capabilities. Even though Merck is using one of the leading technologies available right now, better ways to organize the output from a search, and improved ways to find what we really want are still needed."
Leach also sees a growing trend toward in silico research. "What if I could compute the entire chemical space?" he asks. He can’t and doesn’t expect to, since that consists of about 10^60 molecules. "That’s almost infinity, but what if you could compute the tractable chemistry?" Leach asks. Maybe IT experts and computer scientists could develop some way to model all of the compounds that might fit a certain target. Then that knowledge could be used for screening molecules in a computer instead of in multiwell plates. (See sidebar "Modeling Antibody Interactions.")
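A rough calculation shows why the whole of chemical space is out of reach while a tractable slice is not. The sketch below takes chemical space as roughly 10^60 molecules; the evaluation rates and the size of the "tractable" library are pure assumptions chosen for illustration, not figures from Leach.

```python
CHEMICAL_SPACE = 10**60              # rough size of chemical space
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000

def years_to_enumerate(n_molecules, molecules_per_second):
    """Years needed to evaluate every molecule once at a fixed rate."""
    return n_molecules / (molecules_per_second * SECONDS_PER_YEAR)

# Assumed rate: 10^15 molecules per second, far beyond any real screening code.
print(f"Whole space: {years_to_enumerate(CHEMICAL_SPACE, 10**15):.2e} years")

# Assumed tractable library: 10^9 candidates at 1,000 molecules per second.
days = years_to_enumerate(10**9, 10**3) * 365
print(f"Tractable slice: about {days:.1f} days")
```

Even at the wildly optimistic assumed rate, exhaustive enumeration would take on the order of 10^37 years, while the hypothetical billion-compound slice finishes in under two weeks.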
Pushing the in silico opportunities even farther, Royyuru says, "There’s the potential for computation to transform how we ask questions and how we seek answers to them." For instance, models could be used to see how some system might work. Royyuru used such modeling to explore the signaling mechanism that connects p53, a protein, with cancer. Then mechanisms that appear in a model can be tested with traditional biological methods. "That is," Royyuru says, "hypothesis generation through computation could help to hypothesize better experiments."
That scenario spans the full breadth of IT’s role, from creating information to transmitting it, with a part to play in every step along the way.
Sidebar: Modeling Antibody Interactions
Pharmaceutical companies already use some modeling, for example to determine whether the shape of a potential compound fits with a target in a way that could block a disease. Nonetheless, high-performance computing could push such simulations even farther ahead. "We’re looking at how proteins function," says Ajay Royyuru, head of IBM Research’s computational biology center in Yorktown Heights, N.Y. For instance, proteins called hemagglutinins stick out from the influenza virus and help the virus bind to host cells. "What happens if there is a mutation in this protein?" Royyuru asks.
To find out, Royyuru and his colleagues turn to molecular dynamics. Such simulations can reveal whether a specific mutation in hemagglutinin could disguise the virus from antibodies generated by previous forms of the flu. "We could eventually test mutants ahead of vaccine production," he says, "but we’re not there yet."
Sidebar: Simulating Pig Pandemics
In the Midwest of the United States, pigs on four small farms get fed garbage carrying the infectious virus that causes foot-and-mouth disease. To keep this disease from spreading wildly, livestock authorities destroy every pig within one kilometer of the infected farms. Luckily, the experts know that killing more pigs, out to three or even five kilometers, would not stop the spread any faster.
Actually not one pig died in this scenario. This is just an example run on the North American Animal Disease Spread Model, or NAADSM, which was created as a collaboration between the Canadian Food Inspection Agency, Colorado State University, the Ontario Ministry of Agriculture, Food and Rural Affairs, the U.S. Department of Agriculture (USDA) and the University of Guelph. "In the simplest terms," says Kimberly N. Forde-Folle, an analytical epidemiologist with USDA, "NAADSM is a computer program designed to simulate the spread of infectious diseases in livestock and the potential effect of control measures."
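The control measure in this scenario, destroying herds within a fixed radius of infected farms, can be illustrated with a few lines of geometry. This is only a toy sketch and not NAADSM code; the farm names, coordinates, and function name are hypothetical, and a real model would also simulate how the disease spreads over time.

```python
import math

def farms_in_cull_zone(farms, infected, radius_km):
    """Names of farms lying within radius_km of any infected farm."""
    selected = set()
    for name, (x, y) in farms.items():
        for source in infected:
            sx, sy = farms[source]
            if math.hypot(x - sx, y - sy) <= radius_km:
                selected.add(name)
                break
    return sorted(selected)

# Hypothetical farm positions in kilometers on a flat map.
farms = {"A": (0.0, 0.0), "B": (0.6, 0.0), "C": (2.5, 0.0), "D": (0.0, 4.0)}
infected = ["A"]

for radius in (1, 3, 5):
    print(radius, "km:", farms_in_cull_zone(farms, infected, radius))
```

Comparing the farms caught by each radius shows the cost side of the trade-off that a spread model weighs against any gain in containment.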
Researchers use the NAADSM ahead of trouble, not during it. "After determining the potential pathways of disease introduction, this program can be used to try to determine the consequences of a disease introduced through a certain pathology and species in a certain area," says Tracey Lynn, director of USDA’s Center for Emerging Issues.
In the past, researchers used the NAADSM to model avian influenza in commercial poultry operations. "We developed a scenario that could be used by emergency responders to help prepare for an outbreak," Forde-Folle says.
This program is also being used to study pharmaceutical needs. For example one project will estimate the number of vaccine doses that would be required if foot-and-mouth disease were to enter the United States. The results from these models and others could help a range of users. "A number of outputs from these models are useful for policy makers," Lynn says. "It could provide an estimation of how long an outbreak might last, how many animals might be affected, how many might die." She adds, "Knowing those estimates can help planners think through all of the potential impacts that might happen and what contingency planning they can do."
For anyone interested in putting this program to work, it’s available for free (www.naadsm.org).