Machine learning has been shown to markedly accelerate the design of microbes that produce biofuel. The approach, devised at the U.S. Department of Energy’s Lawrence Berkeley National Laboratory, is faster than current methods for predicting the behavior of metabolic pathways. Beyond commercially viable biofuels, it promises to speed up the development of biomolecules for many other applications, such as drugs that fight antibiotic-resistant infections and crops that withstand drought.
A computer algorithm was given abundant data about the proteins and metabolites in a biofuel-producing microbial pathway, but no information about how the pathway actually works. Using data from previous experiments, the tool learned how the pathway behaves and then automatically predicted the amount of biofuel produced by pathways that had been added to E. coli cells.
Researchers are investigating ways to re-engineer pathways and import them from one microbe to another, to the benefit of medicine, energy, manufacturing and agriculture. With the advent of the gene-editing tool CRISPR-Cas9 and other new synthetic biology capabilities, such research can be performed with remarkable precision.
However, it is very difficult to predict how a pathway will behave when it's re-engineered, and the accepted means of predicting a pathway's dynamics requires a maze of differential equations that describe how the components in the system change over time. Months are required to develop these kinetic models, and the resulting predictions don't always match experimental findings.
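To give a sense of what such a kinetic model involves, here is a minimal sketch of a two-step pathway (substrate to intermediate to product) written as differential equations and integrated numerically. The Michaelis-Menten rate law is a standard textbook choice; the two-step layout, the function names, and every rate constant are illustrative assumptions, not values from the study.

```python
def michaelis_menten(vmax, km, conc):
    """Rate of one enzyme-catalyzed step under Michaelis-Menten kinetics."""
    return vmax * conc / (km + conc)


def simulate(s0, hours, dt=0.01, vmax1=1.0, km1=0.5, vmax2=0.6, km2=0.3):
    """Integrate substrate -> intermediate -> product with forward Euler.

    All rate constants are made-up illustrative values, not measurements.
    Returns the final (substrate, intermediate, product) concentrations.
    """
    s, m, p = s0, 0.0, 0.0
    steps = int(round(hours / dt))
    for _ in range(steps):
        v1 = michaelis_menten(vmax1, km1, s)  # substrate -> intermediate
        v2 = michaelis_menten(vmax2, km2, m)  # intermediate -> product
        # Each equation says how one component changes over time.
        s += -v1 * dt
        m += (v1 - v2) * dt
        p += v2 * dt
    return s, m, p


s, m, p = simulate(s0=2.0, hours=10.0)
print(f"substrate={s:.3f} intermediate={m:.3f} product={p:.3f}")
```

Even this toy needs four hand-chosen constants. A real pathway with many enzymes needs far more, which hints at why such models are slow to build.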
The machine learning approach uses data to train a computer algorithm to make predictions, allowing scientists to quickly predict the function of a pathway even if its mechanisms are poorly understood — as long as there are enough data to work with.
The technique was tested on pathways added to E. coli cells, including one designed to produce a bio-based jet fuel called limonene and another that produces a gasoline replacement called isopentenol. Previous experiments provided a trove of data related to how different versions of the pathways function in various E. coli strains. Some of the strains have a pathway that produces small amounts of either limonene or isopentenol, while other strains have a version that produces large amounts of the biofuels.
The algorithm used the data to teach itself how the concentrations of metabolites in these pathways change over time, and how much biofuel the pathways produce. It learned these dynamics by analyzing data from the two experimentally known pathways that produce small and large amounts of biofuels.
This knowledge was then used to predict the behavior of a third set of "mystery" pathways that the algorithm had never seen before. The algorithm accurately predicted the biofuel-production profiles of the mystery pathways, and its predictions outperformed those of kinetic models.
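The train-then-predict idea can be sketched in a few lines. The article does not describe the algorithm's internals, so everything here is an assumption made for illustration: the synthetic "measurements", the single enzyme and metabolite, and the use of ordinary least squares as the learning method. The model is fitted on a low-producing and a high-producing strain, then asked about a third strain it never saw.

```python
def hidden_rate(metabolite, enzyme):
    """Stand-in for the unknown biology (hypothetical): the learner never sees this."""
    return 0.8 * enzyme - 0.5 * metabolite


def make_series(enzyme, steps=50, dt=0.1):
    """Synthetic time series of one metabolite in a strain with a fixed enzyme level."""
    series = [0.0]
    for _ in range(steps):
        series.append(series[-1] + hidden_rate(series[-1], enzyme) * dt)
    return series


def fit_rate_model(strains, dt=0.1):
    """Least-squares fit of d(metabolite)/dt ~ a*enzyme + b*metabolite."""
    see = sem = smm = sey = smy = 0.0
    for enzyme, series in strains:
        for now, nxt in zip(series, series[1:]):
            rate = (nxt - now) / dt  # finite-difference estimate of the derivative
            see += enzyme * enzyme
            sem += enzyme * now
            smm += now * now
            sey += enzyme * rate
            smy += now * rate
    # Solve the 2x2 normal equations by Cramer's rule.
    det = see * smm - sem * sem
    a = (sey * smm - sem * smy) / det
    b = (see * smy - sem * sey) / det
    return a, b


def predict(enzyme, a, b, steps=50, dt=0.1):
    """Integrate the learned rate model forward to get the final metabolite level."""
    m = 0.0
    for _ in range(steps):
        m += (a * enzyme + b * m) * dt
    return m


# Train on a low- and a high-producing strain (enzyme levels are made up).
training = [(0.5, make_series(0.5)), (2.0, make_series(2.0))]
a, b = fit_rate_model(training)

# Predict a "mystery" strain that was not in the training data.
predicted = predict(1.2, a, b)
actual = make_series(1.2)[-1]
print(f"learned a={a:.3f}, b={b:.3f}; predicted={predicted:.3f}, actual={actual:.3f}")
```

The fit recovers the hidden rate law here only because the synthetic data are noise-free and exactly linear. Real measurements are noisier and would need more data and a more flexible model.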
The research is published in the journal npj Systems Biology and Applications.