New software helps create better drugs

Amy J. Born | December 23, 2019

[Image: Purdue University researchers have created a new system, called Lemon, for rapid mining of biomolecular interaction data to use with machine learning methods for the design of drugs. Source: Purdue University/Gaurav Chopra]

A new data-mining framework called Lemon, developed at Purdue University, improves machine learning models used in drug development. Lemon allows for faster, more efficient mining of the Protein Data Bank (PDB), which includes more than 140,000 biomolecular structures.

Machine learning can reduce the time it takes to sort through large amounts of data, but it still requires a framework that lets the computer analyze that data quickly, said Gaurav Chopra, assistant professor of analytical and physical chemistry in Purdue's College of Science. The Lemon software platform, a C++11 library with Python bindings, can apply a workflow to the entire PDB in about six minutes on an 8-core machine, compared with about 290 minutes needed just to load the traditional mmCIF files. Users can write their own custom functions to generate unique benchmarking datasets that the scientific community can then access.
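The article does not describe Lemon's actual C++ or Python interface, so the following is only a hypothetical Python sketch of the general pattern described above: a user-written function is applied to every entry of a structure collection in parallel, and the results are gathered into a benchmarking dataset. The names `Entry`, `mine`, and `high_res_ligands` are illustrative assumptions, not part of Lemon, and the toy entries stand in for parsed PDB records. Threads are used here only to keep the sketch short and portable; CPU-bound work in pure Python would need a process pool or native code to use all 8 cores.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified stand-in for one parsed PDB entry (not a Lemon type)."""
    entry_id: str
    ligands: Tuple[str, ...] = ()
    resolution: float = 0.0  # in angstroms


# A "workflow" here is any user-written function mapping one entry to zero or more output rows.
Workflow = Callable[[Entry], List[Tuple[str, str]]]


def mine(entries: Iterable[Entry], workflow: Workflow, workers: int = 8) -> List[Tuple[str, str]]:
    """Hypothetical helper: run `workflow` on every entry concurrently.

    ThreadPoolExecutor.map yields results in input order, so the flattened
    dataset is deterministic regardless of which worker finishes first.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_entry = pool.map(workflow, entries)
        return [row for rows in per_entry for row in rows]


def high_res_ligands(entry: Entry, max_resolution: float = 2.5) -> List[Tuple[str, str]]:
    """Example custom function: keep non-water ligands from entries at or below a resolution cutoff."""
    if entry.resolution > max_resolution:
        return []
    return [(entry.entry_id, lig) for lig in entry.ligands if lig != "HOH"]


if __name__ == "__main__":
    toy_entries = [
        Entry("E1", ("ATP", "HOH"), 1.8),
        Entry("E2", ("NAG",), 3.1),
        Entry("E3", ("HEM",), 2.0),
    ]
    print(mine(toy_entries, high_res_ligands))  # [('E1', 'ATP'), ('E3', 'HEM')]
```

Keeping the per-entry function separate from the parallel driver mirrors the idea in the text: the driver is written once, while users only supply small custom functions that decide what to extract from each structure.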

"We created Lemon as a one-stop-shop to quickly mine the entire data bank and pull out the useful biological information that is key for developing drugs," said Jonathan Fine, a Ph.D. student in chemistry who helped develop the platform.

The name comes from the software's original purpose: to create benchmarking sets for drug design software and to identify the "lemons," biomolecular interactions in the PDB that cannot be modeled well.

Lemon is available for free on GitHub, and detailed documentation is available online. The work is also published in the journal Bioinformatics.