The use of deep learning in the study of genomics has been limited because published models typically work with fixed data types and are only able to answer one specific question. The process of analyzing genomics data currently begins with the time-consuming steps of formatting and preparing the enormous data sets necessary for deep learning models. This equates to building the needed tools from scratch each time data is swapped out to answer a different question. A new universal programming tool called Janggu streamlines the process so scientists can focus on the biological question at hand, rather than the technical aspects of programming the data.

Janggu helps with data acquisition and evaluation of deep learning models in genomics. Data can be loaded from various standard genomics file formats, including FASTA, BED, BAM and bigWig. The output predictions can be converted back to coverage tracks and exported to bigWig files. Source: Wolfgang Kopp

Developed by researchers from the Max Delbrueck Center for Molecular Medicine in the Helmholtz Association (MDC), Janggu converts different genomics data types into a universal format. The data can be plugged into any machine learning or deep learning model that uses Python, a widely used programming language.
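The article does not spell out what the universal format looks like. As a rough illustration of the general idea, the sketch below turns a DNA sequence into a one-hot matrix, a common numeric representation for feeding sequences to deep learning models. It is written in plain Python and does not use Janggu's API; the names `ALPHABET` and `one_hot_encode` are hypothetical and chosen only for this example.

```python
# Illustrative sketch only: this is NOT Janggu's API.
# ALPHABET and one_hot_encode are hypothetical names for this example.
ALPHABET = "ACGT"


def one_hot_encode(sequence):
    """Return one row per base, with a 1 in the column of the matching nucleotide."""
    rows = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        index = ALPHABET.find(base)
        if index >= 0:  # unknown bases such as N are left as all zeros
            row[index] = 1
        rows.append(row)
    return rows


encoded = one_hot_encode("ACGTN")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

Each row corresponds to one position in the sequence, so sequences of equal length produce matrices of equal shape, which is what lets the same array be handed to different models.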

The MDC researchers, led by Dr. Altuna Akalin, head of the Bioinformatics and Omics Data Science research group, were motivated by their own frustrations around the time it took to format data when investigating biology and medicine questions. Separating the data extraction and formatting from the analysis makes it easier to exchange, combine or reuse sections of data. One challenge was balancing flexibility and usability, said Dr. Wolfgang Kopp, an MDC scientist. "If it is too flexible, people will be drowned in different options and it will be difficult to get started."

In addition to the front-end benefits of Janggu, the programming tool includes visualization of results following the deep learning analysis and can evaluate what the model has learned. "One of the most interesting applications is predicting the effect of mutations on gene regulation," Akalin said. "This is exciting because now we can start understanding individual genomes, for instance, we can pinpoint genetic variants that cause regulatory changes, or we can interpret regulatory mutations occurring in tumors."
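One common way to estimate the effect of a mutation with a trained model is to score the reference sequence and the mutated sequence and compare the two predictions. The sketch below shows that idea under strong simplifying assumptions: `toy_score` is a hypothetical stand-in for a trained model (it merely counts a made-up motif), and `variant_effect` is an illustrative helper, not part of Janggu.

```python
# Illustrative sketch only: toy_score stands in for a trained model and
# variant_effect is a hypothetical helper, not a Janggu function.
def toy_score(sequence):
    """Count occurrences of a made-up motif as a placeholder 'prediction'."""
    motif = "TATA"
    last_start = len(sequence) - len(motif) + 1
    return sum(1 for i in range(last_start) if sequence[i:i + len(motif)] == motif)


def variant_effect(score_fn, reference, position, alt_base):
    """Return score(mutated) - score(reference) for a single-base substitution."""
    mutated = reference[:position] + alt_base + reference[position + 1:]
    return score_fn(mutated) - score_fn(reference)


# Substituting a base inside the motif removes it, so the score drops.
effect = variant_effect(toy_score, "GGTATAGG", 3, "C")
print(effect)
```

A negative difference means the substitution lowers the model's score and a positive one means it raises it. With a real model, `score_fn` would be replaced by the model's prediction on the encoded sequence.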

Kopp created tutorials, sample datasets and case studies to help others use Janggu.

The name Janggu comes from a traditional Korean drum shaped like an hourglass turned on its side. The two large sections of the hourglass represent Janggu's two areas of focus: pre-processing of genomics data on one side, and visualization of results and model evaluation on the other. The narrow connector in the middle represents whatever type of deep learning model researchers wish to use.

An article in Nature Communications demonstrates Janggu's versatility, including its ability to predict binding sites from DNA sequences and chromatin accessibility and to handle classification and regression tasks.