
At the Joint Genome Institute (JGI), a national user facility at Berkeley Lab, researchers are developing powerful tools to analyze the vast amounts of data generated by genomic sequencing. They work closely with Science IT and draw on high-performance computing (HPC) resources. Sequencing data often reach terabytes per study, so scalable computing is essential to keep pace with discovery.
For the past two years, the JGI and Science IT have worked together in a deep research partnership to build an AI-powered model capable of transforming genomic research. The effort goes well beyond providing computational infrastructure. It involves weekly meetings and iterative refinement to turn experimental ideas into functional AI tools.
AI Meets High-Performance Computing
One groundbreaking initiative is GenomeOcean, an AI-powered model designed to learn the “natural language” of genomes. Just as ChatGPT can analyze text to predict words and phrases, GenomeOcean reads vast amounts of genomic sequences to uncover patterns and relationships within DNA.
GenomeOcean was developed by researchers from Berkeley Lab and Northwestern University and pretrained on Perlmutter, the supercomputer at the National Energy Research Scientific Computing Center (NERSC), whose large-scale GPU resources made pretraining feasible. Once pretraining was complete, the Science IT team built the infrastructure needed to serve the model's genomic predictions and deployed it on Lawrencium, Berkeley Lab's institutional computing cluster.
How AI is Transforming Genomics
GenomeOcean is more than a tool for reading genetic sequences; it can also generate and predict them. Much like autocomplete in text messaging, the model can fill in missing stretches of DNA sequence based on patterns it has learned. This is particularly valuable for synthetic biology, where scientists design new biological pathways for applications such as sustainable biofuels, pharmaceuticals, and environmental solutions.

For example, researchers at the Joint BioEnergy Institute (JBEI), a DOE Bioenergy Research Center managed by Berkeley Lab, are exploring whether GenomeOcean can improve the design of biological systems. Drawing on patterns learned from massive datasets, the model could suggest new gene sequences that make engineered biological pathways more productive and efficient. This has the potential to accelerate discovery and reduce trial and error in the lab.
The Role of Science IT: Beyond Infrastructure
A key feature of this work, led by Zhong Wang at the JGI, is the deep integration of AI and HPC, made possible by a long-term collaboration between the JGI and Science IT. Although Science IT traditionally provides IT support, this collaboration highlights its role as an active research partner. Rather than simply offering computational resources, the team has worked alongside researchers to implement AI models, refine proof-of-concept experiments, and troubleshoot problems in weekly meetings that run from one to three hours.
“What makes this collaboration unique is not just the technology, but the long-term commitment. Science IT isn’t just providing infrastructure; we’re embedded in the research itself. If solving these large-scale AI challenges takes years, we’ll be there every step of the way, working alongside scientists to refine, improve, and innovate,” said Gary Jung, Head of the Science IT Department at Berkeley Lab.
AI-driven genomic research also depends on the computing power needed to train and deploy models. Science IT's GPUs (graphics processing units) are well suited to inference, the stage where a trained model makes predictions, but they are not numerous enough for pretraining, which requires a supercomputer such as Perlmutter. The project therefore takes a hybrid approach: pretraining on Perlmutter and inference on Science IT resources. This lets the GenomeOcean team work efficiently across computing environments, balancing raw power with day-to-day usability.
The Future of AI in Genomics
As more genomes are sequenced, the demand for advanced computational tools continues to grow. The ability to catalog, analyze, and predict genomic functions will be essential for future discoveries in medicine, agriculture, and environmental science. By combining AI, big data, and high-performance computing, scientists are unlocking new possibilities in understanding life itself.
GenomeOcean represents an exciting step toward a future where AI can help scientists read, write, and decode the language of life, making groundbreaking discoveries more accessible than ever before. Equally important is the collaboration model that makes such projects possible, in which Science IT plays a long-term role not only in supporting infrastructure but also in driving scientific breakthroughs.