Berkeley Lab intern Samantha Alonso tackled a challenging project at the Lab: archiving a massive collection of historical photos by using machine learning (ML) in a new way. She presented her findings in poster form at the SACNAS NDiSTEM conference last month to great excitement. Her project is an example of the opportunities interns have at Berkeley Lab to solve problems in innovative ways.
Berkeley Lab Photo Archive
Samantha started her Berkeley Lab internship in summer 2023 through the Workforce Development and Education (WD&E) Community College Internship (CCI) program, which she heard about from a friend. After joining, she connected with Thor Swift, lead photographer in the Creative Services Office at the Lab, to develop machine learning processes to address the challenges of merging a physical film archive into a digital asset management system.
With over 1.3 million original film images, in a variety of formats, dating back to the Lab’s founding in 1937, Berkeley Lab’s archive offers a unique glimpse into the evolution of atomic energy and other pioneering scientific research. The archive includes photos of high-profile figures like Glenn Seaborg and Ernest Lawrence, of scientific devices, and of the people who built them and made the scientific achievements possible.
The task was daunting: a subset of the film archive with more than 250,000 images needed to be categorized and tagged with metadata to make them accessible in digital form. Many of the original film transparencies were stored in envelopes with handwritten captions, making the conversion process complex.
Cross-Team Collaboration
Knowing that sorting the images and manually entering their information would take an enormous amount of time and money, her supervisor, Tammy Campbell, suggested that Samantha connect with Fengchen Liu, senior researcher in the Science IT department, to look into AI/ML models that could help read, categorize, and tag the images efficiently. The solution involved optical character recognition (OCR) software, which extracted information from handwritten or typewritten captions and organized it into a searchable digital archive.
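As a rough illustration of that last step, the sketch below shows one way caption text could be organized for searching once OCR has produced it. It is not the Lab's pipeline: the image IDs, the sample captions, and the helper names `tokenize`, `build_index`, and `search` are all hypothetical, and the OCR step itself is assumed to have already run.

```python
import re
from collections import defaultdict

# Hypothetical OCR output: image ID -> caption text read from an envelope.
# These captions are invented for illustration only.
captions = {
    "img-0001": "Cyclotron magnet assembly, 1939",
    "img-0002": "Staff portrait in front of cyclotron building",
    "img-0003": "Magnet coil winding",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(captions):
    """Map each word to the set of image IDs whose caption contains it."""
    index = defaultdict(set)
    for image_id, text in captions.items():
        for word in tokenize(text):
            index[word].add(image_id)
    return index

def search(index, query):
    """Return sorted IDs of images whose captions contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the ID sets so that only captions matching all words remain.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

index = build_index(captions)
print(search(index, "cyclotron"))    # images mentioning "cyclotron"
print(search(index, "magnet coil"))  # images mentioning both words
```

An inverted index like this keeps lookups fast as the collection grows, since a query only touches the entries for its own words rather than scanning every caption.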
Samantha’s work focused on using ML to automate the otherwise manual process of describing and categorizing the images. As the work continued, she faced two challenges: handling both color and black-and-white images, which required different techniques, and efficiently handling the diversity of images within the historical archive. Fengchen provided guidance, initially giving Samantha a high-level overview and a few coding examples to get started.
Drawing on Fengchen’s guidance and on research into articles and existing methodologies, Samantha adapted ML tools to fit the specific needs of Berkeley Lab’s archive. Developing this solution required continuous refinement and feedback from Thor and Fengchen, ultimately resulting in a hierarchical structure capable of accurately categorizing the archive’s diverse set of images.
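To give a sense of what a hierarchical structure can look like in practice, here is a minimal sketch that assigns each caption a two-level path (category, then subcategory). The article does not describe the actual categories or models, so everything here is an assumption: the taxonomy, the keywords, and the `categorize` function are hypothetical, and a simple keyword lookup stands in for the ML tools.

```python
# Hypothetical two-level taxonomy: category -> subcategory -> keywords.
# The real archive's categories are not described in the article.
TAXONOMY = {
    "people": {
        "portraits": {"portrait", "headshot"},
        "groups": {"staff", "team", "group"},
    },
    "equipment": {
        "accelerators": {"cyclotron", "accelerator"},
        "magnets": {"magnet", "coil"},
    },
}

def categorize(caption):
    """Return the (category, subcategory) path with the most keyword hits.

    Captions that match no keyword get the fallback path
    ("uncategorized", "unknown"). Ties go to the first path listed.
    """
    words = set(caption.lower().replace(",", " ").split())
    best_path, best_hits = ("uncategorized", "unknown"), 0
    for category, subcategories in TAXONOMY.items():
        for subcategory, keywords in subcategories.items():
            hits = len(words & keywords)
            if hits > best_hits:
                best_path, best_hits = (category, subcategory), hits
    return best_path

print(categorize("Magnet coil winding"))
print(categorize("Cyclotron building exterior"))
print(categorize("Landscape at dusk"))
```

Storing each result as a path rather than a single flat label means an archive can be browsed at either level, for example all equipment photos or only those of magnets.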
“Samantha’s project served as a prime example of ‘team science,’ a collaborative approach where diverse experts from different fields work together to solve complex problems,” commented Gary Jung, Science IT Department Head at Berkeley Lab. “Fengchen gave Samantha the tools, and she ran with it, doing an excellent job integrating ML and photography, showing the power of interdisciplinary teamwork to reveal the scientific history of this Laboratory.”
Making an impact as an intern at the Lab and beyond
Samantha’s project highlights the unique opportunities that Berkeley Lab provides to interns. Interns are not only given access to cutting-edge technology but are also encouraged to innovate and take ownership of their projects. In this case, Samantha’s work helped solve a long-standing problem, preserving a critical scientific archive for future generations.
With the success of Samantha’s work, there are plans to continue with the remaining 900,000 black-and-white negatives. Her project’s potential is enormous, and with the support of the IT Creative Services Office and the Science IT team, Samantha’s work might one day be published in a scientific journal, contributing to the larger field of digital asset management.
“The value of a National Laboratory is the power to provide people with the ability to mentor, learn and collaboratively contribute to the scientific discoveries in support of this nation and the world,” said Tammera Campbell, IT Support Services Group Lead. “This is a perfect example of a young person being given the opportunity to excel, solve a problem and reveal the scientific history of our country. Samantha’s efforts support the valuable cause of scientific storytelling, a story to be preserved and not lost.”
Samantha’s experience at Berkeley Lab reflects the power of collaboration in science. Through innovative ML solutions, she has helped safeguard a treasure trove of history, while opening doors for future interns to make their mark in scientific research.
Describing her experience, Samantha said, “Creating this solution was both a learning journey and a fulfilling experience. I feel honored to contribute to preserving the Lab’s history in a way that can help future researchers.”