Working at the ALS generates huge amounts of data, and for many years users have had to carry hard drives and USB drives between the ALS and their home institutions to acquire and analyze experimental data. To avoid the physical transport of data and to make real-time analysis possible, staff at the ALS, ESnet, and Berkeley Lab’s IT Division have collaborated to implement several best practices that allow fast and secure transfer of data over the network to a user’s home institution. A case study performed by ESnet highlights the work of IT Division staff Susan James, Yong Qin, and Karen Fernsler to build the Data Transfer Node and 10GbE network, integrate it with the data acquisition system, and implement the Globus Online data transfer tools. The end result is an improved workflow and data export for the x-ray tomography beamline.
Setting Up and Implementing Network Data Transfer
For researchers planning to use network data transfer, the following resources are available for assistance in setting up and implementing the workflow:
- To speak with a beamline scientist who has implemented the tools described below, contact Dula Parkinson.
- For help choosing equipment to build a Data Transfer Node (DTN), or with software tools such as Globus Online, contact the High Performance Computing Services Group by sending email to hpcshelp@lbl.gov
- To connect your beamline to the Lab’s fast ScienceDMZ network, or to debug networking issues at LBNL, contact lblnet@lbl.gov
- To debug national network issues, or to find contact information for offsite campus or IT groups, contact engage@es.net
To Achieve Faster Data Transfer
There are three main points for users and system administrators to consider:
1) Using the right file transfer tools
Instead of FTP or scp, use tools that have been designed specifically for high-speed data transfer. We recommend GridFTP or Globus Online. GridFTP is good if you want to automate transfers, but requires significant setup. Globus Online has a graphical user interface and is easy to use. Using a fast transfer tool is the simplest thing you can do to increase data transfer speeds. LBNL uses both of these transfer tools extensively and provides an overview from the 2014 LabTech workshop, with information on how to get additional help.
2) Using capable file transfer servers
Data can only be transferred as fast as it can be read from the source disk and written to the destination disk. Most systems aren’t tuned for high-speed data transfer out of the box. Systems that are tuned for it are called Data Transfer Nodes (DTNs). Beamline 8.3.2 has recently implemented such a DTN based on the reference specification provided by ESnet. Together with a new network designed by ESnet and LBLnet, this has resulted in a more than 10-fold improvement in data transfer speeds.
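To make the point about the slowest stage concrete, here is a minimal sketch in Python. It assumes a simple model in which the end-to-end rate is just the minimum of the source disk read rate, the network rate, and the destination disk write rate. The function names and all of the numbers are hypothetical and for illustration only; they are not measurements from Beamline 8.3.2.

```python
def effective_rate_mbps(source_read_mbps, network_mbps, dest_write_mbps):
    """Return the end-to-end rate in megabits per second.

    Simplifying assumption: the slowest stage caps the whole transfer.
    """
    return min(source_read_mbps, network_mbps, dest_write_mbps)


def transfer_hours(dataset_gigabytes, rate_mbps):
    """Estimate how long a dataset takes to move at a steady rate."""
    bits = dataset_gigabytes * 8e9          # 1 GB = 8e9 bits (decimal units)
    seconds = bits / (rate_mbps * 1e6)      # Mbps -> bits per second
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: a fast 10 Gbps network does not help
    # if the source disk can only be read at 800 Mbps.
    rate = effective_rate_mbps(
        source_read_mbps=800, network_mbps=10_000, dest_write_mbps=1_200
    )
    hours = transfer_hours(dataset_gigabytes=1000, rate_mbps=rate)
    print(f"bottleneck rate: {rate} Mbps, 1 TB takes about {hours:.1f} hours")
```

In this model, upgrading the network alone changes nothing until the disks on both ends can keep up, which is the motivation for a tuned DTN.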
3) Ensuring that the end-to-end network isn’t the bottleneck
If you are using fast data transfer tools between two fast data transfer nodes, the final thing to ensure is that the end-to-end network is not impeding the transfer. This becomes even more important over long distances. The need to resend just a small amount of data can dramatically increase transfer times. Unfortunately, this can also be the most complicated area to understand and correct. There are three main areas to consider:
Use capable network switches
For big, long-distance data transfers, packet loss is a significant problem. Network switches (sometimes loosely called hubs) are a notorious cause of retransmitted data. This can happen when several network connections on one side of the switch share a single connection on the other side. In this case it’s important to have switches with enough memory to buffer packets from one connection long enough for packets from the other connections to move through the switch. LBNL or home institution networking professionals can recommend good switches for your environment and scientific application.
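Why a small amount of loss hurts so much more over long distances can be illustrated with the Mathis approximation, a commonly cited upper bound on the throughput of a single standard TCP stream: rate ≤ (MSS / RTT) × (C / √p), where MSS is the segment size, RTT is the round-trip time, p is the packet loss rate, and C is a constant close to 1.22. The sketch below is not part of the original article and is only a rough model; newer congestion control algorithms behave differently, and the function name and sample values are assumptions chosen for illustration.

```python
import math


def mathis_throughput_mbps(mss_bytes, rtt_ms, loss_rate, c=math.sqrt(1.5)):
    """Approximate upper bound on single-stream TCP throughput in Mbps.

    Uses the Mathis approximation: rate <= (MSS / RTT) * (C / sqrt(p)).
    This is a rough model for loss-based TCP, not a prediction for any
    particular transfer tool.
    """
    if loss_rate <= 0:
        raise ValueError("loss_rate must be greater than zero")
    rtt_s = rtt_ms / 1000.0
    bits_per_second = (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))
    return bits_per_second / 1e6


if __name__ == "__main__":
    # Hypothetical values: same loss rate, short path versus long path.
    for rtt_ms in (1, 10, 80):
        bound = mathis_throughput_mbps(mss_bytes=1460, rtt_ms=rtt_ms, loss_rate=1e-4)
        print(f"RTT {rtt_ms:>3} ms, loss 0.01%: at most about {bound:.0f} Mbps")
```

Under this model the bound falls in direct proportion to round-trip time, so a loss rate that is barely noticeable inside a building can cap a cross-country transfer at a small fraction of the link speed.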
Avoid firewalls
Firewalls are commonly used to secure networks. Because they generally inspect every packet that flows through them, they can create bottlenecks for big science data transfers. There is a secure alternative to firewalls, commonly referred to as the ScienceDMZ. It works by establishing a fast, dedicated, but secure path around the firewall. You’ll generally need one at each of the facilities you are transferring data between. LBNL personnel can help you use the lab’s ScienceDMZ. ESnet personnel may also be able to provide some help implementing a ScienceDMZ at your home institution. See the help contacts above.
Use a “healthy” network path
It is extremely difficult to know which network path your data is taking between LBNL and your home institution and/or whether that path is “healthy.” This issue is best left to the networking professionals (see above) after ensuring that all of the critical items above are not the problem (good data transfer tools and nodes, good switches and no firewalls). While network debugging is beyond the scope of this brief article, one of the tools ESnet finds indispensable in network path analysis is perfSONAR.
Involve Your Local Experts!
If network data transfer would significantly increase your productivity but you don’t run your data servers yourself, please get your system and network administrators involved in the process.