An undulator is a periodic structure of magnets. As an electron bunch passes through it, the electrons radiate synchrotron light in the forward direction. Undulators are widely used at synchrotron light facilities such as the Advanced Light Source (ALS, US) and the European Synchrotron Radiation Facility (ESRF, EU). In a spectrum calculation, sampled trajectories of the electrons in a bunch are traced through the measured magnetic field, which is supplied as a table. The radiation associated with each trajectory is then calculated and integrated at a given observation point downstream of the undulator. The spectrum calculation of undulator radiation is becoming increasingly compute-intensive, primarily because of the dramatic increase in the number of electrons sampled per bunch.
Today, HPC Services staffer Yong Qin will present his work in a poster session at this week’s GPU Technology Conference, GTC 2013, in San Jose, California. Yong’s work demonstrates how data parallelism can be applied to the spectrum calculation of undulator radiation, which is widely used at synchrotron light facilities across the world. The poster presents the algorithm design and performance optimization details for NVIDIA Fermi GPUs, and compares performance data from multiple optimization efforts and algorithms. It also shows how advanced techniques, such as multiple GPUs and hybrid computing, can further improve performance. Overall, a parallel speedup of more than 400 is achieved on one Fermi C2050 GPU with 448 cores, with close to linear scalability.
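To make the data-parallel structure concrete, here is a minimal Python sketch. It is not the poster's implementation. It assumes an ideal sinusoidal planar undulator in place of a measured field table, an on-axis observer, and an incoherent sum over electrons. The names `electron_spectrum` and `bunch_spectrum` and every parameter value are hypothetical. The point is only that each electron's contribution is computed independently (the part a GPU could assign to separate threads) and then reduced into one spectrum.

```python
import cmath
import math

K = 0.5                           # assumed deflection parameter (illustrative)
N_PERIODS = 10                    # assumed number of undulator periods
STEPS = 40 * N_PERIODS            # integration steps along the trajectory
XI = K * K / (4.0 + 2.0 * K * K)  # phase amplitude of the longitudinal wiggle


def electron_spectrum(gamma, gamma0, ratios):
    """On-axis |amplitude|^2 of one electron at each requested frequency.

    `ratios` are frequencies in units of the fundamental of a reference
    electron with Lorentz factor gamma0. The fundamental scales as gamma**2,
    so each electron sees the ratio rescaled by (gamma0 / gamma)**2.
    """
    d_theta = 2.0 * math.pi * N_PERIODS / STEPS
    scale = (gamma0 / gamma) ** 2
    out = []
    for r in ratios:
        r_local = r * scale
        amp = 0j
        for i in range(STEPS):  # left Riemann sum over the undulator phase
            theta = i * d_theta
            phase = r_local * (theta + XI * math.sin(2.0 * theta))
            amp += (K / gamma) * math.cos(theta) * cmath.exp(1j * phase)
        out.append(abs(amp * d_theta) ** 2)
    return out


def bunch_spectrum(gammas, gamma0, ratios):
    """Map over electrons (independent work), then reduce by summation."""
    total = [0.0] * len(ratios)
    for spec in (electron_spectrum(g, gamma0, ratios) for g in gammas):
        for j, value in enumerate(spec):
            total[j] += value
    return total


gamma0 = 3718.0                                  # illustrative reference energy
ratios = [0.90 + 0.02 * j for j in range(11)]    # 0.90 to 1.10 of the fundamental
gammas = [gamma0 * (1.0 + 0.001 * (k - 2)) for k in range(5)]  # small energy spread
spectrum = bunch_spectrum(gammas, gamma0, ratios)
```

Because no electron depends on another, the work in this sketch grows linearly with the number of sampled electrons, which is why the calculation suits data-parallel hardware. A production code would replace the analytic trajectory with one traced through the tabulated field.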