Lawrencium Emergency Upgrade
- Date & Time: Thursday, April 30, from 12 PM to 5 PM
- Purpose: To maintain the security of the entire system, we need to apply a security patch to the operating system and reboot all nodes across the cluster. Jobs will be canceled, data access and transfer will not be possible, and Open OnDemand access will be unavailable. We expect this work to be done by the end of the day.
- Date & Time: Tuesday, April 14, from 9 AM to 5 PM
- Purpose: Our team will work with DDN, the vendor for the Lustre file system on /global/scratch, to perform critical hardware maintenance. This includes replacing a core system drive and resolving network connectivity issues to improve overall system stability.
- Due to a cooling failure in Building 50B, we powered down the cluster. There is no estimated timeline for return yet. Updates will be shared as we progress.
- Due to a storage hardware failure, lrc-xfer and Globus are down. Users won’t be able to transfer data until it’s fixed.
- There will be a brief interruption to identity services, particularly those related to Shibboleth, Grouper, identity linking, password reset, group creation, and RADIUS. Users may experience delays when logging in to the terminal and the OOD web portal during the maintenance window, 4-7 PM on October 6th and 8th.
- SSH access has been fixed, and most of the offline nodes are back online, with scratch access restored.
- SSH access to the cluster login nodes is experiencing an unplanned service interruption, resulting in failed or slow logins. (Resolved)
- The Scratch file system may not be accessible on some compute nodes, and some nodes remain offline at the moment.
- Due to technical difficulties in the data center upgrade, power will be restored on 3rd September; after two days of rigorous testing on the system, the cluster will be back online on 5th September by 5 PM.
- As part of a DOE-funded infrastructure modernization project, all power utilities in the data center (50B-2275) will be powered down.
- We are investigating the root cause of the issue and working to resolve it as soon as possible. The switch configuration may have been altered during LBLNet maintenance work over the weekend.
- Users will experience brief connection issues on the cluster. Jobs will continue to run.
- We are seeing improvement in file system response. The issues related to OOD and Globus are resolved, and the Slurm reservation has been released.
- We are working closely with our storage vendor, DDN, to diagnose the problem and bring the file system back to normal. The Slurm reservation will stay in place until the work is complete.
- Users won’t be able to submit new jobs, and jobs already in the queue won’t start until the work is complete and the reservation is released.
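For context on the mechanism: during maintenance, a Slurm reservation is placed over the compute nodes, and the scheduler holds back any job that would overlap the reserved window. Below is a minimal sketch, assuming a login node with the standard Slurm client tools (scontrol, squeue, with squeue's --me flag from recent Slurm versions) on the PATH, of how a user could check whether the reservation is still active and why their jobs are pending:

```python
import subprocess

def slurm(args):
    """Run a Slurm client command and return its stdout as text."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout

# Active reservations: while a maintenance reservation covers the compute
# nodes, newly submitted jobs remain pending until it is released.
print(slurm(["scontrol", "show", "reservation"]))

# Pending jobs for the current user, with the scheduler's pending reason
# (typically ReqNodeNotAvail while a maintenance reservation is in place).
print(slurm(["squeue", "--me", "--states=PD", "--format=%i %j %r"]))
```

Once the reservation no longer appears in the scontrol output, queued jobs should start as usual.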
- A cooling issue at the data center affected the file system. We are still working to resolve some lingering issues; lrc-xfer has been restarted. Please get in touch with us if you encounter any problems.
- Users may experience unresponsive interactive commands on the Scratch file system.
- The IdM team upgraded machines supporting identity services, which affected cluster authentication. Logins may take up to 30 seconds.
- The scheduled decommissioning date is March 7th. If you have been using LR3 for your jobs, please migrate your workloads to other partitions.
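For anyone scripting the move, here is a hypothetical sketch that retargets an existing batch script's #SBATCH partition directive away from lr3; the file name job.sh and the target partition lr4 are placeholders, not a recommendation:

```python
import re
from pathlib import Path

# Hypothetical example: retarget a batch script from the retiring lr3
# partition. "job.sh" and "lr4" are placeholders; check `sinfo` for the
# partitions actually available to your account.
script = Path("job.sh")
text = script.read_text()
text = re.sub(r"^(#SBATCH\s+--partition=)lr3[ \t]*$", r"\g<1>lr4",
              text, flags=re.MULTILINE)
script.write_text(text)
```

Running `sinfo --summarize` on a login node lists the partitions currently accepting jobs.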
- Due to the impact on compute nodes and the file system, you might have noticed problems with job completion.
- HPC services, including Open OnDemand and Globus, have been restored.
- Due to an electrical shutdown planned by PIMD and facilities at Building 50B, we need to power down the Lawrencium cluster at noon on December 1st. The cluster will be back online on December 11th at 5 p.m.
- Users now have full access to the cluster.
- Users will experience slow I/O until the file system recovers from the restart.
- Users will experience a disruption for about 45 minutes.
- Users are now able to log in and access data. There will be a short downtime on Oct 25th, from 9 am to noon, to fix the file system issue.
- Due to a network issue, the login nodes, OOD, and DTN are unreachable. We are working on a fix. We appreciate your patience during the process.
- Users with email addresses outside the LBL domain cannot log in to the OOD portal. We are working on a fix and will post an update as soon as the issue is resolved.
- System-wide migration to Rocky-8 OS is complete
- Downtime is scheduled for the system-wide Rocky-8 OS rollout.
- OOD was offline on Monday due to file system-related issues.
- We have contacted the vendor again and will keep you posted.
- Globus service is not available.
- Slurm upgrade and system maintenance
- An investigation is underway
- lrc-xfer, as well as the Globus Google Cloud Storage collection and the Globus AWS S3 collection, will be offline for about 15 minutes.
- Data center power outage and scratch file system upgrade.
- Data center power outage and cluster maintenance work July 20-24
- Lawrencium users may have experienced sluggish Slurm commands and delays in job scheduling today (June 12th); this was due to a new firewall configuration we recently applied. The problem should now be resolved. If you continue to see any issues, please email us at hpcshelp@lbl.gov.
- Slurm upgrade is complete
- Slurm upgrade 8am Dec 13th – 5pm Dec 14th
- The scratch parallel file system is back to service 10:30 AM Dec 5
- We are working to restore the service 8:00 AM Nov 3
- Slurm upgrade is complete; the Lawrencium cluster was returned to service at 5:30 PM on Oct 6
- Slurm upgrade 8am Oct 4th – 5pm Oct 6
- Slurm patching complete 5:00pm Tuesday, April 26
- Slurm patches 8:00am Tuesday, April 26 – 5:00pm Tuesday, April 26
- This maintenance addresses problems related to MPI jobs, including poor performance with OpenMPI and issues with pre-compiled IntelMPI binaries.
- Service restored at 5:30pm Wednesday, Feb 2
- Slurm upgrade 8:00am Tuesday, Feb 1 – 5:00pm Thursday, Feb 3
- Service restored at 1:40pm on Sunday, 12/19 2021
- The Lawrencium login authentication server is currently down, and users can’t log in. We have contacted the IDM team at LBNL and will keep you posted. Service has resumed.
- There is a scheduled power outage in 50B-2275 to accommodate testing of the existing Emergency Power Off system. All of the HPC computing systems will be shut down and unavailable starting at 5:00pm on Friday, 12/17. We anticipate returning the systems to production by 5:00pm on Sunday, 12/19.
Lawrencium Maintenance on April 14
Lawrencium cluster powered down due to power overload at the data center
lrc-xfer and Globus down October 14th
IDM will undergo brief maintenance on October 6th and 8th
Lawrencium Cluster back to normal
Temporary Service Interruption Affecting SSH Access
Lawrencium Downtime extended until 5th September 2025
Lawrencium Upcoming Downtime: 19th August 2025, 8 AM to 29th August 2025, 5 PM
Lawrencium data transfer node is unreachable. 16th June, 2025
LBLNet performing network device maintenance in the data center. 14th June, 2025
Lawrencium scratch file system repair work complete. 5:41 AM May 13th, 2025
Lawrencium scratch file system repair work continues. 11:00 AM May 8th, 2025
Slurm reservation on compute node for Lawrencium scratch file system repair work. 1:30 PM May 7th, 2025
Lawrencium scratch file system is back. 12 PM, May 7th, 2025
Lawrencium scratch file system down. May 6th, 2025
Lawrencium login delays April 2nd, 2025
LR3 partition is retiring. February 27th, 2025
The lab-wide power disruption around 2:30 PM on Sunday, Jan 26, affected the Lawrencium Supercluster. January 28th, 2025
Lawrencium Supercluster is back online after a week-long downtime. December 11th, 2024
Lawrencium Downtime (Dec 1 – Dec 11) due to a Power Outage at Building 50B. November 27th, 2024
The software, firmware, and operating system upgrade on the scratch file system is complete. November 26th, 2024
Two-day file system downtime from 8 a.m. Monday, November 25th, until 5 p.m. Tuesday, November 26th, 2024. November 23rd, 2024
Scratch file system: Back online 3 PM November 14th, 2024
Scratch file system: Restart of scratch file system at 3 PM November 14th, 2024
Network issue partially resolved, Week-long Downtime canceled, October 24th, 2024
Network issue on the cluster, October 24th 2024
Open OnDemand login issue for LBL affiliate users Friday, July 5th 2024
Lawrencium Supercluster Back in Service Wednesday, July 3rd 2024, 5pm.
Lawrencium Downtime scheduled for three days, Monday, July 1st – Wednesday, July 3rd 2024
Open OnDemand is back to service on Tuesday, May 21
Globus is back in service Tuesday, Nov 28
Globus is still offline Monday, Nov 27
Lawrencium is back online on Nov 22nd.
Lawrencium downtime scheduled Nov 21-22
LR7 partition is back online. 3:30pm, Friday, Sep 1st
Nodes in the LR7 partition are offline due to elevated temperatures from a low coolant level. 3:30pm, Wednesday, Aug 23rd.
The lrc-xfer (Designated Data Transfer node) will have a brief downtime at 11:45am Tuesday, Aug 8th
Lawrencium Supercluster back to service July 25th.
Lawrencium Supercluster Downtime
Lawrencium Supercluster New Firewall Configured June 12th
Lawrencium Supercluster Downtime
LBLnet network upgrade 8am May 20th – 5pm May 21st
Open OnDemand (OOD) portal login issue, 9:00 AM May 2
The Open OnDemand portal is having a login issue, and we are working on resolving the problem. We will keep you updated on the progress. Sorry for the inconvenience.
Lawrencium cluster login issue persists on March 13th.
Lawrencium users may still experience login failures related to the maintenance work performed last week (March 7th-10th). The IDM team, which provides the authentication service, is conducting an investigation. We will keep you posted on any progress.
Lawrencium Scheduler is back to service 10:30 AM Dec 14
Lawrencium Supercluster Downtime
/global/scratch/users/ is back to service.
Lawrencium Scheduler Slurm is back to service 9:30 AM Nov 3
Lawrencium Supercluster Back to Service
Lawrencium Supercluster Downtime
Lawrencium Supercluster back online
Lawrencium Supercluster Downtime
Compute Node Reboots: 6:30 PM Thursday, Feb 10 – 1:00 PM Friday, Feb 11
Slurm Upgrade is Complete
Lawrencium Supercluster Downtime
Lawrencium Supercluster Service restored post-power outage
Login Failure: Sunday, 12/12 (Service is back 10:40am)
Lawrencium Supercluster Power Outage