Overview
The trend in high performance computing is toward Linux clusters, and interest in using them for scientific research at Berkeley Lab has grown accordingly. For many researchers, a cluster assembled from inexpensive commodity off-the-shelf hardware and open source software promises to be a cost-effective way to obtain a high performance system.
Though many of the concepts are simple, it remains difficult for scientists to navigate the myriad of available technologies and arrive at a cluster configuration that meets their needs. Likewise, managing a multi-node compute cluster efficiently is considerably harder than managing a desktop workstation. Consequently, adopters of this technology have had to invest substantial effort to realize the full potential of their systems.
The Scientific Cluster Support (SCS) program was developed in 2003 to address the difficulties of obtaining and running a Linux cluster, so that PIs have access to a dedicated resource providing the fast turnaround needed to facilitate scientific inquiry and development. The ultimate goal is to increase the overall use of scientific computing in Lab research projects and to promote parallel computing within the Berkeley Lab community.
Service Description
The High Performance Computing Services Group in the IT Division offers the following services for LBNL and UC Berkeley researchers who own or want to acquire a Linux cluster to meet their computational needs.
- Pre-purchase consulting – understanding the customer's application; determining the cluster hardware architecture and interconnect; identifying required software
- Procurement assistance – help with developing a budget and preparing an RFP
- Setup and configuration – installation and setup of the cluster hardware and networking; installation and configuration of the cluster software, scheduler, and application software
- Ongoing systems administration and cyber security – operating system and cluster software maintenance and upgrades; security updates; monitoring of cluster nodes; user account management
- Computer room space with networking and cooling* – Clusters will be hosted in either the Bldg 50B-1275 or Earl Warren Hall datacenter to ensure access to sufficient electrical, cooling, and networking infrastructure. PIs are responsible for covering the purchase costs of racks, PDUs, cooling doors, and seismic bracing for new installations.
- Supercluster hosting – Clusters are hosted in a Supercluster infrastructure provided by HPC Services, consisting of a dedicated firewalled subnet; one-time password authentication; multiple interactive login nodes; access to shared third-party compilers, scientific libraries, and applications; shared home directory storage; and a Lustre parallel filesystem. This Supercluster infrastructure is used by all clusters within the datacenter to facilitate the movement of researchers across projects and the sharing of compute resources.
Requirements
Systems in the SCS or Berkeley Research Computing Program must meet the following requirements to be eligible for support:
- Intel x86 architecture
- Participating clusters must have a minimum of 8 compute nodes
- Dedicated cluster architecture; no interactive logins on compute nodes
- Red Hat Enterprise Linux or an equivalent operating system
- Warewulf3 cluster implementation toolkit
- SchedMD SLURM scheduler
- OpenMPI message passing library (a minimal example job is sketched after the requirements below)
- All slave nodes reachable only from the master node
Clusters that will be located in the datacenter must meet the following additional requirements:
- Rack-mounted hardware is required; desktop form factor hardware is not allowed
- Equipment must be installed in APC Netshelter AR3350 computer racks equipped with two APC AP8867 PDUs and Motivair active cooling doors. Prospective cluster owners should include the cost of these additional items in their planning budget
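To illustrate the scheduler and MPI requirements above, here is a minimal sketch of the kind of job such a cluster runs: an MPI "hello world" written in Python. It assumes the mpi4py package is available on the cluster, and the submission command shown in the comments is only a hypothetical example; actual module names, paths, and SLURM options depend on each cluster's configuration.

```python
# hello_mpi.py -- minimal MPI example for a SLURM/OpenMPI cluster.
# Assumes the mpi4py package is installed (an assumption; available software
# varies by cluster). A submission might look like (illustrative only):
#   sbatch --nodes=2 --ntasks-per-node=8 --wrap="mpirun python hello_mpi.py"
from mpi4py import MPI

def main():
    comm = MPI.COMM_WORLD              # communicator spanning all launched ranks
    rank = comm.Get_rank()             # this process's rank (0 .. size-1)
    size = comm.Get_size()             # total number of MPI ranks in the job
    node = MPI.Get_processor_name()    # hostname of the compute node running this rank
    print(f"Hello from rank {rank} of {size} on {node}")

if __name__ == "__main__":
    main()
```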
Rates
Berkeley Lab has determined that it will cover the cost of the Lab's HPC infrastructure, including investment in the data center and the cost of maintaining HPC expertise. Therefore, PIs are charged only for the incremental effort of adding their cluster to our support pool. Pricing for PI-owned clusters can be calculated by applying the following rates to their configuration.
- Master node: $300/mo.
- InfiniBand support: $300/mo.
- Storage node: $300/mo. per storage node
- IBM GPFS or Lustre support: $300/mo.
- Compute node: $25/mo. per compute node
- 1275 Data Center Colocation: $100/mo. per rack
For example, support for a 20-node standalone cluster with an InfiniBand interconnect would be priced at:
$300/mo. master + $300/mo. InfiniBand support + (20 nodes x $25/mo. per node) = $1100/mo.
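For planning purposes, the same arithmetic can be generalized into a rough cost estimator. The sketch below simply encodes the rates listed above; the parameter names are illustrative, and actual charges are governed by the SCS Service Level Agreement.

```python
# Rough monthly-cost estimator built from the published SCS rates above.
# Illustrative only; actual charges are governed by the SCS Service Level Agreement.
RATES = {
    "master_node": 300,      # $/mo.
    "infiniband": 300,       # $/mo.
    "storage_node": 300,     # $/mo. per storage node
    "gpfs_or_lustre": 300,   # $/mo.
    "compute_node": 25,      # $/mo. per compute node
    "colocation_rack": 100,  # $/mo. per rack (1275 Data Center)
}

def monthly_cost(compute_nodes, infiniband=False, storage_nodes=0,
                 parallel_fs=False, racks=0):
    """Estimate the monthly SCS support charge for a PI-owned cluster."""
    total = RATES["master_node"]
    total += RATES["compute_node"] * compute_nodes
    if infiniband:
        total += RATES["infiniband"]
    total += RATES["storage_node"] * storage_nodes
    if parallel_fs:
        total += RATES["gpfs_or_lustre"]
    total += RATES["colocation_rack"] * racks
    return total

# The 20-node example from the text: $300 + $300 + 20 x $25 = $1100/mo.
print(monthly_cost(compute_nodes=20, infiniband=True))  # prints 1100
```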
Backups for storage servers are available from the IT Backups Group and are priced separately.
PIs should check the SCS Service Level Agreement for a full description of the program provisions and requirements.
*Note: Datacenter space is very limited and based on availability.
Example customers
SCG currently provides dedicated cluster and Linux server SLAs to a wide variety of groups; a sampling includes NSD, CRD, ETA, and ALS.
The resources below serve as the HPC User Guide.