Cluster Description
LAWRENCIUM is the platform for the LBNL Condo Cluster Computing (LC3) program, which provides a sustainable way to meet the midrange computing requirements of Berkeley Lab. LAWRENCIUM is part of the LBNL Supercluster and shares the common Supercluster infrastructure, including the system management software, software module farm, scheduler, storage, and backend network infrastructure.
Login and Data Transfer:
The Lawrencium Supercluster uses One-Time Passwords (OTP) for login authentication to all of the services listed below. Please also refer to the Data Transfer page for additional information; example login and transfer commands are shown after the list.
- Login server: lrc-login.lbl.gov
- Data transfer server: lrc-xfer.lbl.gov
- Globus Online endpoint: lbnl#lrc
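For example, a typical login and file transfer session might look like the following (a minimal sketch; `myusername`, the local file names, and the destination directory are placeholders to replace with your own):

```bash
# Log in to the Lawrencium login node (OTP authentication)
ssh myusername@lrc-login.lbl.gov

# Copy a local file to your scratch space through the dedicated data transfer node
scp ./input.dat myusername@lrc-xfer.lbl.gov:/global/scratch/users/myusername/

# Synchronize an entire directory (resumable, preserves timestamps)
rsync -av ./results/ myusername@lrc-xfer.lbl.gov:/global/scratch/users/myusername/results/
```

For large transfers, the Globus Online endpoint lbnl#lrc listed above can also be used.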
Hardware Configuration:
LAWRENCIUM is composed of multiple generations of hardware, so it is physically separated into several partitions to simplify management and to meet the requirements of hosting Condo projects. The following table lists the hardware configuration of each partition.
Partition | Nodes | Node List | CPU | Cores | Memory | Infiniband | Accelerator |
---|---|---|---|---|---|---|---|
lr3 | 243 | | | 16 / 20 | | FDR | – |
lr4 | 148 | n0[000-147].lr4 | Intel Xeon E5-2670 v3 | 24 | 64GB | FDR | – |
lr5 | 192 | n0[000-143].lr5, n0[192-195].lr5, n0[148-191].lr5 | Intel Xeon E5-2680 v4 / Intel Xeon E5-2640 v4 | 28 / 20 | 64GB / 128GB | FDR / QDR | – |
lr6 | 88 | n0[000-087].lr6 | Intel Xeon Gold 6130 (Skylake) | 32 | 96GB / 128GB | FDR | – |
lr6 | 156 | n0[088-115].lr6, n0[144-271].lr6 | Intel Xeon Gold 5218 (Cascade Lake) / Intel Xeon Gold 6230 (Cascade Lake) | 32 / 40 | 96GB / 128GB | FDR | – |
lr7 | 60 | n00[00-59].lr7 | Intel Xeon Gold 6330 | 56 | 256GB | HDR | – |
lr_bigmem | 2 | n0[272-273].lr6 | | 32 | | EDR | – |
es1 | 47 | n00[24-31].es1, n00[00-05].es1 | Intel Xeon E5-2623 / Intel Xeon Silver 4212 / AMD EPYC 7742 | 8 / 8 / 64 | 96GB / 96GB / 512GB | FDR | 4x NVIDIA / 4x NVIDIA / 4x NVIDIA A40 |
cf1 | 72 | n0[000-071].cf1 | Intel Xeon Phi 7210 | 64 | 192GB | FDR | – |
cm1 | 14 | n0[000-013].cm1 | AMD EPYC | 48 | 256GB | FDR | – |
csd_lr6_96 (private) | 60 | n0[088-103].lr6, n0[228-271].lr6 | Intel Xeon Gold 5218 / Intel Xeon Gold 6230 | 32 / 40 | 96GB | FDR | – |
csd_lr6_192 (private) | 84 | n0[144-227].lr6 | Intel Xeon Gold 6230 | 40 | 192GB | FDR | – |
Storage and Backup:
Lawrencium cluster users have access to the following storage systems; please familiarize yourself with them. Example commands for working with these locations follow the table.
Name | Location | Quota | Backup | Allocation | Description |
---|---|---|---|---|---|
HOME | /global/home/users/$USER | 20GB | Yes | Per User | HOME directory for permanent data storage |
GROUP-SW | /global/home/groups-sw/$GROUP | 200GB | Yes | Per Group | GROUP directory for software and data sharing with backup |
GROUP | /global/home/groups/$GROUP | 400GB | No | Per Group | GROUP directory for data sharing without backup |
SCRATCH | /global/scratch/users/$USER | none | No | Per User | SCRATCH directory with Lustre high performance parallel file system |
CLUSTERFS | /clusterfs/axl/$USER | none | No | Per User | Private storage for AXL condo |
CLUSTERFS | /clusterfs/cumulus/$USER | none | No | Per User | Private storage for CUMULUS condo |
CLUSTERFS | /clusterfs/esd/$USER | none | No | Per User | Private storage for ESD condos |
CLUSTERFS | /clusterfs/geoseq/$USER | none | No | Per User | Private storage for CO2SEQ condo |
CLUSTERFS | /clusterfs/nokomis/$USER | none | No | Per User | Private storage for NOKOMIS condo |
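As a quick illustration, the per-user and per-group locations above can be reached with ordinary shell commands (a sketch; `mygroup` is a placeholder for an actual group name, and `$USER` expands to your login name):

```bash
# Permanent, backed-up home directory (20GB quota)
ls -l /global/home/users/$USER

# Group areas: groups-sw is backed up, groups is not
ls /global/home/groups-sw/mygroup
ls /global/home/groups/mygroup

# High-performance Lustre scratch space (no quota, not backed up)
cd /global/scratch/users/$USER
du -sh .   # rough check of how much scratch space your data occupies
```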
Recharge Model
LAWRENCIUM is a Lab-funded platform for the Lawrencium Condo program. LBNL has made a significant investment in developing this platform to meet the midrange computing requirements at Berkeley Lab. Its primary purpose is to provide a sustainable way to host all of the condo projects while also meeting the computing requirements of other users. To achieve this goal, condo users are allowed to run within their condo contributions for free, while normal users who would like to use the LAWRENCIUM cluster are subject to the LBNL recharge rate. Condo users who need to run outside of their condo contributions are subject to the same recharge rate as normal users. For this purpose, condo users are given either one or two projects/accounts when their accounts are created on LAWRENCIUM, per the instructions we receive from the PI of the condo project. They need to provide the correct project when running jobs inside or outside of their condo contributions, as explained in detail in the Scheduler Configuration section below. The current recharge model has been in effect since January 2011, with a standard recharge rate of $0.01 per Service Unit (1 cent per SU). Because of hardware architecture differences, we discount the effective recharge rate for older generations of hardware, and these rates may drop further as newer generations of hardware enter production. Please refer to the following table for the current recharge rate of each partition.
Partition | Nodes | Node List | SU to Core CPU Hour Ratio | Effective Recharge Rate |
---|---|---|---|---|
lr3 | 332 | n0[000-003].lr3, n0[016-031].lr3, n0[040-059].lr3, n0[064-071].lr3, n0[076-115].lr3, n0[120-139].lr3, n0[144-163].lr3, n0[164-203].lr3, n0[213-336].lr3, n0[369-408].lr3 | free | free |
lr4 | 141 | n0[000-095].lr4, n0[099-110].lr4, n0[112-135].lr4, n0[139-147].lr4 | 0.50 | $0.0050 per Core CPU Hour |
lr5 | 192 | n0[000-143].lr5, n0[148-195].lr5 | 0.75 | $0.0075 per Core CPU Hour |
lr6 | 290 | n0[000-269].lr6, n0[362-381].lr6 | 1.00 | $0.0100 per Core CPU Hour |
lr7 | 60 | n00[00-59].lr7 | 1.00 | $0.0100 per Core CPU Hour |
cf1 | 72 | n00[00-71].cf1 | 0.40 | $0.0040 per Core CPU Hour |
lr_bigmem | 2 | n0[272-273].lr6 | 1.50 | $0.0150 per Core CPU Hour |
es1 | 47 | n00[00-52].es1 | 1.00 | $0.0100 per Core CPU Hour |
cm1 | 14 | n00[00-13].cm1 | 0.75 | $0.0075 per Core CPU Hour |
cm2 | 3 | n00[00-01,03].cm2 | 1.00 | $0.0100 per Core CPU Hour |
ood_inter | 5 | n000[0-4].ood0 | 1.00 | $0.0100 per Core CPU Hour |
NOTE: The usage calculation is based on the resources allocated to the job, not on what the job actually uses. For example, if a job asks for one lr5 node but only one CPU (a typical serial job) and runs for 24 hours, then, because lr5 nodes are allocated exclusively to the job (see the Scheduler Configuration section below), the charge incurred is $0.0075/(core*hour) * 1 node * 28 cores/node * 24 hours = $5.04, rather than $0.0075/(core*hour) * 1 core * 24 hours = $0.18.
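The arithmetic in the note above can be reproduced directly, for instance with a small shell one-liner (illustrative only; the rate and core count are the lr5 values from the tables above):

```bash
# Charge = rate ($/core-hour) x nodes x cores per node x wallclock hours
awk 'BEGIN { rate = 0.0075; nodes = 1; cores = 28; hours = 24;
             printf "Charge: $%.2f\n", rate * nodes * cores * hours }'   # prints: Charge: $5.04
```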
Scheduler Configuration:
The Lawrencium cluster uses SLURM as the scheduler to manage jobs on the cluster. To use Lawrencium resources, a partition such as lr3, lr4, lr5, lr6, lr7, es1, cm1, or cm2 must be specified ("--partition=xxx") along with the proper account ("--account=xxx"). The currently available QoS levels are lr_normal, lr_debug, and lr_lowprio. An example batch script is provided after the list below.
- Normal users of the LAWRENCIUM resource must supply the proper project account, e.g., "--account=ac_abc". The QoS "lr_normal" is also required, chosen according to the partition that the job is submitted to, e.g., "--qos=lr_normal".
- For a debug job, the "lr_debug" QoS should be specified, e.g., "--qos=lr_debug", so that the scheduler can adjust the job priority accordingly.
- Condo users should use the proper condo QoS, e.g., "--qos=condo_xyz", as well as the corresponding condo account, e.g., "--account=lr_xyz".
- The partition name is always required, e.g., "--partition=lr6".
- A standard fair-share policy with a decay half-life of 14 days (2 weeks) is enforced. All accounts are given an equal shares value of 1. Every user under an account associated with a partition is subject to a decaying priority based on the resources that user has consumed and the overall usage of the parent account. Usage is a value between 0.0 and 1.0 that represents the association's proportional usage of the system; a value of 0 indicates that the association is over-served, i.e., the account has already used its share of the resources and will be given a lower shares value than users who have consumed fewer resources.
- Job prioritization is based on Age, Fairshare, Partition, and QoS. Note that lr_lowprio QoS jobs are not given any prioritization, and some QoS levels carry higher priority values than others.
- If a node feature is not provided, the job will be dispatched to nodes in a predefined order: for "lr3" the order is lr3_c16, lr3_c20; for "lr5" the order is lr5_c28, lr5_c20.
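The options described above can be combined in a batch script such as the following (a minimal sketch; the account name, module names, resource requests, and application are placeholders, not site-verified values):

```bash
#!/bin/bash
#SBATCH --job-name=my_test_job
#SBATCH --partition=lr6          # required: the target partition
#SBATCH --account=ac_abc         # required: your project account (ac_*/pc_*, or lr_* for condo jobs)
#SBATCH --qos=lr_normal          # required: lr_normal, lr_debug, lr_lowprio, or a condo_* QoS
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32     # lr6 nodes provide 32 or 40 cores
#SBATCH --time=01:00:00

# Load software from the module farm, then launch the application
module load gcc openmpi
srun ./my_mpi_app
```

Submit the script with `sbatch myjob.sh` and check its state with `squeue -u $USER`.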
Partition | Nodes | Node List | Node Features | Shared | QoS | QoS Limit | Account |
---|---|---|---|---|---|---|---|
lr3 | 337 | | lr3_c16, lr3_c20 | Exclusive | | | |
lr4 | 108 | n0[000-095].lr4, n0[099-110].lr4, n0[112-135].lr4, n0[139-147].lr4 | lr4 | Exclusive | | | |
lr5 | 144 | n0[000-143].lr5, n0[148-191].lr5 | lr5_c28, lr5 / lr5_c20, lr5 | Exclusive | | | |
lr6 | 244 | n0[000-115,144-271].lr6 | lr6, lr6_sky, lr6_m192 | Exclusive | | | |
lr7 | 60 | n00[00-59].lr7 | lr7 | Shared | lr_normal / lr_debug | 32 nodes max per job, 72:00:00 wallclock limit / 4 nodes max per job | ac_*, pc_* |
cf1 | 72 | n0[000-071].cf1 | cf1 | Exclusive | | | |
es1 | 43 | n00[24-31].es1 | es1_a40, es1_v100 | Shared | | | |
cm1 | 14 | n00[00-13].cm1 | cm1_amd, cm1 | Shared | condo_qchem | 14 nodes max per job | lr_qchem |
csd_lr6_96 (private) | 60 | n0[088-103].lr6, n0[228-271].lr6 | lr6, lr6_cas | Exclusive | condo_neugroup / condo_statmech | 22 nodes max per group / 22 nodes max per group | lr_neugroup / lr_statmech |
csd_lr6_192 (private) | 84 | n0[144-227].lr6 | lr6, lr6_cas, lr6_m192 | Exclusive | condo_amos / condo_chandra_lr6 / condo_fstheory / condo_mp_lr6 | 24 nodes max per group / 2 nodes max per group / 18 nodes max per group / 16 nodes max per group | lr_amos / lr_chandra / lr_fstheory / lr_mp |
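The partition, QoS, and association settings summarized in the table above can also be queried directly from SLURM (a sketch using standard SLURM commands; the exact columns available depend on the site configuration):

```bash
# Partitions visible to you, with availability, wallclock limit, and node count
sinfo -o "%P %a %l %D"

# QoS definitions, including priority and wallclock limits
sacctmgr show qos format=Name,Priority,MaxWall

# Accounts, partitions, and QoS levels you are allowed to submit against
sacctmgr show association user=$USER format=Account,Partition,QOS

# Current fair-share standing for your associations
sshare -U
```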
Software Configuration:
The Lawrencium cluster uses the Global Software Farm and Environment Modules to manage cluster-wide software installations. Typical module commands are shown below.
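For example, software in the module farm is discovered and loaded with the standard Environment Modules commands (a sketch; `gcc` is only a placeholder for whatever packages are actually installed):

```bash
module avail         # list all software available in the module farm
module load gcc      # add a package to your environment (placeholder name)
module list          # show the modules currently loaded
module unload gcc    # remove a package from your environment
module purge         # clear all loaded modules
```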
Cluster Status:
Please visit the cluster status page for the live status of the Lawrencium cluster.
Additional Information:
Please send tickets to hpcshelp@lbl.gov, or email ScienceIT@lbl.gov, for any inquiries or service requests.