Cluster Description
LAWRENCIUM is the platform for the LBNL Condo Cluster Computing (LC3) program, which provides a sustainable way to meet the midrange computing requirements of Berkeley Lab. LAWRENCIUM is part of the LBNL Supercluster and shares the common Supercluster infrastructure, including the system management software, software module farm, scheduler, storage, and backend network infrastructure.
Login and Data Transfer:
The Lawrencium Supercluster uses One-Time Passwords (OTP) for login authentication to all of the services listed below. Please also refer to the Data Transfer page for additional information; example login and transfer commands are shown after the list.
- Login server: lrc-login.lbl.gov
- Data transfer server: lrc-xfer.lbl.gov
- Globus Online endpoint: lbnl#lrc
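For example, a typical login and file transfer session might look like the following (a minimal sketch; `myusername`, the local file names, and the destination directory are placeholders to replace with your own):

```bash
# Log in to the Lawrencium login node (OTP authentication)
ssh myusername@lrc-login.lbl.gov

# Copy a local file to your scratch space through the dedicated data transfer node
scp ./input.dat myusername@lrc-xfer.lbl.gov:/global/scratch/users/myusername/

# Synchronize an entire directory (resumable, preserves timestamps)
rsync -av ./results/ myusername@lrc-xfer.lbl.gov:/global/scratch/users/myusername/results/
```

For large transfers, the Globus Online endpoint lbnl#lrc listed above can also be used.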
Hardware Configuration:
LAWRENCIUM is composed of multiple generations of hardware, so it is physically separated into several partitions to simplify management and to meet the requirements of hosting Condo projects. The following table lists the hardware configuration of each partition.
Partition | Nodes | Node List | CPU | Cores | Memory | Infiniband | Accelerator |
---|---|---|---|---|---|---|---|
lr3 | 243 | | | 16 / 20 | | FDR | – |
lr4 | 148 | n0[000-147].lr4 | Intel Xeon E5-2670 v3 | 24 | 64GB | FDR | – |
lr5 | 192 | n0[000-143].lr5, n0[192-195].lr5, n0[148-191].lr5 | Intel Xeon E5-2680 v4 / Intel Xeon E5-2640 v4 | 28 / 20 | 64GB / 128GB | FDR / QDR | – |
lr6 | 88 | n0[000-087].lr6 | Intel Xeon Gold 6130 (Skylake) | 32 | 96GB / 128GB | FDR | – |
lr6 | 156 | n0[088-115].lr6, n0[144-271].lr6 | Intel Xeon Gold 5218 (Cascade Lake) / Intel Xeon Gold 6230 (Cascade Lake) | 32 / 40 | 96GB / 128GB | FDR | – |
lr7 | 60 | n00[00-59].lr7 | Intel Xeon Gold 6330 | 56 | 256GB | HDR | – |
lr_bigmem | 2 | n0[272-273].lr6 | | 32 | | EDR | – |
es1 | 47 | n00[24-31].es1, n00[00-05].es1 | Intel Xeon E5-2623 / Intel Xeon Silver 4212 / AMD EPYC 7742 | 8 / 8 / 64 | 96GB / 96GB / 512GB | FDR | 4x NVIDIA / 4x NVIDIA / 4x NVIDIA A40 |
cf1 | 72 | n0[000-071].cf1 | Intel Xeon Phi 7210 | 64 | 192GB | FDR | – |
cm1 | 14 | n0[000-013].cm1 | AMD EPYC | 48 | 256GB | FDR | – |
csd_lr6_96 (private) | 60 | n0[088-103].lr6, n0[228-271].lr6 | Intel Xeon Gold 5218 / Intel Xeon Gold 6230 | 32 / 40 | 96GB | FDR | – |
csd_lr6_192 (private) | 84 | n0[144-227].lr6 | Intel Xeon Gold 6230 | 40 | 192GB | FDR | – |
Storage and Backup:
Lawrencium cluster users have access to the following storage systems; please familiarize yourself with them. Example commands for working with these locations follow the table.
Name | Location | Quota | Backup | Allocation | Description |
---|---|---|---|---|---|
HOME | /global/home/users/$USER | 20GB | Yes | Per User | HOME directory for permanent data storage |
GROUP-SW | /global/home/groups-sw/$GROUP | 200GB | Yes | Per Group | GROUP directory for software and data sharing with backup |
GROUP | /global/home/groups/$GROUP | 400GB | No | Per Group | GROUP directory for data sharing without backup |
SCRATCH | /global/scratch/users/$USER | none | No | Per User | SCRATCH directory with Lustre high performance parallel file system |
CLUSTERFS | /clusterfs/axl/$USER | none | No | Per User | Private storage for AXL condo |
CLUSTERFS | /clusterfs/cumulus/$USER | none | No | Per User | Private storage for CUMULUS condo |
CLUSTERFS | /clusterfs/esd/$USER | none | No | Per User | Private storage for ESD condos |
CLUSTERFS | /clusterfs/geoseq/$USER | none | No | Per User | Private storage for CO2SEQ condo |
CLUSTERFS | /clusterfs/nokomis/$USER | none | No | Per User | Private storage for NOKOMIS condo |
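As a quick illustration, the per-user and per-group locations above can be reached with ordinary shell commands (a sketch; `mygroup` is a placeholder for an actual group name, and `$USER` expands to your login name):

```bash
# Permanent, backed-up home directory (20GB quota)
ls -l /global/home/users/$USER

# Group areas: groups-sw is backed up, groups is not
ls /global/home/groups-sw/mygroup
ls /global/home/groups/mygroup

# High-performance Lustre scratch space (no quota, not backed up)
cd /global/scratch/users/$USER
du -sh .   # rough check of how much scratch space your data occupies
```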
Recharge Model
LAWRENCIUM is a Lab-funded platform for the Lawrencium Condo program. LBNL has made a significant investment in developing this platform to meet the midrange computing requirements at Berkeley Lab. Its primary purpose is to provide a sustainable way to host all of the condo projects while also meeting the computing requirements of other users. To achieve this goal, condo users are allowed to run within their condo contributions for free, while normal users who would like to use the LAWRENCIUM cluster are subject to the LBNL recharge rate. Condo users who need to run outside of their condo contributions are subject to the same recharge rate as normal users. For this purpose, condo users are given either one or two projects/accounts when their accounts are created on LAWRENCIUM, per the instructions we receive from the PI of the condo project. They need to provide the correct project when running jobs inside or outside of their condo contributions, as explained in detail in the Scheduler Configuration section below. The current recharge model has been in effect since January 2011, with a standard recharge rate of $0.01 per Service Unit (1 cent per SU). Because of hardware architecture differences, we discount the effective recharge rate for older generations of hardware, and these rates may drop further as newer generations of hardware enter production. Please refer to the following table for the current recharge rate of each partition.
Partition | Nodes | Node List | SU to Core CPU Hour Ratio | Effective Recharge Rate |
---|---|---|---|---|
lr3 | 332 | n0[000-003].lr3, n0[016-031].lr3, n0[040-059].lr3, n0[064-071].lr3, n0[076-115].lr3, n0[120-139].lr3, n0[144-163].lr3, n0[164-203].lr3, n0[213-336].lr3, n0[369-408].lr3 | free | free |
lr4 | 141 | n0[000-095].lr4, n0[099-110].lr4, n0[112-135].lr4, n0[139-147].lr4 | 0.50 | $0.0050 per Core CPU Hour |
lr5 | 192 | n0[000-143].lr5, n0[148-195].lr5 | 0.75 | $0.0075 per Core CPU Hour |
lr6 | 290 | n0[000-269].lr6, n0[362-381].lr6 | 1.00 | $0.0100 per Core CPU Hour |
lr7 | 60 | n00[00-59].lr7 | 1.00 | $0.0100 per Core CPU Hour |
cf1 | 72 | n00[00-71].cf1 | 0.40 | $0.0040 per Core CPU Hour |
lr_bigmem | 2 | n0[272-273].lr6 | 1.50 | $0.0150 per Core CPU Hour |
es1 | 47 | n00[00-52].es1 | 1.00 | $0.0100 per Core CPU Hour |
cm1 | 14 | n00[00-13].cm1 | 0.75 | $0.0075 per Core CPU Hour |
cm2 | 3 | n00[00-01,03].cm2 | 1.00 | $0.0100 per Core CPU Hour |
ood_inter | 5 | n000[0-4].ood0 | 1.00 | $0.0100 per Core CPU Hour |
NOTE: The usage calculation is based on the resources allocated to the job, not on what the job actually uses. For example, if a job asks for one lr5 node but only one CPU (a typical serial job) and runs for 24 hours, then, because lr5 nodes are allocated exclusively to the job (see the Scheduler Configuration section below), the charge incurred is $0.0075/(core*hour) * 1 node * 28 cores/node * 24 hours = $5.04, rather than $0.0075/(core*hour) * 1 core * 24 hours = $0.18.
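The arithmetic in the note above can be reproduced directly, for instance with a small shell one-liner (illustrative only; the rate and core count are the lr5 values from the tables above):

```bash
# Charge = rate ($/core-hour) x nodes x cores per node x wallclock hours
awk 'BEGIN { rate = 0.0075; nodes = 1; cores = 28; hours = 24;
             printf "Charge: $%.2f\n", rate * nodes * cores * hours }'   # prints: Charge: $5.04
```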
Scheduler Configuration:
The Lawrencium cluster uses SLURM as the scheduler to manage jobs on the cluster. To use Lawrencium resources, a partition such as lr3, lr4, lr5, lr6, lr7, es1, cm1, or cm2 must be specified ("--partition=xxx") along with the proper account ("--account=xxx"). The currently available QoS levels are lr_normal, lr_debug, and lr_lowprio. An example batch script is provided after the list below.
- Normal users of the LAWRENCIUM resource must supply the proper project account, e.g., "--account=ac_abc". The QoS "lr_normal" is also required, chosen according to the partition that the job is submitted to, e.g., "--qos=lr_normal".
- For a debug job, the "lr_debug" QoS should be specified, e.g., "--qos=lr_debug", so that the scheduler can adjust the job priority accordingly.
- Condo users should use the proper condo QoS, e.g., "--qos=condo_xyz", as well as the corresponding condo account, e.g., "--account=lr_xyz".
- The partition name is always required, e.g., "--partition=lr6".
- A standard fair-share policy with a decay half-life of 14 days (2 weeks) is enforced. All accounts are given an equal shares value of 1. Every user under an account associated with a partition is subject to a decaying priority based on the resources that user has consumed and the overall usage of the parent account. Usage is a value between 0.0 and 1.0 that represents the association's proportional usage of the system; a value of 0 indicates that the association is over-served, i.e., the account has already used its share of the resources and will be given a lower shares value than users who have consumed fewer resources.
- Job prioritization is based on Age, Fairshare, Partition, and QoS. Note that lr_lowprio QoS jobs are not given any prioritization, and some QoS levels carry higher priority values than others.
- If a node feature is not provided, the job will be dispatched to nodes in a predefined order: for "lr3" the order is lr3_c16, lr3_c20; for "lr5" the order is lr5_c28, lr5_c20.
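The options described above can be combined in a batch script such as the following (a minimal sketch; the account name, module names, resource requests, and application are placeholders, not site-verified values):

```bash
#!/bin/bash
#SBATCH --job-name=my_test_job
#SBATCH --partition=lr6          # required: the target partition
#SBATCH --account=ac_abc         # required: your project account (ac_*/pc_*, or lr_* for condo jobs)
#SBATCH --qos=lr_normal          # required: lr_normal, lr_debug, lr_lowprio, or a condo_* QoS
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32     # lr6 nodes provide 32 or 40 cores
#SBATCH --time=01:00:00

# Load software from the module farm, then launch the application
module load gcc openmpi
srun ./my_mpi_app
```

Submit the script with `sbatch myjob.sh` and check its state with `squeue -u $USER`.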
Partition | Nodes | Node List | Node Features | Shared | QoS | QoS Limit | Account |
---|---|---|---|---|---|---|---|
lr3 | 337 | | lr3_c16, lr3_c20 | Exclusive | | | |
lr4 | 108 | n0[000-095].lr4, n0[099-110].lr4, n0[112-135].lr4, n0[139-147].lr4 | lr4 | Exclusive | | | |
lr5 | 144 | n0[000-143].lr5, n0[148-191].lr5 | lr5_c28, lr5 / lr5_c20, lr5 | Exclusive | | | |
lr6 | 244 | n0[000-115,144-271].lr6 | lr6, lr6_sky, lr6_m192 | Exclusive | | | |
lr7 | 60 | n00[00-59].lr7 | lr7 | Shared | lr_normal / lr_debug | 32 nodes max per job, 72:00:00 wallclock limit / 4 nodes max per job | ac_*, pc_* |
cf1 | 72 | n0[000-071].cf1 | cf1 | Exclusive | | | |
es1 | 43 | n00[24-31].es1 | es1_a40, es1_v100 | Shared | | | |
cm1 | 14 | n00[00-13].cm1 | cm1_amd, cm1 | Shared | condo_qchem | 14 nodes max per job | lr_qchem |
csd_lr6_96 (private) | 60 | n0[088-103].lr6, n0[228-271].lr6 | lr6, lr6_cas | Exclusive | condo_neugroup / condo_statmech | 22 nodes max per group / 22 nodes max per group | lr_neugroup / lr_statmech |
csd_lr6_192 (private) | 84 | n0[144-227].lr6 | lr6, lr6_cas, lr6_m192 | Exclusive | condo_amos / condo_chandra_lr6 / condo_fstheory / condo_mp_lr6 | 24 nodes max per group / 2 nodes max per group / 18 nodes max per group / 16 nodes max per group | lr_amos / lr_chandra / lr_fstheory / lr_mp |
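The partition, QoS, and association settings summarized in the table above can also be queried directly from SLURM (a sketch using standard SLURM commands; the exact columns available depend on the site configuration):

```bash
# Partitions visible to you, with availability, wallclock limit, and node count
sinfo -o "%P %a %l %D"

# QoS definitions, including priority and wallclock limits
sacctmgr show qos format=Name,Priority,MaxWall

# Accounts, partitions, and QoS levels you are allowed to submit against
sacctmgr show association user=$USER format=Account,Partition,QOS

# Current fair-share standing for your associations
sshare -U
```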
Software Configuration:
The Lawrencium cluster uses the Global Software Farm and Environment Modules to manage cluster-wide software installations. Typical module commands are shown below.
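For example, software in the module farm is discovered and loaded with the standard Environment Modules commands (a sketch; `gcc` is only a placeholder for whatever packages are actually installed):

```bash
module avail         # list all software available in the module farm
module load gcc      # add a package to your environment (placeholder name)
module list          # show the modules currently loaded
module unload gcc    # remove a package from your environment
module purge         # clear all loaded modules
```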
Cluster Status:
Please visit the cluster status page for the live status of the Lawrencium cluster.
Additional Information:
Please send tickets to hpcshelp@lbl.gov, or email ScienceIT@lbl.gov, for any inquiries or service requests.