Open
30 commits
9c25f22
add doc
2Ryan09 Apr 27, 2026
564c548
additional alternatives
2Ryan09 Apr 27, 2026
9ebb191
use ansible
2Ryan09 Apr 27, 2026
55fde55
use slurmrest
2Ryan09 Apr 28, 2026
d661315
add to ci/cd
2Ryan09 Apr 28, 2026
e9b7818
update readme
2Ryan09 Apr 28, 2026
e54d4ee
streamline job processing and enhance slurmrest integration by using …
2Ryan09 Apr 28, 2026
ed091db
update default URL to sdf-slurmrest-dev.slac.stanford.edu
2Ryan09 Apr 29, 2026
1395b93
add typing to slurmrest
2Ryan09 Apr 29, 2026
54435f0
update doc to include autogenerated client
2Ryan09 Apr 29, 2026
c364bbd
fix test imports
2Ryan09 Apr 29, 2026
4553784
rm port from new URL
2Ryan09 Apr 29, 2026
ed37f62
Merge branch 'main' into sdf-cli-to-slurmrest
2Ryan09 May 6, 2026
bab6c24
slurmrest v0.0.44 -> v0.0.42 endpoints
2Ryan09 May 6, 2026
a28b0aa
query associations one at a time
2Ryan09 May 6, 2026
3054a6c
add doc about individual association fetching
2Ryan09 May 7, 2026
c5106b4
fix broken data formatting
2Ryan09 May 8, 2026
7b539aa
rm deprecated slurmremap
2Ryan09 May 8, 2026
f30bd35
account for data format changes
2Ryan09 May 8, 2026
4aa333f
REST API supports @cluster in account name
2Ryan09 May 8, 2026
6ac8670
add test for data format consistency
2Ryan09 May 8, 2026
699b399
update submodule
2Ryan09 May 8, 2026
ef529a5
SLURM_JWT -> SLURMREST_JWT
2Ryan09 Jun 4, 2026
64532d3
add helpful logging
Jun 8, 2026
ea2c9cd
update submodule
2Ryan09 Jun 8, 2026
cb40e71
typo
2Ryan09 Jun 8, 2026
dd06bd1
typos
2Ryan09 Jun 8, 2026
85a40a8
typo
2Ryan09 Jun 8, 2026
e4620b5
documentation updates
2Ryan09 Jun 8, 2026
7d34cef
Merge remote-tracking branch 'origin/main' into sdf-cli-to-slurmrest
2Ryan09 Jun 8, 2026
18 changes: 17 additions & 1 deletion .github/workflows/ci-cd.yaml
@@ -1,4 +1,4 @@
name: CI/CD
name: CLI CI/CD

on:
push:
@@ -17,6 +17,22 @@ jobs:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Set up Java (for OpenAPI Generator)
        uses: actions/setup-java@v4
        with:
          java-version: '26'
          distribution: 'temurin'

      - name: Generate OpenAPI Client
        run: |
          # Install and run OpenAPI Generator
          npm install -g @openapitools/openapi-generator-cli
          openapi-generator-cli generate \
            -i openapi-specs/slurmrest-api-v0.0.44.json \
            -g python \
            -o slurmrest_client/ \
            --package-name openapi_client

      - name: Install uv
        uses: astral-sh/setup-uv@v7
        with:
6 changes: 6 additions & 0 deletions .gitignore
@@ -21,3 +21,9 @@ lib64

# VS Code
.vscode/

# Secrets
.env

# Generated Client
slurmrest_client/
8 changes: 8 additions & 0 deletions Makefile
@@ -4,6 +4,14 @@ PIP_BIN ?= $(VENV_DIR)/bin/pip
PYTHON_BIN ?= python3.9
VAULT_SECRET_PATH ?= secret/tid/coact

generate-client:
	npm install -g @openapitools/openapi-generator-cli
	openapi-generator-cli generate \
		-i openapi-specs/slurmrest-api-v0.0.44.json \
		-g python \
		-o slurmrest_client \
		--package-name openapi_client

secrets:
	mkdir etc/.secrets/ -p
	#set -e; for i in ldap_binddn ldap_bindpw; do vault kv get --field=$$i $(VAULT_SECRET_PATH) > etc/.secrets/$$i ; done
16 changes: 16 additions & 0 deletions README.md
@@ -36,6 +36,22 @@ then run
make apply
```

## OpenAPI Client Generation

The CLI includes an auto-generated Python client for the Slurm REST API. To generate the client locally:

**Prerequisites:**
- Java runtime (required by OpenAPI Generator)
  - On macOS with Homebrew: `brew install openjdk`
  - Add to your shell profile: `echo 'export PATH="/opt/homebrew/opt/openjdk/bin:$PATH"' >> ~/.zshrc`

**Generate the client:**
```
make generate-client
```

In CI/CD the workflow generates the client before running `uv sync`; locally you must run `make generate-client` before syncing dependencies.


# Usage

2 changes: 1 addition & 1 deletion ansible-runner/project
106 changes: 106 additions & 0 deletions docs/slurmrest_migration.md
@@ -0,0 +1,106 @@
# Migration of `sacctmgr` and `sacct` Calls to `slurmrest`

Currently, `sdf-cli` runs the `sacctmgr` and `sacct` CLI tools directly via `subprocess` to gather account association information, gather job accounting information, and toggle the number of nodes assigned to a facility in the event of an overage. As a step towards containerization and more robust interaction with SLURM, these CLI calls should be migrated to `slurmrest` endpoints.

## Current Usage

There are currently three read operations and one write operation:

### Read Operations

- `sacctmgr show assoc where account={','.join(list_of_assoc)} --noheader -P format=Account,GrpNodes,GrpJobs,MaxJobs`
  - `modules/coact.py`
  - Equivalent endpoint: `GET /slurmdb/v0.0.44/associations/`
- `SLURM_TIME_FORMAT=%s sacct --allusers --duplicates --allclusters --allocations --starttime="{start}T00:00:00" --endtime="{start}T23:59:59" --truncate --parsable2 --format=JobID,User,UID,Account,Partition,QOS,Submit,Start,End,Elapsed,NCPUS,AllocNodes,AllocTRES,CPUTimeRAW,NodeList,Reservation,ReservationId,State`
  - `api/scripts/jobs2usage.py:37`
  - Equivalent endpoint: `GET /slurmdb/v0.0.44/jobs/`
- `SLURM_TIME_FORMAT=%s {sacct_bin_path} --allusers --duplicates --allclusters --allocations --starttime="{date}T{start_time}" --endtime="{date}T{end_time}" --truncate --parsable2 --format=JobID,User,UID,Account,Partition,QOS,Submit,Start,End,Elapsed,NCPUS,AllocNodes,AllocTRES,CPUTimeRAW,NodeList,Reservation,ReservationId,State`
  - `modules/coact.py`
  - Equivalent endpoint: `GET /slurmdb/v0.0.44/jobs/`

### Write Operations

- `sacctmgr modify -i account name=$facility:_regular_@$cluster set GrpTRES=node=$nodes`
  - `cli/modules/coact.py`
  - NOT directly possible: `slurmrest` does not support account resource allocation assignments

## Migration Implications

Currently, three daemon tasks can be migrated easily:
- `coact-jobs-import.sh`
- `coact-reporegistration-daemon.sh`
- `coact-userregistration-daemon.sh`

One remains difficult due to the un-migratable write operation:
- `coact-facility-overage-daemon.sh`

## Possible Alternatives

All potential workarounds within `slurmrest` have significant disadvantages compared with the current `GrpTRES=node=0` approach. `sacctmgr` appears to be the only reliable way to modify account allocations. There is [a ticket](https://support.schedmd.com/show_bug.cgi?id=24356) with SLURM to support more `sacctmgr` features; however, there has been no activity on it other than the original post.

### Path Forward

> Execute the same CLI tools via Ansible

We already have to maintain an SSH connection between the future container and the SLURM infrastructure through Ansible. Ansible allows direct, ad-hoc command execution via its [`command`](https://docs.ansible.com/projects/ansible/latest/collections/ansible/builtin/command_module.html) module.

For example:
```
ansible [pattern] -m command -a 'sacctmgr ...'
```
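
As a rough sketch of how the overage write could be wrapped, the helper below only builds the argument list for an ad-hoc `ansible` call around the existing `sacctmgr modify` command. The function name, the `slurm_controllers` host pattern, and the example facility and cluster values are hypothetical and not taken from the `sdf-cli` code.

```python
import shlex
from typing import List


def build_overage_command(pattern: str, facility: str, cluster: str, nodes: int) -> List[str]:
    """Build an ad-hoc ansible argv that runs the existing sacctmgr write.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    sacctmgr_cmd = (
        f"sacctmgr modify -i account name={facility}:_regular_@{cluster} "
        f"set GrpTRES=node={nodes}"
    )
    # `-m command` selects the builtin command module; `-a` passes its arguments.
    return ["ansible", pattern, "-m", "command", "-a", sacctmgr_cmd]


# Placeholder values; a real call would use the inventory pattern for the SLURM hosts.
argv = build_overage_command("slurm_controllers", "example_facility", "example_cluster", 0)
print(shlex.join(argv))
# Running it would need ansible and an inventory, e.g. subprocess.run(argv, check=True).
```

Keeping the command as an argument list (rather than a single shell string) avoids quoting problems if the list is later handed to `subprocess`.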

## Use of the Autogenerated Python Client

`slurmrest` ships with the ability to [generate a Python client](https://slurm.schedmd.com/rest.html#python-guide) via `openapi-generator-cli`. The OpenAPI spec was generated once using the `slurmrest` instance within [`slurm-docker-cluster`](https://github.com/giovtorres/slurm-docker-cluster) and now lives in `openapi-specs/`.

Any future updates to `slurmrest` should continue to support previous endpoints, but any new endpoints will require regenerating the OpenAPI spec, which requires a live `slurmrest` instance.

For local development, the client can be created via `make generate-client` (a Java runtime is needed). For containerization, the client is built in CI/CD and will be packaged inside the container for usage. This was chosen to keep the client out of the git history, as it is large and not managed by SLAC, while still keeping it available for usage.

## Associations Fetching

One notable difference between the previous CLI-based data gathering and the `slurmrest` implementation is how associations are fetched. Previously, all associations were fetched with:

```bash
sacctmgr show assoc where account={','.join(list_of_assoc)} --noheader -P format=Account,GrpNodes,GrpJobs,MaxJobs
```

In the `slurmrest` implementation, associations are now collected one at a time, because the volume of data sent over the network when fetching them together caused instability.
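
The one-at-a-time pattern can be sketched roughly as follows. `fetch_one` is a hypothetical callable standing in for whatever wraps the associations endpoint for a single account, and `fake_fetch` exists only to make the sketch runnable; neither is taken from the actual client.

```python
from typing import Callable, Dict, Iterable, List


def fetch_associations_individually(
    accounts: Iterable[str],
    fetch_one: Callable[[str], List[dict]],
) -> Dict[str, List[dict]]:
    """Issue one small request per account instead of a single large one."""
    results: Dict[str, List[dict]] = {}
    for account in accounts:
        # Each call moves only one account's associations over the network.
        results[account] = fetch_one(account)
    return results


def fake_fetch(account: str) -> List[dict]:
    """Stand-in for a real REST call (hypothetical response shape)."""
    return [{"account": account, "max_jobs": 10}]


associations = fetch_associations_individually(["facility_a", "facility_b"], fake_fetch)
print(associations)
```

The trade-off is more round trips in exchange for smaller responses.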

## Data Format Differences

Several format differences between sacct/sacctmgr CLI output and slurmrest REST API responses required explicit handling in the migration.

### Jobs: Memory TRES units

sacct serialises memory TRES with a unit suffix (K/M/G), e.g.:

```
AllocTRES=cpu=128,mem=512G,node=4,billing=128,gres/gpu:a100=4
```

slurmrest returns memory TRES `count` as a **bare integer in megabytes** with no suffix. To remain compatible with `_kilos_to_int()`, which was written for sacct-style suffixed strings, the migration appends an `M` suffix when serialising memory from the REST response:

```python
value = f"{tres.count}M" if tres.type == "mem" else str(tres.count)
```

Without this, `_kilos_to_int("524288")` would interpret the value as bytes rather than megabytes, producing a 1024² (roughly one-million-fold) underestimate relative to `cluster["mem"]` (which is stored in bytes from `nodememgb * 1073741824`).
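
To illustrate the effect of the suffix, here is a minimal sketch. `suffixed_to_bytes` is a hypothetical stand-in for `_kilos_to_int()` that assumes binary multipliers and treats bare numbers as bytes, and `Tres` is a simplified stand-in for a TRES entry in the REST response; the real implementations may differ.

```python
from dataclasses import dataclass

# Binary multipliers for sacct-style unit suffixes (an assumption of this sketch).
_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def suffixed_to_bytes(value: str) -> int:
    """Hypothetical stand-in for `_kilos_to_int()`: bare numbers are bytes."""
    if value and value[-1].upper() in _SUFFIXES:
        return int(float(value[:-1]) * _SUFFIXES[value[-1].upper()])
    return int(value)


@dataclass
class Tres:
    """Simplified stand-in for a TRES entry in a REST job response."""
    type: str
    count: int


def serialise_count(tres: Tres) -> str:
    # Memory counts arrive as bare megabytes, so tag them with "M".
    return f"{tres.count}M" if tres.type == "mem" else str(tres.count)


mem = Tres(type="mem", count=524288)  # 512 GiB expressed in megabytes
print(suffixed_to_bytes(serialise_count(mem)))  # 549755813888
print(suffixed_to_bytes("512G"))                # 549755813888
print(suffixed_to_bytes("524288"))              # 524288 (what happens without the suffix)
```

With the suffix, the REST value and the sacct-style `512G` resolve to the same byte count.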

### Jobs: Time fields

sacct returns Unix timestamps as integers when `SLURM_TIME_FORMAT=%s` is set. The previous sacct-based code then called `parse_datetime(int(d["Start"]), force_tz=True)` to convert them.

slurmrest returns timestamps as integers in the same epoch-second format, but nested under a `time` struct:

```python
# sacct: int(d["Start"]) → unix timestamp
# REST: job.time.start → unix timestamp (same value, different path)
pendulum.from_timestamp(job.time.start)
```

No unit conversion is needed, but a guard for `0` / falsy values is required since slurmrest uses `0` to indicate "not set" (e.g. a job that never started has `time.start == 0`).
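
A minimal sketch of such a guard is shown below. It uses the standard library's `datetime` instead of `pendulum` so that it runs without extra dependencies, and the helper name `epoch_to_datetime` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def epoch_to_datetime(epoch_seconds: Optional[int]) -> Optional[datetime]:
    """Return an aware UTC datetime, or None when the field is unset (0 or None)."""
    if not epoch_seconds:
        # slurmrest reports 0 for "not set", e.g. a job that never started.
        return None
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


print(epoch_to_datetime(0))      # None
print(epoch_to_datetime(86400))  # 1970-01-02 00:00:00+00:00
```

Returning `None` keeps "never started" distinct from a real timestamp at the epoch, so callers can decide how to handle the missing value.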

### Jobs: TRES key format for GPUs

sacct uses `gres/gpu:a100=4` (type/name:subtype=count). slurmrest splits this into `tres.type = "gres/gpu"`, `tres.name = "a100"`, `tres.count = 4`. The migration reconstructs the sacct-style key as `f"{tres.type}/{tres.name}"` where a name is present, otherwise just `tres.type`. The `_calc_resource_hours` GPU detection (`if "gpu" in k`) handles both forms correctly.
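
The key reconstruction and the substring-based GPU check can be sketched as below. `Tres`, `tres_key`, and `is_gpu` are hypothetical names, and the field values mirror the split described above rather than a captured REST response.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tres:
    """Simplified stand-in for a TRES entry in a REST job response."""
    type: str
    count: int
    name: Optional[str] = None


def tres_key(tres: Tres) -> str:
    # Join type and name when a name is present, otherwise use the type alone.
    return f"{tres.type}/{tres.name}" if tres.name else tres.type


def is_gpu(key: str) -> bool:
    # Same substring test as the `if "gpu" in k` check mentioned above.
    return "gpu" in key


gpu = Tres(type="gres/gpu", name="a100", count=4)
cpu = Tres(type="cpu", count=128)
print(tres_key(gpu), is_gpu(tres_key(gpu)))
print(tres_key(cpu), is_gpu(tres_key(cpu)))
print(is_gpu("gres/gpu:a100"))  # the sacct-style key also matches
```

Because the check is a substring match, it does not depend on the exact separator between the GPU type and its subtype.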
6 changes: 2 additions & 4 deletions import-jobs.sh
@@ -24,15 +24,13 @@ echo ">" $DATE" ("$(date)")"
# full
./venv/bin/python3 ./sdf_click.py coact slurmdump --date $DATE \
| tee ../slurm-job-history/$DATE \
| ./venv/bin/python3 ./sdf_click.py coact slurmremap \
| tee ../slurm-job-remapped/$DATE \
| ./venv/bin/python3 ./sdf_click.py coact slurmimport --password-file $PASSWORD_FILE --output=upload >/dev/null

# just for 2023 imports
#cat ../slurm-job-remapped/$DATE | ./sdf.py coact slurmimport --password-file $PASSWORD_FILE --output=upload >/dev/null
#cat ../slurm-job-history/$DATE | ./sdf.py coact slurmimport --password-file $PASSWORD_FILE --output=upload >/dev/null

# don't pull data from slurm
#cat ../slurm-job-history/$DATE | ./sdf.py coact slurmremap | tee ../slurm-job-remapped/$DATE | ./sdf.py coact slurmimport --password-file $PASSWORD_FILE --output=upload >/dev/null
#cat ../slurm-job-history/$DATE | ./sdf.py coact slurmimport --password-file $PASSWORD_FILE --output=upload >/dev/null

###
# recalculate summaries