A lightweight Python CLI for queueing and executing shell commands in parallel. Inspired by GNU Parallel, parallelcmd provides:
- Command generation from argument combinations
- Concurrent execution with live output and progress tracking
- Job management: inspect, reset, delete, and update queued jobs
- Flexible workflows: resume and scale workers on demand

| | GNU Parallel | parallelcmd |
|---|---|---|
| State | Stateless¹ (fire and forget) | Stateful (SQLite queue persists) |
| Resume | Manual (`--joblog` + `--resume`) | Automatic (re-run `exec`) |
| Job management | Limited (joblog file) | First-class (check, reset, delete, update) |
| Dependency | Perl | Python stdlib only |

Use GNU Parallel for one-shot parallel runs. Use parallelcmd when you need to stop, resume, inspect, and selectively retry jobs across sessions — especially for long ML or HPC experiment sweeps.

¹ GNU Parallel can be made stateful via `--sqlmaster`/`--sqlworker`, but this requires installing the Perl `DBD::SQLite` module separately.

Download the single script and make it executable. No pip or other dependencies are required.

    # wget
    wget https://raw.githubusercontent.com/AI-ModCon/parallelcmd/main/parallelcmd.py
    chmod +x parallelcmd.py

    # curl
    curl -O https://raw.githubusercontent.com/AI-ModCon/parallelcmd/main/parallelcmd.py
    chmod +x parallelcmd.py

You can then run it directly:

    ./parallelcmd.py --help

Or place it somewhere on your PATH (e.g. `~/.local/bin/`) to use it as `parallelcmd.py` from any directory.

- Python 3.8+
- Standard library only (no external Python dependencies)

Print the built-in help:

    python3 parallelcmd.py --help

Create a job database:

    python3 parallelcmd.py init "echo {}" ::: a b c

Run queued jobs with 4 workers:

    python3 parallelcmd.py exec -j 4

Or do both in one command:

    python3 parallelcmd.py run -j 4 "echo {}" ::: a b c

If you omit the subcommand entirely, `parallelcmd.py` defaults to `run`:

    python3 parallelcmd.py -j 4 "echo {}" ::: a b c

Check status:

    python3 parallelcmd.py check

`init` builds commands and stores them in `pardb.sqlite` by default. Use `--db <name>` to target `<name>.sqlite`, or set the `PARDB` environment variable.

- `:::` starts an inline argument list.
- `::::` starts an argument list loaded from a file (one value per line; empty lines and `#` comments are ignored).
- `:::: -` reads the argument list from stdin.
- Multiple lists are combined as a Cartesian product.
- If the command has no `{}` placeholders, placeholders are appended automatically.
- If no `:::` or `::::` separator is given and stdin is a pipe, stdin lines are used as the argument list automatically.
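
The argument-file rule above (one value per line, with blank lines and `#` comment lines skipped) can be sketched in a few lines of Python. This only illustrates the described behavior and is not parallelcmd's actual code: the function name `read_arg_lines` is hypothetical, and stripping surrounding whitespace is an assumption.

```python
from typing import Iterable, List


def read_arg_lines(lines: Iterable[str]) -> List[str]:
    """Return one argument per non-empty, non-comment line."""
    args = []
    for raw in lines:
        value = raw.strip()  # assumption: surrounding whitespace is not significant
        if not value or value.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        args.append(value)
    return args


sample = ["case_a\n", "\n", "# disabled case\n", "case_b\n"]
print(read_arg_lines(sample))  # ['case_a', 'case_b']
```

The same function would apply to stdin lines, since a file object and `sys.stdin` are both iterables of lines.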

| Placeholder | Meaning |
|---|---|
| `{}` | Current argument (positional, auto-assigned left to right) |
| `{0}`, `{1}`, … | Explicit positional argument from the Nth `:::` / `::::` list (0-indexed) |
| `{%}` | Worker slot number (0-indexed, stable for the lifetime of `exec`) |
| `{#}` | Job sequence number (the DB `Seq` of the job being run) |

{} and {0} / {1} can be mixed freely. {%} and {#} are substituted at run time, not at init time, so the stored command retains the literal placeholder.
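
The two-phase substitution described above can be sketched as follows. This is a minimal illustration under assumptions, not the tool's implementation: `expand_args` and `expand_runtime` are hypothetical names, and the way implicit `{}` counting interacts with explicit `{N}` in the real tool may differ from this sketch.

```python
import re
from typing import Sequence

_ARG = re.compile(r"\{(\d*)\}")  # matches {} and {0}, {1}, ... but not {%} or {#}


def expand_args(template: str, args: Sequence[str]) -> str:
    """Init-time step: fill {} (auto-numbered left to right) and {N} (explicit)."""
    counter = 0

    def repl(match):
        nonlocal counter
        if match.group(1):  # explicit {N}
            return args[int(match.group(1))]
        value = args[counter]  # implicit {}: take the next argument
        counter += 1
        return value

    return _ARG.sub(repl, template)


def expand_runtime(command: str, slot: int, seq: int) -> str:
    """Run-time step: fill {%} (worker slot) and {#} (job sequence number)."""
    return command.replace("{%}", str(slot)).replace("{#}", str(seq))


stored = expand_args("CUDA_VISIBLE_DEVICES={%} python train.py --lr {}", ["1e-3"])
print(stored)                                # CUDA_VISIBLE_DEVICES={%} python train.py --lr 1e-3
print(expand_runtime(stored, slot=2, seq=7))  # CUDA_VISIBLE_DEVICES=2 python train.py --lr 1e-3
```

Because `{%}` survives the first step untouched, the same stored command can run on whichever worker slot picks it up.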

Example (Cartesian product with implicit placeholders):

    python3 parallelcmd.py init "python train.py --lr {} --seed {}" ::: 1e-3 1e-4 ::: 1 2 3

This creates 6 jobs.

Example (explicit positional placeholders; same result, order made explicit):

    python3 parallelcmd.py init "python train.py --lr {0} --seed {1}" ::: 1e-3 1e-4 ::: 1 2 3

Example (GPU assignment via worker slot):

    python3 parallelcmd.py run -j 4 "CUDA_VISIBLE_DEVICES={%} python train.py --lr {}" ::: 1e-3 1e-4 1e-5 1e-6

Each worker holds a fixed slot (0–3), so all its jobs run on the same GPU.

Example (`--zip` to pair lists element-by-element instead of Cartesian product):

    # Without --zip: 4 jobs (a+x, a+y, b+x, b+y)
    python3 parallelcmd.py init "cmd {0} {1}" ::: a b ::: x y
    # With --zip: 2 jobs (a+x, b+y)
    python3 parallelcmd.py init --zip "cmd {0} {1}" ::: a b ::: x y

When list lengths differ, `--zip` stops at the shortest list.
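
The difference between the default Cartesian product and `--zip` maps directly onto `itertools.product` and the built-in `zip`. The sketch below is an illustration only: `build_commands` is a hypothetical helper, and it relies on Python's `str.format` for the explicit `{0}`, `{1}` placeholders rather than on parallelcmd's own expansion.

```python
from itertools import product
from typing import List, Sequence


def build_commands(template: str, lists: Sequence[Sequence[str]],
                   zip_lists: bool = False) -> List[str]:
    """Expand a template with explicit {0}, {1}, ... placeholders into commands."""
    # zip() stops at the shortest list; product() yields every combination.
    combos = zip(*lists) if zip_lists else product(*lists)
    return [template.format(*combo) for combo in combos]


lrs, seeds = ["1e-3", "1e-4"], ["1", "2", "3"]
jobs = build_commands("python train.py --lr {0} --seed {1}", [lrs, seeds])
print(len(jobs))  # 6

paired = build_commands("cmd {0} {1}", [["a", "b"], ["x", "y"]], zip_lists=True)
print(paired)  # ['cmd a x', 'cmd b y']
```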
Initialize the job queue, or append to an existing one.

    python3 parallelcmd.py init [options] <command ...> [ ::: <args ...> ]* [ :::: <argfile ...> ]*

Options:

- `-a, --append`: append to existing table instead of recreating
- `-f, --force`: drop the existing `parjob` table and recreate it
- `--check_dup`: skip commands that already exist
- `--zip`: pair argument lists element-by-element instead of Cartesian product
- `-v, --verbose`

Execute queued jobs in parallel.

    python3 parallelcmd.py exec [options]

Options:

- `-j, --nworkers <n>`: number of workers (default: `4`)
- `--id <id ...>`: run only these specific job IDs
- `--progress`: show aggregate progress line
- `--bar`: show a visual ASCII progress bar (alternative to `--progress`)
- `--eta`: append estimated time remaining to the progress or bar line
- `--dashboard`: compact live dashboard mode
- `--dryrun`: print commands without running; jobs are marked done (exit 0), so run `reset --all` before a real run
- `-v, --verbose`
- `--timeskip <sec>`: throttle displayed output updates
- `--randomorder`: fetch pending jobs in random order
- `--prefix <cmd>`: prefix each command; supports shell env var assignments (example: `srun -N1 -n1`, `NP=8`)
- `--max_jobs <n>`: max jobs per worker
- `--delay <sec>`: sleep this many seconds before starting each job (default: `0`); also used as the upper bound for the initial per-worker random stagger
- `--wait <sec>`: when no job is available, wait this many seconds and retry instead of exiting (useful when another process is still adding jobs)
- `--timeout <sec>`: kill a task and move to the next if it runs longer than this many seconds; timed-out jobs are recorded with exit code `124`
- `--retries <n>`: retry a failed job up to N times before marking it failed (default: `0`; timed-out jobs are never retried)
- `--halt <n>`: stop queuing new jobs after N failures; already-running jobs complete normally
- `--output-dir <dir>`: save each job's stdout (and stderr) to `<dir>/<seq>.out`
- `--quiet`: suppress per-job output lines (useful with `--progress` or `--bar`)
- `--tag`: prefix each output line with the full command instead of the seq ID
- `--hook <file>`: Python plugin file; see Hooks below

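
How `--timeout` and `--retries` interact (a timed-out job gets exit code `124` and is never retried) can be sketched with the standard `subprocess` module. This is an approximation under assumptions, not the tool's code: `run_once` and `run_with_retries` are hypothetical names, output is discarded for brevity, and the real worker may kill processes differently.

```python
import subprocess
from typing import Optional

TIMEOUT_EXIT = 124  # exit code recorded for timed-out jobs, per the option list above


def run_once(cmd: str, timeout: Optional[float] = None) -> int:
    """Run cmd through `bash -c`; return its exit code, or 124 on timeout."""
    try:
        done = subprocess.run(
            ["bash", "-c", cmd],
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return TIMEOUT_EXIT  # subprocess.run has already killed the child
    return done.returncode


def run_with_retries(cmd: str, retries: int = 0, timeout: Optional[float] = None) -> int:
    """Retry a failing command up to `retries` extra times; never retry a timeout."""
    for _ in range(retries + 1):
        code = run_once(cmd, timeout)
        if code == 0 or code == TIMEOUT_EXIT:
            break
    return code


print(run_with_retries("exit 3", retries=2))  # 3
```

One limitation of this sketch: a command that itself exits with `124` is indistinguishable from a timeout.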
Initialize and execute in one step (init + exec).

    python3 parallelcmd.py run [options] <command ...> [ ::: <args ...> ]* [ :::: <argfile ...> ]*

Common options include:

- init side: `--append`, `-f/--force`, `--check_dup`, `--zip`
- exec side: `-j/--nworkers`, `--id`, `--progress`, `--bar`, `--eta`, `--dashboard`, `--dryrun`, `--randomorder`, `--prefix`, `--max_jobs`, `--delay`, `--wait`, `--timeout`, `--retries`, `--halt`, `--output-dir`, `--quiet`, `--tag`, `--hook`

Inspect queue summary or list all rows.

    python3 parallelcmd.py check [options]

Options:

- `-l, --list`: list all matching rows instead of the summary
- `--nonzero`: filter to only jobs with non-zero exit value
- `--running`: filter to only currently running jobs
- `--where <sql>`: arbitrary SQL `WHERE` clause
- `--like <pattern>`: filter by `Command LIKE <pattern>`
- `--id <id ...>`: filter by specific job IDs

Reset selected jobs to pending (Starttime, JobRuntime, Exitval set to NULL).

    python3 parallelcmd.py reset [--all | --nonzero | --like <pattern> | --id <id ...> | --where <sql>]

Options:

- `-a, --all`: reset all jobs
- `--nonzero`: reset only jobs with non-zero exit value
- `--where <sql>`: arbitrary SQL `WHERE` clause
- `--like <pattern>`: filter by `Command LIKE <pattern>`
- `--id <id ...>`: filter by specific job IDs
- `-y, --yes`: skip confirmation prompt

Prompts for confirmation before resetting rows (skipped with -y).
Delete selected jobs.

    python3 parallelcmd.py delete [options]

Options:

- `-a, --all`: delete all jobs
- `--like <pattern>`: filter by SQL LIKE pattern on command text
- `--id <id ...>`: filter by job ID(s)
- `-y, --yes`: skip confirmation prompt

Prompts for confirmation before deleting rows (skipped with -y).
Find/replace command text for selected jobs.

    python3 parallelcmd.py update [options]

Options:

- `--replace "old,new"`: find and replace text pair (comma-separated)
- `--like <pattern>`: filter by SQL LIKE pattern on command text
- `--id <id ...>`: filter by job ID(s)
- `-y, --yes`: skip confirmation prompt

Prompts for confirmation before updating rows (skipped with `-y`).

Note: if the replacement text starts with `--`, use the `=` form to prevent argparse from treating it as a flag: `python3 parallelcmd.py update --replace='--old-flag,--new-flag'`
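
The `"old,new"` pair format explains why replacement text containing commas is ambiguous. The sketch below shows one plausible way to parse such a pair. `parse_replace` is a hypothetical helper, and rejecting extra commas is an assumption of this sketch; how parallelcmd itself treats them is not specified here.

```python
from typing import Tuple


def parse_replace(spec: str) -> Tuple[str, str]:
    """Split an "old,new" spec into its two parts; reject anything else."""
    parts = spec.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected exactly one comma in {spec!r}, got {len(parts) - 1}")
    return parts[0], parts[1]


old, new = parse_replace("--lr 1e-3,--lr 5e-4")
print("python train.py --lr 1e-3".replace(old, new))  # python train.py --lr 5e-4
```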
Inspect the health of the SQLite database — useful when jobs appear stuck or the DB seems unresponsive.

    python3 parallelcmd.py diagnose [--stale SECONDS]

Reports:

- Job counts by state (pending / running / success / failed / error)
- In-progress jobs (`Exitval = -1000`) with age in seconds; flags any older than `--stale` (default: `3600`) as potentially stale
- DB file sizes (`.sqlite`, `.sqlite-wal`, `.sqlite-shm`); warns if WAL exceeds 10 MB
- Exclusive lock probe: attempts `BEGIN EXCLUSIVE` with a 2-second timeout
- Open file handles via `lsof`
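
The stale-job check in the list above can be approximated directly with Python's `sqlite3` module. This is a sketch under stated assumptions: the table name `parjob` and the columns `Seq`, `Command`, `Starttime` and `Exitval` are mentioned elsewhere in this README, but the exact schema and the assumption that `Starttime` holds epoch seconds are guesses, so the demo builds its own in-memory table rather than opening a real queue. `stale_jobs` is a hypothetical helper.

```python
import sqlite3
import time

IN_PROGRESS = -1000  # marker given above for in-progress jobs


def stale_jobs(conn, stale_seconds=3600, now=None):
    """Return (Seq, age_seconds) for in-progress jobs older than stale_seconds."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT Seq, Starttime FROM parjob WHERE Exitval = ? ORDER BY Seq",
        (IN_PROGRESS,),
    ).fetchall()
    return [(seq, now - start) for seq, start in rows if now - start > stale_seconds]


# Demo on an in-memory table with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parjob (Seq INTEGER PRIMARY KEY, Command TEXT, Starttime REAL, Exitval INTEGER)"
)
conn.executemany(
    "INSERT INTO parjob VALUES (?, ?, ?, ?)",
    [
        (1, "echo a", 1000.0, 0),            # finished successfully
        (2, "echo b", 1000.0, IN_PROGRESS),  # started long ago, still in progress
        (3, "echo c", 5000.0, IN_PROGRESS),  # started recently
    ],
)
print(stale_jobs(conn, stale_seconds=3600, now=6000.0))  # [(2, 5000.0)]
```
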

To recover stale in-progress jobs after a crash:

    python3 parallelcmd.py reset --where "Exitval = -1000" -y

`--hook <file>` loads a Python file that can inspect each job before and/or after it runs. Define either or both functions:

    def on_before_task(taskid, cmd):
        # called after --delay sleep, before the subprocess launches
        # return False → requeue this job to pending and stop this worker
        return True

    def on_after_task(taskid, cmd, exitval, runtime):
        # called after the exit value is written to the DB
        # return False → stop this worker (other workers keep running)
        return True

- Either function can be omitted; only the defined ones are called.
- Exceptions inside a hook are logged and treated as `True` (continue).
- Returning `False` stops only the calling worker; other workers are unaffected.
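
As an illustration of the hook contract above, here is a small hook file that stops a worker when a sentinel file appears or after several failed jobs. The two function signatures come from this README; everything else (`STOP_FILE`, `MAX_FAILURES`, the failure counter) is hypothetical. The counter lives in module state, so whether it is shared between workers depends on how parallelcmd loads the hook, which this sketch does not assume.

```python
import os

STOP_FILE = "STOP_WORKERS"  # hypothetical sentinel file; create it to drain workers
MAX_FAILURES = 3            # hypothetical per-module failure budget

_failures = 0


def on_before_task(taskid, cmd):
    # Returning False requeues the job and stops this worker.
    return not os.path.exists(STOP_FILE)


def on_after_task(taskid, cmd, exitval, runtime):
    # Count non-zero exits and stop this worker once the budget is used up.
    global _failures
    if exitval != 0:
        _failures += 1
    return _failures < MAX_FAILURES
```

Saved as, say, `my_hook.py` (a hypothetical name), it would be passed with `--hook=my_hook.py`.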

Example hook files are in the `hooks/` directory:

| File | Purpose |
|---|---|
| `hooks/my_slurm_hook.py` | Stop workers when SLURM remaining time drops below 1 hour |
| `hooks/my_pbs_hook.py` | Same for PBS/Torque (`qstat`) |

    python3 parallelcmd.py exec -j 4 --hook=hooks/my_slurm_hook.py

Edit `CHECK_TIMELEFT` at the top of the hook file to adjust the threshold.

- `--db <name>`: SQLite DB basename; the file on disk is `<name>.sqlite`
- `--db_retries <n>`: max retries when SQLite is locked (default: `10`)
- `--log_level {debug,info}`: logging level (default: `info`)

Pipe arguments from stdin (auto-detected when no `:::` or `::::` is given):

    cat cases.txt | python3 parallelcmd.py -j 4 "bash run.sh {}"
    seq 10 | python3 parallelcmd.py "echo {}"

Pipe stdin explicitly with `:::: -` (combinable with other arg lists):

    cat cases.txt | python3 parallelcmd.py run "bash run.sh {} {}" :::: - ::: seed1 seed2

Run scripts from values in a file:

    python3 parallelcmd.py init "bash run_case.sh {}" :::: cases.txt
    python3 parallelcmd.py exec -j 8

Use a custom DB file:

    python3 parallelcmd.py --db jobs init "echo {}" ::: x y z
    python3 parallelcmd.py --db jobs exec -j 2

Kill tasks that exceed a time limit and continue to the next job:

    python3 parallelcmd.py exec -j 4 --timeout 300

Timed-out jobs are recorded with exit code 124. Find them with:

    python3 parallelcmd.py check -l --where "Exitval = 124"

Reset timed-out jobs to retry with a longer timeout:

    python3 parallelcmd.py reset --where "Exitval = 124"
    python3 parallelcmd.py exec -j 4 --timeout 600

Keep workers alive while another process appends jobs later:

    python3 parallelcmd.py exec -j 4 --wait 10
    # from another process or terminal
    python3 parallelcmd.py init -a "echo {}" ::: later1 later2

Retry failed jobs only:

    python3 parallelcmd.py reset
    python3 parallelcmd.py exec -j 4

Overwrite the queue with a new set of jobs (drop and recreate):

    python3 parallelcmd.py init -f "echo {}" ::: x y z
    python3 parallelcmd.py exec -j 4

- Job output is streamed to stdout while running.
- Queue state is persisted in SQLite, so you can stop and resume workflows.
- `reset`, `delete`, and `update` prompt for confirmation by default; pass `-y` to skip.
- With `--wait`, workers poll for newly appended jobs instead of exiting as soon as the queue is empty.

Add these to ~/.bashrc or ~/.zshrc to avoid typing the full command each time.
Assumes parallelcmd.py is on your PATH.

    # parallelcmd aliases
    alias pc='parallelcmd.py'
    # init
    alias pci='parallelcmd.py init'
    alias pcia='parallelcmd.py init --append'
    alias pcif='parallelcmd.py init --force'
    # exec
    alias pce='parallelcmd.py exec'
    alias pcer='parallelcmd.py exec --randomorder'
    alias pcep='parallelcmd.py exec --progress'
    # check
    alias pck='parallelcmd.py check'
    alias pckl='parallelcmd.py check -l'
    alias pckf='parallelcmd.py check -l --nonzero'
    # reset / delete / update
    alias pcr='parallelcmd.py reset'
    alias pcra='parallelcmd.py reset --all'
    alias pcrf='parallelcmd.py reset --nonzero'
    alias pcd='parallelcmd.py delete'
    alias pcda='parallelcmd.py delete --all'
    alias pcu='parallelcmd.py update'
    # reset timed-out jobs
    alias pctimeout='parallelcmd.py reset --where "Exitval = 124"'
    # exec with N workers; extra flags are passed through (usage: pcej 8 --progress)
    pcej() { parallelcmd.py exec -j "$@"; }
    # run (init + exec) with common worker counts
    pcj4() { parallelcmd.py run -j 4 "$@"; }
    pcj8() { parallelcmd.py run -j 8 "$@"; }
    pcj16() { parallelcmd.py run -j 16 "$@"; }

- `database is locked`
  - Usually temporary when multiple workers/processes access SQLite.
  - Retry the command; avoid running multiple `exec` sessions against the same DB at once.
- No jobs are executed
  - Check queue state: `python3 parallelcmd.py check -l`.
  - If jobs are marked in-progress or failed, reset them: `python3 parallelcmd.py reset`. To rerun jobs that already completed successfully, use `reset --all`.
- Workers exit before later jobs are appended
  - Start `exec` with `--wait <seconds>` so workers keep polling.
  - Append work with `init -a ...` from another process or terminal.
- Unexpected shell behavior / quoting issues
  - Commands are executed through `bash -c`.
  - Wrap complex commands in quotes and test one command manually before `init`.
- Stop workers based on SLURM/PBS remaining time
  - Use `--hook=hooks/my_slurm_hook.py` (or `my_pbs_hook.py`).
  - Must be run inside an allocation where `SLURM_JOB_ID` / `PBS_JOBID` is set.
- Some jobs have exit code `124`
  - These jobs were killed by `--timeout`.
  - Reset and retry them: `python3 parallelcmd.py reset --where "Exitval = 124"`, then re-run `exec` with a larger `--timeout` or without it.
- `update --replace` does not parse as expected
  - Use exactly one comma-separated pair: `--replace "old,new"`.
  - If your text contains commas, run multiple updates with simpler replacement pairs.
- Argument file (`::::`) seems ignored
  - Ensure one argument per line.
  - Blank lines and lines starting with `#` are intentionally skipped.

| Feature | GNU Parallel | parallelcmd |
|---|---|---|
| Input: inline list | `:::` | `:::` |
| Input: file | `::::` | `::::` |
| Input: stdin (auto) | pipe or `-` | pipe (auto-detected when no `:::`) |
| Input: stdin (explicit) | `:::: -` | `:::: -` |
| Input: multiple lists | Cartesian product | Cartesian product |
| Input: linked/paired lists | `--link` | `--zip` |
| Column split | `--colsep REGEX` | — |
| Null delimiter | `-0` | — |
| Stop at sentinel | `-E VALUE` | — |
| Skip empty lines | `--no-run-if-empty` | — |
| Arg substitution: full | `{}` | `{}` |
| Arg substitution: no ext | `{.}` | — |
| Arg substitution: basename | `{/}` | — |
| Arg substitution: dirname | `{//}` | — |
| Arg substitution: job # | `{#}` | `{#}` |
| Arg substitution: slot # | `{%}` | `{%}` |
| Positional substitution | `{1}`, `{2}`, … | `{0}`, `{1}`, … |
| Workers | `-j N` | `-j N` |
| Load-based throttle | `--load`, `--noswap`, `--memfree` | — |
| Nice/priority | `--nice` | — |
| Startup delay | `--delay SEC` | `--delay SEC` |
| Progress bar | `--progress`, `--eta`, `--bar` | `--progress`, `--bar`, `--eta`, `--dashboard` |
| Job log | `--joblog FILE` | SQLite DB (always persisted) |
| Resume incomplete batch | `--resume` (via joblog) | re-run `exec` (auto, SQLite state) |
| Retry failed only | `--resume-failed` | `reset --nonzero` + `exec` |
| Retry N times | `--retries N` | `--retries N` |
| Skip duplicates | — | `--check_dup` |
| Output order | `-k` / `--keep-order` | — (streamed as-is) |
| Tag output | `--tag`, `--tagstring` | `--tag` |
| Save results to dir | `--results DIR` | `--output-dir DIR` |
| Immediate streaming | `--ungroup` | always streamed |
| Line buffering | `--linebuffer` | — |
| Timeout | `--timeout DURATION` | `--timeout SEC` |
| Exit code for timeout | 124 | 124 |
| Halt on failure | `--halt soon/now,fail=N` | `--halt N` |
| Custom kill signal | `--termseq` | — |
| Dry-run | `--dry-run` | `--dryrun` |
| Verbose / print cmd | `--verbose` | `-v` / `--verbose` |
| Random order | `--shuf` | `--randomorder` |
| Interactive confirm | `--interactive` | — |
| Command prefix | `--` (shell) | `--prefix CMD` |
| SLURM/PBS time-limit hook | — | `--hook FILE` (`hooks/my_slurm_hook.py`) |
| Before/after job hooks | — | `--hook FILE` (`on_before_task`, `on_after_task`) |
| Remote execution | `--sshlogin`, `--slf`, `--trc` | — |
| Distributed file sync | `--transfer`, `--return`, `--cleanup` | — |
| Pipe/streaming mode | `--pipe`, `--block`, `--pipepart` | — |
| Semaphore mode | `sem` / `--semaphore` | — |
| tmux integration | `--tmux` | — |
| Multiple queues | separate invocations | `--db NAME` (named SQLite files) |
| Inspect queue | `--joblog` + external tools | `check`, `check -l`, `--where`, `--like` |
| Edit queued commands | — | `update --replace` |
| Delete specific jobs | — | `delete --id`, `delete --like` |
| Reset specific jobs | — | `reset --id`, `reset --where` |
| Wait for new jobs | — | `--wait SEC` (keep workers polling) |
| Max jobs per worker | — | `--max_jobs N` |
| External dependencies | none (Perl) | none (Python stdlib only) |
| Persistent state | optional (joblog file) | always (SQLite) |

GNU Parallel is broader for one-shot parallel execution — especially argument substitution, remote/distributed runs, pipe streaming, and output formatting. parallelcmd trades those for a persistent job queue with first-class management (inspect, edit, delete, reset by SQL filter) and native SLURM time-limit awareness, making it better suited for long-running experiment pipelines where you need to stop, resume, and selectively retry jobs across sessions.

- How do I resume after interruption?
  - Just run `python3 parallelcmd.py exec -j 4` again.
  - Completed jobs (exit code `0`) stay done; pending jobs continue.
- How do I retry only failed jobs?
  - Failed jobs are those with non-zero exit values.
  - Run `python3 parallelcmd.py reset` (default filter resets jobs with `Exitval <> 0`), then run `exec` again.
  - Use `--nonzero` to be explicit: `python3 parallelcmd.py reset --nonzero`.
- What does exit code `124` mean?
  - The job was killed by `--timeout`. This matches the GNU `timeout` exit code convention.
  - Reset and rerun: `python3 parallelcmd.py reset --where "Exitval = 124"`, then `exec` with a longer `--timeout`.
- Can I have multiple queues?
  - Yes. Use different database basenames with `--db`.
  - Example: `python3 parallelcmd.py --db exp1 init ...` then `exec` using the same `--db`.
- Is it safe to run two `exec` commands on the same DB?
  - It is not recommended.
  - SQLite coordination can work, but contention/locking increases and behavior is harder to reason about.
- Can I inspect/edit queued commands before running?
  - Inspect: `python3 parallelcmd.py check --list`
  - Bulk edit text: `python3 parallelcmd.py update --replace "old,new" --like "%pattern%"`
  - Remove unwanted rows: `python3 parallelcmd.py delete --id 12 13 14`