parallelcmd

A lightweight Python CLI for queueing and executing shell commands in parallel. Inspired by GNU Parallel, parallelcmd provides:

Command generation from argument combinations
Concurrent execution with live output and progress tracking
Job management: inspect, reset, delete, and update queued jobs
Flexible workflows: resume and scale workers on demand

vs GNU Parallel

	GNU Parallel	parallelcmd
State	Stateless¹ (fire and forget)	Stateful (SQLite queue persists)
Resume	Manual (`--joblog` + `--resume`)	Automatic (re-run `exec`)
Job management	Limited (joblog file)	First-class (`check`, `reset`, `delete`, `update`)
Dependency	Perl	Python stdlib only

Use GNU Parallel for one-shot parallel runs. Use parallelcmd when you need to stop, resume, inspect, and selectively retry jobs across sessions — especially for long ML or HPC experiment sweeps.

¹ GNU Parallel can be made stateful via --sqlmaster / --sqlworker, but requires installing the Perl DBD::SQLite module separately.

Installation

Download the single script and make it executable — no pip or dependencies required.

# wget
wget https://raw.githubusercontent.com/AI-ModCon/parallelcmd/main/parallelcmd.py
chmod +x parallelcmd.py

# curl
curl -O https://raw.githubusercontent.com/AI-ModCon/parallelcmd/main/parallelcmd.py
chmod +x parallelcmd.py

You can then run it directly:

./parallelcmd.py --help

Or place it somewhere on your PATH (e.g. ~/.local/bin/) to use it as parallelcmd.py from any directory.

Requirements

Python 3.8+
Standard library only (no external Python dependencies)

Quick start

python3 parallelcmd.py --help

Create a job database:

python3 parallelcmd.py init "echo {}" ::: a b c

Run queued jobs with 4 workers:

python3 parallelcmd.py exec -j 4

Or do both in one command:

python3 parallelcmd.py run -j 4 "echo {}" ::: a b c

If you omit the subcommand entirely, parallelcmd.py defaults to run:

python3 parallelcmd.py -j 4 "echo {}" ::: a b c

Check status:

python3 parallelcmd.py check

Command model

init builds commands and stores them in pardb.sqlite by default. Use --db <name> to target <name>.sqlite, or set the PARDB environment variable.

::: starts an inline argument list.
:::: starts an argument list loaded from a file (one value per line; empty lines and # comments are ignored).
:::: - reads the argument list from stdin.
Multiple lists are combined with Cartesian product.
If the command has no {} placeholders, placeholders are appended automatically.
If no ::: or :::: separator is given and stdin is a pipe, stdin lines are used as the argument list automatically.
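
The rules above can be sketched in a few lines of Python. This is an illustration, not the code inside parallelcmd.py: the helper names parse_arg_lines and build_commands are hypothetical, and stripping surrounding whitespace from each value is an assumption.

```python
import itertools

def parse_arg_lines(text):
    """Return argument values from argfile-style text: one value per line,
    skipping empty lines and lines that start with '#'."""
    values = []
    for line in text.splitlines():
        value = line.strip()  # assumption: surrounding whitespace is dropped
        if not value or value.startswith("#"):
            continue
        values.append(value)
    return values

def build_commands(template, arg_lists):
    """Fill one '{}' per list for every combination (Cartesian product)."""
    return [template.format(*combo) for combo in itertools.product(*arg_lists)]

cases = parse_arg_lines("# learning rates\n1e-3\n\n1e-4\n")
commands = build_commands("python train.py --lr {} --seed {}", [cases, ["1", "2", "3"]])
print(len(commands))  # 6
print(commands[0])    # python train.py --lr 1e-3 --seed 1
```

Two values from the file combined with three inline values give 2 x 3 = 6 commands, matching the Cartesian-product rule.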

Placeholders

Placeholder	Meaning
`{}`	Current argument (positional, auto-assigned left to right)
`{0}`, `{1}`, …	Explicit positional argument from the Nth `:::` / `::::` list (0-indexed)
`{%}`	Worker slot number (0-indexed, stable for the lifetime of `exec`)
`{#}`	Job sequence number (the DB `Seq` of the job being run)

{} and {0} / {1} can be mixed freely. {%} and {#} are substituted at run time, not at init time, so the stored command retains the literal placeholder.
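
A rough sketch of the two substitution phases may help. The functions expand_at_init and expand_at_run are hypothetical names, and the way a bare {} picks its index when mixed with explicit {N} (here: counting only bare occurrences, left to right) is an assumption rather than confirmed behavior.

```python
import itertools
import re

# Matches "{}" and "{N}" but not "{%}" or "{#}".
_PLACEHOLDER = re.compile(r"\{(\d*)\}")

def expand_at_init(template, args):
    """Substitute {} and {N} from args; leave {%} and {#} literal."""
    counter = itertools.count()  # next index for a bare "{}" (assumed rule)

    def repl(match):
        index = match.group(1)
        pos = int(index) if index else next(counter)
        return args[pos]

    return _PLACEHOLDER.sub(repl, template)

def expand_at_run(command, slot, seq):
    """Fill in the worker slot and job sequence number at execution time."""
    return command.replace("{%}", str(slot)).replace("{#}", str(seq))

stored = expand_at_init(
    "CUDA_VISIBLE_DEVICES={%} python train.py --lr {} --seed {1} --tag job{#}",
    ["1e-3", "7"],
)
print(stored)
print(expand_at_run(stored, slot=2, seq=5))
```

The first print shows the stored command with {%} and {#} still literal; the second shows the command as a worker in slot 2 would run it for job 5.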

Example — Cartesian product with implicit placeholders:

python3 parallelcmd.py init "python train.py --lr {} --seed {}" ::: 1e-3 1e-4 ::: 1 2 3

This creates 6 jobs.

Example — explicit positional placeholders (same result, order made explicit):

python3 parallelcmd.py init "python train.py --lr {0} --seed {1}" ::: 1e-3 1e-4 ::: 1 2 3

Example — GPU assignment via worker slot:

python3 parallelcmd.py run -j 4 "CUDA_VISIBLE_DEVICES={%} python train.py --lr {}" ::: 1e-3 1e-4 1e-5 1e-6

Each worker holds a fixed slot (0–3), so all its jobs run on the same GPU.

Example — --zip to pair lists element-by-element instead of Cartesian product:

# Without --zip: 4 jobs (a+x, a+y, b+x, b+y)
python3 parallelcmd.py init "cmd {0} {1}" ::: a b ::: x y

# With --zip: 2 jobs (a+x, b+y)
python3 parallelcmd.py init --zip "cmd {0} {1}" ::: a b ::: x y

With --zip, pairing stops at the end of the shortest list when the list lengths differ.
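
The difference between the two modes mirrors itertools.product versus the built-in zip in Python. The sketch below only illustrates the combination rule with made-up lists of unequal length; it is not taken from parallelcmd.py.

```python
import itertools

letters = ["a", "b"]
symbols = ["x", "y", "z"]  # one element longer than letters

# Default behaviour: every combination of the two lists.
product_jobs = ["cmd {0} {1}".format(*c) for c in itertools.product(letters, symbols)]

# --zip behaviour: pair element-by-element, stopping at the shortest list.
zip_jobs = ["cmd {0} {1}".format(*c) for c in zip(letters, symbols)]

print(len(product_jobs))  # 6
print(zip_jobs)           # ['cmd a x', 'cmd b y']
```

The unpaired value z is dropped in the zipped case.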

Subcommands

`init`

Initialize the job queue, or append to an existing one.

python3 parallelcmd.py init [options] <command ...> [ ::: <args ...> ]* [ :::: <argfile ...> ]*

Options:

-a, --append append to existing table instead of recreating
-f, --force drop the existing parjob table and recreate it
--check_dup skip commands that already exist
--zip pair argument lists element-by-element instead of Cartesian product
-v, --verbose

`exec`

Execute queued jobs in parallel.

python3 parallelcmd.py exec [options]

Options:

-j, --nworkers <n> number of workers (default: 4)
--id <id ...> run only these specific job IDs
--progress show aggregate progress line
--bar show a visual ASCII progress bar (alternative to --progress)
--eta append estimated time remaining to the progress or bar line
--dashboard compact live dashboard mode
--dryrun print commands without running; jobs are marked done (exit 0), so run reset --all before a real run
-v, --verbose
--timeskip <sec> throttle displayed output updates
--randomorder fetch pending jobs in random order
--prefix <cmd> prefix each command; supports shell env var assignments (example: srun -N1 -n1, NP=8)
--max_jobs <n> max jobs per worker
--delay <sec> sleep this many seconds before starting each job (default: 0); also used as the upper bound for the initial per-worker random stagger
--wait <sec> when no job is available, wait this many seconds and retry instead of exiting (useful when another process is still adding jobs)
--timeout <sec> kill a task and move to the next if it runs longer than this many seconds; timed-out jobs are recorded with exit code 124
--retries <n> retry a failed job up to N times before marking it failed (default: 0; timed-out jobs are never retried)
--halt <n> stop queuing new jobs after N failures; already-running jobs complete normally
--output-dir <dir> save each job's stdout (and stderr) to <dir>/<seq>.out
--quiet suppress per-job output lines (useful with --progress or --bar)
--tag prefix each output line with the full command instead of the seq ID
--hook <file> Python plugin file; see Hooks below
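
How --timeout and --retries interact can be sketched as follows. This is a simplified model under stated assumptions, not the tool's implementation: run_job is a hypothetical helper, it takes an argv list instead of going through bash -c, and it ignores the database entirely.

```python
import subprocess
import sys

TIMEOUT_EXIT = 124  # exit code documented above for timed-out jobs

def run_job(argv, timeout=None, retries=0):
    """Run argv and return (exitval, attempts).

    A failed job is retried up to `retries` times; a timed-out job is
    recorded as 124 and never retried.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            exitval = subprocess.run(argv, timeout=timeout).returncode
        except subprocess.TimeoutExpired:
            return TIMEOUT_EXIT, attempts
        if exitval == 0 or attempts > retries:
            return exitval, attempts

# A job that always exits with 3 is attempted once plus two retries.
print(run_job([sys.executable, "-c", "import sys; sys.exit(3)"], retries=2))  # (3, 3)
```

With retries=2 the failing job runs three times in total before its last exit value is kept.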

`run`

Initialize and execute in one step (init + exec).

python3 parallelcmd.py run [options] <command ...> [ ::: <args ...> ]* [ :::: <argfile ...> ]*

Common options include:

init side: --append, -f/--force, --check_dup, --zip
exec side: -j/--nworkers, --id, --progress, --bar, --eta, --dashboard, --dryrun, --randomorder, --prefix, --max_jobs, --delay, --wait, --timeout, --retries, --halt, --output-dir, --quiet, --tag, --hook

`check`

Inspect queue summary or list all rows.

python3 parallelcmd.py check [options]

Options:

-l, --list list all matching rows instead of the summary
--nonzero filter to only jobs with non-zero exit value
--running filter to only currently running jobs
--where <sql> arbitrary SQL WHERE clause
--like <pattern> filter by Command LIKE <pattern>
--id <id ...> filter by specific job IDs

`reset`

Reset selected jobs to pending (Starttime, JobRuntime, Exitval set to NULL).

python3 parallelcmd.py reset [--all | --nonzero | --like <pattern> | --id <id ...> | --where <sql>]

Options:

-a, --all reset all jobs
--nonzero reset only jobs with non-zero exit value
--where <sql> arbitrary SQL WHERE clause
--like <pattern> filter by Command LIKE <pattern>
--id <id ...> filter by specific job IDs
-y, --yes skip confirmation prompt

Prompts for confirmation before resetting rows (skipped with -y).
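
In SQL terms, a reset amounts to an UPDATE that clears three columns. The sketch below runs against an in-memory table; the table name parjob and the column names come from this README, while the column types, the sample rows, and the reset_jobs helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the real table may have more columns.
conn.execute(
    "CREATE TABLE parjob (Seq INTEGER PRIMARY KEY, Command TEXT, "
    "Starttime REAL, JobRuntime REAL, Exitval INTEGER)"
)
conn.executemany(
    "INSERT INTO parjob VALUES (?, ?, ?, ?, ?)",
    [
        (1, "echo a", 100.0, 0.1, 0),      # succeeded
        (2, "echo b", 101.0, 0.2, 1),      # failed
        (3, "echo c", 102.0, 300.0, 124),  # timed out
        (4, "echo d", None, None, None),   # pending
    ],
)

def reset_jobs(conn, where):
    """Set matching jobs back to pending; `where` is raw SQL, like --where."""
    cur = conn.execute(
        "UPDATE parjob SET Starttime = NULL, JobRuntime = NULL, Exitval = NULL "
        f"WHERE {where}"
    )
    conn.commit()
    return cur.rowcount

n = reset_jobs(conn, "Exitval <> 0")
pending = conn.execute("SELECT COUNT(*) FROM parjob WHERE Exitval IS NULL").fetchone()[0]
print(n, pending)  # 2 3
```

Jobs 2 and 3 are reset, job 1 keeps its exit value of 0, and job 4 was already pending, so three jobs are pending afterwards. Because the filter is raw SQL, only pass WHERE clauses you wrote yourself.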

`delete`

Delete selected jobs.

python3 parallelcmd.py delete [options]

Options:

-a, --all delete all jobs
--like <pattern> filter by SQL LIKE pattern on command text
--id <id ...> filter by job ID(s)
-y, --yes skip confirmation prompt

Prompts for confirmation before deleting rows (skipped with -y).

`update`

Find/replace command text for selected jobs.

python3 parallelcmd.py update [options]

Options:

--replace "old,new" find and replace text pair (comma-separated)
--like <pattern> filter by SQL LIKE pattern on command text
--id <id ...> filter by job ID(s)
-y, --yes skip confirmation prompt

Prompts for confirmation before updating rows (skipped with -y).

Note: if the replacement text starts with --, use the = form to prevent argparse from treating it as a flag:
python3 parallelcmd.py update --replace='--old-flag,--new-flag'
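
The "exactly one comma" rule for --replace can be pictured with a small parser. How parallelcmd.py actually splits the pair is not shown in this README, so parse_replace below is a hypothetical helper that only models the documented constraint.

```python
def parse_replace(spec):
    """Split an 'old,new' pair; reject anything but exactly one comma."""
    parts = spec.split(",")
    if len(parts) != 2 or not parts[0]:
        raise ValueError("expected exactly one comma-separated pair: old,new")
    return parts[0], parts[1]

old, new = parse_replace("--old-flag,--new-flag")
updated = "train.py --old-flag 1".replace(old, new)
print(updated)  # train.py --new-flag 1
```

A value such as "a,b,c" is rejected, which is why text containing commas has to be handled with several simpler replacement pairs.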

`diagnose`

Inspect the health of the SQLite database — useful when jobs appear stuck or the DB seems unresponsive.

python3 parallelcmd.py diagnose [--stale SECONDS]

Reports:

Job counts by state (pending / running / success / failed / error)
In-progress jobs (Exitval = -1000) with age in seconds; flags any older than --stale (default: 3600) as potentially stale
DB file sizes (.sqlite, .sqlite-wal, .sqlite-shm); warns if WAL exceeds 10 MB
Exclusive lock probe — attempts BEGIN EXCLUSIVE with a 2-second timeout
Open file handles via lsof
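
The exclusive lock probe can be approximated with the standard sqlite3 module. This is a sketch of the idea (attempt BEGIN EXCLUSIVE with a short busy timeout), not the diagnose implementation; probe_exclusive_lock and the temporary demo database are assumptions.

```python
import os
import sqlite3
import tempfile

def probe_exclusive_lock(path, timeout=2.0):
    """Return True if an exclusive lock can be taken within `timeout` seconds."""
    # isolation_level=None lets us issue BEGIN/ROLLBACK explicitly.
    conn = sqlite3.connect(path, timeout=timeout, isolation_level=None)
    try:
        conn.execute("BEGIN EXCLUSIVE")
        conn.execute("ROLLBACK")
        return True
    except sqlite3.OperationalError:  # typically "database is locked"
        return False
    finally:
        conn.close()

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.sqlite")
    sqlite3.connect(db_path).close()            # create the database file
    free = probe_exclusive_lock(db_path)

    holder = sqlite3.connect(db_path, isolation_level=None)
    holder.execute("BEGIN EXCLUSIVE")           # simulate another holder of the lock
    busy = probe_exclusive_lock(db_path, timeout=0.2)
    holder.execute("ROLLBACK")
    holder.close()

print(free, busy)
```

A failed probe suggests that some other connection is holding the database, which is the situation the lsof report helps track down.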

To recover stale in-progress jobs after a crash:

python3 parallelcmd.py reset --where "Exitval = -1000" -y

Hooks

--hook <file> loads a Python file that can inspect each job before and/or after it runs. Define either or both functions:

def on_before_task(taskid, cmd):
    # called after --delay sleep, before the subprocess launches
    # return False → requeue this job to pending and stop this worker
    return True

def on_after_task(taskid, cmd, exitval, runtime):
    # called after the exit value is written to the DB
    # return False → stop this worker (other workers keep running)
    return True

Either function can be omitted — only the defined ones are called.
Exceptions inside a hook are logged and treated as True (continue).
Returning False stops only the calling worker; other workers are unaffected.
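
As an illustration of the hook contract, here is a hypothetical hook file that stops a worker after several consecutive failures. It is not one of the shipped hooks; the file name, the threshold, and the assumption that module-level state is kept per worker are all made up for this sketch.

```python
# my_failure_hook.py (hypothetical file name)
MAX_CONSECUTIVE_FAILURES = 3  # hypothetical threshold

# Assumption: each worker sees its own copy of this counter.
_consecutive_failures = 0

def on_before_task(taskid, cmd):
    # Let every job start; return False here to requeue the job instead.
    return True

def on_after_task(taskid, cmd, exitval, runtime):
    # Count failures in a row and stop this worker once the limit is hit.
    global _consecutive_failures
    if exitval == 0:
        _consecutive_failures = 0
    else:
        _consecutive_failures += 1
    return _consecutive_failures < MAX_CONSECUTIVE_FAILURES
```

It would be passed to exec in the same way as any other hook file, for example with --hook=my_failure_hook.py.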

Example hook files are in the hooks/ directory:

File	Purpose
`hooks/my_slurm_hook.py`	Stop workers when SLURM remaining time drops below 1 hour
`hooks/my_pbs_hook.py`	Same for PBS/Torque (`qstat`)

python3 parallelcmd.py exec -j 4 --hook=hooks/my_slurm_hook.py

Edit CHECK_TIMELEFT at the top of the hook file to adjust the threshold.

Global options

--db <name> SQLite DB basename; the file on disk is <name>.sqlite
--db_retries <n> max retries when SQLite is locked (default: 10)
--log_level {debug,info} logging level (default: info)
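
The idea behind --db_retries can be sketched as a retry loop around a database operation. The helper with_db_retries, the linear backoff, and the check for "locked" in the error message are assumptions for illustration; the README does not describe how parallelcmd.py spaces its retries.

```python
import sqlite3
import time

def with_db_retries(operation, retries=10, base_delay=0.05):
    """Call operation(); retry when SQLite reports that the database is locked."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            # Re-raise other errors, or give up once the retry budget is spent.
            if "locked" not in str(exc) or attempt == retries:
                raise
            time.sleep(base_delay * (attempt + 1))  # assumed linear backoff

# Demo: an operation that fails twice with a lock error, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "ok"

print(with_db_retries(flaky, retries=5, base_delay=0.001))  # ok
```

The demo operation succeeds on its third call, well within the budget of five retries.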

Useful examples

Pipe arguments from stdin (auto-detected when no ::: or :::: is given):

cat cases.txt | python3 parallelcmd.py -j 4 "bash run.sh {}"
seq 10 | python3 parallelcmd.py "echo {}"

Pipe stdin explicitly with :::: - (combinable with other arg lists):

cat cases.txt | python3 parallelcmd.py run "bash run.sh {} {}" :::: - ::: seed1 seed2

Run scripts from values in a file:

python3 parallelcmd.py init "bash run_case.sh {}" :::: cases.txt
python3 parallelcmd.py exec -j 8

Use a custom DB file:

python3 parallelcmd.py --db jobs init "echo {}" ::: x y z
python3 parallelcmd.py --db jobs exec -j 2

Kill tasks that exceed a time limit and continue to the next job:

python3 parallelcmd.py exec -j 4 --timeout 300

Timed-out jobs are recorded with exit code 124. Find them with:

python3 parallelcmd.py check -l --where "Exitval = 124"

Reset timed-out jobs to retry with a longer timeout:

python3 parallelcmd.py reset --where "Exitval = 124"
python3 parallelcmd.py exec -j 4 --timeout 600

Keep workers alive while another process appends jobs later:

python3 parallelcmd.py exec -j 4 --wait 10
python3 parallelcmd.py init -a "echo {}" ::: later1 later2

Retry failed jobs only:

python3 parallelcmd.py reset
python3 parallelcmd.py exec -j 4

Overwrite the queue with a new set of jobs (drop and recreate):

python3 parallelcmd.py init -f "echo {}" ::: x y z
python3 parallelcmd.py exec -j 4

Notes

Job output is streamed to stdout while running.
Queue state is persisted in SQLite, so you can stop and resume workflows.
reset, delete, and update prompt for confirmation by default; pass -y to skip.
With --wait, workers poll for newly appended jobs instead of exiting as soon as the queue is empty.

Aliases

Add these to ~/.bashrc or ~/.zshrc to avoid typing the full command each time. Assumes parallelcmd.py is on your PATH.

# parallelcmd aliases
alias pc='parallelcmd.py'

# init
alias pci='parallelcmd.py init'
alias pcia='parallelcmd.py init --append'
alias pcif='parallelcmd.py init --force'

# exec
alias pce='parallelcmd.py exec'
alias pcer='parallelcmd.py exec --randomorder'
alias pcep='parallelcmd.py exec --progress'

# check
alias pck='parallelcmd.py check'
alias pckl='parallelcmd.py check -l'
alias pckf='parallelcmd.py check -l --nonzero'

# reset / delete / update
alias pcr='parallelcmd.py reset'
alias pcra='parallelcmd.py reset --all'
alias pcrf='parallelcmd.py reset --nonzero'
alias pcd='parallelcmd.py delete'
alias pcda='parallelcmd.py delete --all'
alias pcu='parallelcmd.py update'

# reset timed-out jobs
alias pctimeout='parallelcmd.py reset --where "Exitval = 124"'

# exec with N workers  (usage: pcej 8)
pcej() { parallelcmd.py exec -j "$@"; }

# run (init + exec) with common worker counts
pcj4()  { parallelcmd.py run -j 4 "$@"; }
pcj8()  { parallelcmd.py run -j 8 "$@"; }
pcj16() { parallelcmd.py run -j 16 "$@"; }

Troubleshooting

database is locked
- Usually temporary when multiple workers/processes access SQLite.
- Retry the command; avoid running multiple exec sessions against the same DB at once.
No jobs are executed
- Check queue state: python3 parallelcmd.py check -l.
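- If jobs are already completed, reset them with python3 parallelcmd.py reset --all. The default python3 parallelcmd.py reset only touches jobs with a non-zero Exitval, which includes jobs marked in-progress (Exitval = -1000).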
Workers exit before later jobs are appended
- Start exec with --wait <seconds> so workers keep polling.
- Append work with init -a ... from another process or terminal.
Unexpected shell behavior / quoting issues
- Commands are executed through bash -c.
- Wrap complex commands in quotes and test one command manually before init.
Stop workers based on SLURM/PBS remaining time
- Use --hook=hooks/my_slurm_hook.py (or my_pbs_hook.py).
- Must be run inside an allocation where SLURM_JOB_ID / PBS_JOBID is set.
Some jobs have exit code 124
- These jobs were killed by --timeout.
- Reset and retry them: python3 parallelcmd.py reset --where "Exitval = 124", then re-run exec with a larger --timeout or without it.
update --replace does not parse as expected
- Use exactly one comma-separated pair: --replace "old,new".
- If your text contains commas, run multiple updates with simpler replacement pairs.
Argument file (::::) seems ignored
- Ensure one argument per line.
- Blank lines and lines starting with # are intentionally skipped.

Comparison with GNU Parallel

Feature	GNU Parallel	parallelcmd
Input: inline list	`:::`	`:::`
Input: file	`::::`	`::::`
Input: stdin (auto)	pipe or `-`	pipe (auto-detected when no `:::` or `::::`)
Input: stdin (explicit)	`:::: -`	`:::: -`
Input: multiple lists	Cartesian product	Cartesian product
Input: linked/paired lists	`--link`	`--zip`
Column split	`--colsep REGEX`	—
Null delimiter	`-0`	—
Stop at sentinel	`-E VALUE`	—
Skip empty lines	`--no-run-if-empty`	—
Arg substitution: full	`{}`	`{}`
Arg substitution: no ext	`{.}`	—
Arg substitution: basename	`{/}`	—
Arg substitution: dirname	`{//}`	—
Arg substitution: job #	`{#}`	`{#}`
Arg substitution: slot #	`{%}`	`{%}`
Positional substitution	`{1}`, `{2}`, …	`{0}`, `{1}`, …
Workers	`-j N`	`-j N`
Load-based throttle	`--load`, `--noswap`, `--memfree`	—
Nice/priority	`--nice`	—
Startup delay	`--delay SEC`	`--delay SEC`
Progress bar	`--progress`, `--eta`, `--bar`	`--progress`, `--bar`, `--eta`, `--dashboard`
Job log	`--joblog FILE`	SQLite DB (always persisted)
Resume incomplete batch	`--resume` (via joblog)	re-run `exec` (auto, SQLite state)
Retry failed only	`--resume-failed`	`reset --nonzero` + `exec`
Retry N times	`--retries N`	`--retries N`
Skip duplicates	—	`--check_dup`
Output order	`-k` / `--keep-order`	— (streamed as-is)
Tag output	`--tag`, `--tagstring`	`--tag`
Save results to dir	`--results DIR`	`--output-dir DIR`
Immediate streaming	`--ungroup`	always streamed
Line buffering	`--linebuffer`	—
Timeout	`--timeout DURATION`	`--timeout SEC`
Exit code for timeout	124	124
Halt on failure	`--halt soon/now,fail=N`	`--halt N`
Custom kill signal	`--termseq`	—
Dry-run	`--dry-run`	`--dryrun`
Verbose / print cmd	`--verbose`	`-v` / `--verbose`
Random order	`--shuf`	`--randomorder`
Interactive confirm	`--interactive`	—
Command prefix	`--` (shell)	`--prefix CMD`
SLURM/PBS time-limit hook	—	`--hook FILE` (`hooks/my_slurm_hook.py`)
Before/after job hooks	—	`--hook FILE` (`on_before_task`, `on_after_task`)
Remote execution	`--sshlogin`, `--slf`, `--trc`	—
Distributed file sync	`--transfer`, `--return`, `--cleanup`	—
Pipe/streaming mode	`--pipe`, `--block`, `--pipepart`	—
Semaphore mode	`sem` / `--semaphore`	—
tmux integration	`--tmux`	—
Multiple queues	separate invocations	`--db NAME` (named SQLite files)
Inspect queue	`--joblog` + external tools	`check`, `check -l`, `--where`, `--like`
Edit queued commands	—	`update --replace`
Delete specific jobs	—	`delete --id`, `delete --like`
Reset specific jobs	—	`reset --id`, `reset --where`
Wait for new jobs	—	`--wait SEC` (keep workers polling)
Max jobs per worker	—	`--max_jobs N`
External dependencies	none (Perl)	none (Python stdlib only)
Persistent state	optional (joblog file)	always (SQLite)

GNU Parallel is broader for one-shot parallel execution, especially in argument substitution, remote/distributed runs, pipe streaming, and output formatting. parallelcmd trades those features for a persistent job queue with first-class management (inspect, edit, delete, reset by SQL filter) and SLURM/PBS time-limit awareness through hooks. That makes it better suited to long-running experiment pipelines where you need to stop, resume, and selectively retry jobs across sessions.

FAQ

How do I resume after interruption?
- Just run python3 parallelcmd.py exec -j 4 again.
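- Completed jobs (exit code 0) stay done, and pending jobs are picked up where the queue left off.
- Jobs that were mid-run when the interruption happened can stay marked in-progress (Exitval = -1000); reset them first with python3 parallelcmd.py reset --where "Exitval = -1000" -y.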
How do I retry only failed jobs?
- Failed jobs are those with non-zero exit values.
- Run python3 parallelcmd.py reset (default filter resets jobs with Exitval <> 0), then run exec again.
- Use --nonzero to be explicit: python3 parallelcmd.py reset --nonzero.
What does exit code 124 mean?
- The job was killed by --timeout. This matches the GNU timeout exit code convention.
- Reset and rerun: python3 parallelcmd.py reset --where "Exitval = 124", then exec with a longer --timeout.
Can I have multiple queues?
- Yes. Use different database basenames with --db.
- Example: python3 parallelcmd.py --db exp1 init ... then exec using the same --db.
Is it safe to run two exec commands on the same DB?
- It is not recommended.
- SQLite coordination can work, but contention/locking increases and behavior is harder to reason about.
Can I inspect/edit queued commands before running?
- Inspect: python3 parallelcmd.py check --list
- Bulk edit text: python3 parallelcmd.py update --replace "old,new" --like "%pattern%"
- Remove unwanted rows: python3 parallelcmd.py delete --id 12 13 14
