PDF-Helper

A simple Python package for common PDF tasks: bundling, splitting, removing pages, exporting pages as images, and extracting text.

Features

If you would like another feature added, feel free to open an issue, or fork the repo and open a pull request with your contribution.

Usage

Installation

You can install PDF-Helper via pip:

pip install pdf-helper

# Or use uv to install the tool
uv tool install pdf-helper

And run it using the command line:

pdf-helper <command> [options]

Or you can use uvx to run the package without installing it into a specific Python environment:

uvx pdf-helper <command> [options]

You can also clone the repository and use uv run:

git clone https://github.com/MPCodeWriter21/PDF-Helper.git
cd PDF-Helper
uv run pdf-helper <command> [options]

Bundle PDFs

Bundle multiple files into one PDF:

pdf-helper bundle <input_file_1> <input_file_2>... <input_file_n> <output_file>

# E.g. Bundle PDFs 1, 2 and 3 into a new PDF
pdf-helper bundle 1.pdf 2.pdf 3.pdf new.pdf

# E.g. Take 1.png, 2.jpg, and 3.png and create a PDF named 123.pdf, overwriting it
# if it already exists
pdf-helper bundle 1.png 2.jpg 3.png 123.pdf -f

# E.g. Take part1.pdf, image1.png, ending.pdf and bundle them into a PDF named final.pdf
pdf-helper bundle part1.pdf image1.png ending.pdf final.pdf -v
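
If you want to drive the bundle command from a script, a minimal sketch is shown below. It only assembles the argument list in the order documented above (inputs first, output last, optional -f); the helper name build_bundle_command is hypothetical and not part of PDF-Helper.

```python
import shlex
from pathlib import Path


def build_bundle_command(inputs, output, force=False):
    """Build the argv list for `pdf-helper bundle`: inputs first, output last."""
    if not inputs:
        raise ValueError("at least one input file is required")
    argv = ["pdf-helper", "bundle", *map(str, inputs), str(output)]
    if force:
        argv.append("-f")  # overwrite the output file if it already exists
    return argv


# File names taken from the example above, used only for illustration.
scans = [Path("1.png"), Path("2.jpg"), Path("3.png")]
command = build_bundle_command(scans, "123.pdf", force=True)
print(shlex.join(command))

# To actually run it (requires pdf-helper on PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.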

Split PDFs

Split a PDF into multiple PDFs, each containing a range of pages:

pdf-helper split <input_file> <output_folder> -s <split_point_1>,<split_point_2>

# E.g. Split a PDF into three PDFs, one with pages 1-10, the second with pages 11-20 and
# the third with pages 21-end
pdf-helper split my-pdf.pdf my-split-pdfs -s 10,20

# E.g. Split a PDF into PDFs each containing one page
pdf-helper split my-pdf.pdf my-split-pdfs  # No need to specify split points
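
To make the split-point semantics concrete, here is a small sketch that computes which pages end up in each output file. It mirrors the two examples above; it is not PDF-Helper's own code, and the range check on split points is an assumption.

```python
def split_ranges(total_pages, split_points=None):
    """Return 1-based inclusive (first, last) page ranges, one per output PDF."""
    if split_points is None:
        # No split points given: one output file per page.
        return [(page, page) for page in range(1, total_pages + 1)]
    ranges = []
    start = 1
    for point in sorted(set(split_points)):
        # Assumption: a split point must leave at least one page after it.
        if not 1 <= point < total_pages:
            raise ValueError(f"split point {point} is outside 1..{total_pages - 1}")
        ranges.append((start, point))  # each chunk ends at the split point
        start = point + 1
    ranges.append((start, total_pages))  # the last chunk runs to the end
    return ranges


print(split_ranges(25, [10, 20]))  # [(1, 10), (11, 20), (21, 25)]
print(split_ranges(3))             # [(1, 1), (2, 2), (3, 3)]
```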

Export PDF pages as image files

Export PDF pages as image files:

pdf-helper to-image <input_file> <output_folder> \
        -p <page_number_1>,<page_number_2>,...,<page_number_n> -s <scale_factor>

# E.g. Export pages 1, 2, 3 and 6 from a PDF with scale factor 1
pdf-helper to-image 1.pdf images -p 1-3,6 -s 1

# E.g. Export all pages from a PDF with scale 2
pdf-helper to-image my-pdf.pdf my-images
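
The page list accepts single pages and ranges, as in 1-3,6. The sketch below shows one way such a spec can be expanded into page numbers; parse_pages is a hypothetical helper, and the stricter checks (no descending ranges, pages start at 1) are assumptions rather than documented behavior.

```python
def parse_pages(spec):
    """Expand a page spec such as "1-3,6" into a sorted list of 1-based pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(first, last + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers are 1-based")
    return sorted(pages)


print(parse_pages("1-3,6"))  # [1, 2, 3, 6]
```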

Remove pages from a PDF

Remove pages from a PDF:

pdf-helper remove-pages <input_file> <output_file> <page_number_1>,<page_number_2>,...,<page_number_n>

# E.g. Remove pages 1, 2, 3 and 6 from a PDF
pdf-helper remove-pages 1.pdf new.pdf 1-3,6
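
As a quick sanity check before removing pages, you can work out which pages will remain. This is an illustrative sketch only; pages_kept is a hypothetical helper, and rejecting out-of-range page numbers is an assumption.

```python
def pages_kept(total_pages, pages_to_remove):
    """Return the 1-based pages that remain after removing the given pages."""
    remove = set(pages_to_remove)
    out_of_range = sorted(p for p in remove if not 1 <= p <= total_pages)
    if out_of_range:
        raise ValueError(f"pages not in document: {out_of_range}")
    return [p for p in range(1, total_pages + 1) if p not in remove]


# Removing pages 1, 2, 3 and 6 from an 8-page PDF.
print(pages_kept(8, [1, 2, 3, 6]))  # [4, 5, 7, 8]
```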

Export text from a PDF

To extract the text from a PDF file and save it to a text file:

pdf-helper extract-text <input_file> -o <output_file_name>

# E.g. Extract text from a PDF named my-pdf.pdf and save it to my-text.txt
pdf-helper extract-text my-pdf.pdf -o my-text.txt

Run Recipes

The recipe system lets you chain multiple PDF operations together in a single run using a YAML file. This unlocks features not available through individual CLI commands (e.g. selecting specific pages per file when bundling).

pdf-helper run-recipe <recipe_file.yaml>

# E.g. Run a simple recipe
pdf-helper run-recipe remove-pages.yaml

# E.g. Run with force overwrite and verbose logging
pdf-helper run-recipe bundle-workflow.yaml --force --verbose

Recipe File Format

A recipe is a YAML file with a steps list. Each step has an id, an operation, input/output paths, and operation-specific options. Steps can reference each other's outputs using { step: step_id }.

name: "Remove specific pages"
description: "Removes pages 2, 4, 6 from a PDF."
version: "1.0"

steps:
  - id: clean
    operation: remove_pages
    input: document.pdf
    pages_to_remove: [2, 4, 6]
    output: cleaned.pdf
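
The structure described above can be checked before running a recipe. The sketch below works on a recipe already loaded into a plain dict (for example with a YAML parser) and reports missing ids, duplicate ids, missing operations, and step references that do not point at an earlier step. The function check_recipe is hypothetical, and the rule that a reference must point to an earlier step is an assumption.

```python
def check_recipe(recipe):
    """Return a list of problems found in a recipe dict (empty if none)."""
    problems = []
    seen = set()  # ids of the steps processed so far
    for index, step in enumerate(recipe.get("steps", []), start=1):
        step_id = step.get("id")
        if not step_id:
            problems.append(f"step {index}: missing id")
        elif step_id in seen:
            problems.append(f"step {index}: duplicate id {step_id!r}")
        if "operation" not in step:
            problems.append(f"step {index}: missing operation")
        source = step.get("input")
        # A dict input is a { step: step_id } reference to another step.
        if isinstance(source, dict) and source.get("step") not in seen:
            problems.append(
                f"step {index}: input refers to unknown or later step "
                f"{source.get('step')!r}"
            )
        if step_id:
            seen.add(step_id)
    return problems


# The recipe above, written as the dict a YAML parser would produce.
recipe = {
    "name": "Remove specific pages",
    "steps": [
        {
            "id": "clean",
            "operation": "remove_pages",
            "input": "document.pdf",
            "pages_to_remove": [2, 4, 6],
            "output": "cleaned.pdf",
        }
    ],
}
print(check_recipe(recipe))  # []
```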

Supported Operations

| Operation | Status | Description |
| --- | --- | --- |
| `bundle` | Available | Bundle files with optional per-file page selection |
| `remove_pages` | Available | Remove pages by 1-based index |
| `split_pdf` | Available | Split at given page boundaries |
| `pdf_to_image` | Available | Render pages as PNG images |
| `extract_text` | Available | Extract text content |
| `watermark` | Planned | Add text watermark (graceful fallback) |
| `encrypt` | Planned | Password-protect PDF (graceful fallback) |
| `metadata` | Planned | Set title/author/keywords (graceful fallback) |

Operations marked Planned are not yet implemented. For these, the recipe runner logs a warning and copies the input file through unchanged, so pipelines don't break.
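
The pass-through behavior for planned operations can be pictured with the sketch below. It is not the recipe runner's actual code; the function name, logger name, and message text are assumptions made for illustration.

```python
import logging
import shutil
import tempfile
from pathlib import Path

logger = logging.getLogger("recipe-sketch")  # hypothetical logger name

PLANNED = {"watermark", "encrypt", "metadata"}


def run_planned_step(operation, input_path, output_path):
    """Warn that the operation is not implemented, then copy input to output."""
    if operation not in PLANNED:
        raise ValueError(f"{operation!r} is not a planned operation")
    logger.warning("%r is not implemented yet; copying input unchanged", operation)
    shutil.copyfile(input_path, output_path)
    return Path(output_path)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "in.pdf"
    source.write_bytes(b"%PDF-1.4 placeholder")  # stand-in content, not a real PDF
    result = run_planned_step("watermark", source, Path(tmp) / "out.pdf")
    copied_ok = result.read_bytes() == source.read_bytes()

print(copied_ok)  # True
```

Because the output file always exists afterwards, later steps that reference it can still run.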

Advanced Example: Multi-step Pipeline

# yaml-language-server: $schema=https://raw.githubusercontent.com/MPCodeWriter21/PDF-Helper/master/schemas/recipe-schema.json
name: "Split, Convert, and Extract Pipeline"
version: "1.0"

settings:
  temp_dir: "./.recipe-tmp"

steps:
  # Step 1: Split the PDF at pages 5 and 10
  - id: split
    operation: split_pdf
    input: report.pdf
    split_points: [5, 10]
    output_dir: .
    output_prefix: "report_part_"

  # Step 2: Convert the second chunk to images
  - id: to_images
    operation: pdf_to_image
    input:
      step: split
      file: report_part_2.pdf
    pages: "1-3"
    scale: 3
    output: ./output/images

  # Step 3: Extract text from the first chunk
  - id: extract
    operation: extract_text
    input:
      step: split
      file: report_part_1.pdf
    pages: "1-4"
    max_characters: 5000
    reverse_lines: true
    output: ./output/chapter-1-text.txt
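
To illustrate how an input such as { step: split, file: report_part_2.pdf } might map to a path, here is a rough sketch. It assumes a reference with a file key is looked up inside the producing step's output_dir and a reference without one uses that step's output; the real runner may resolve paths differently (for example via temp_dir), and resolve_input is a hypothetical name.

```python
from pathlib import Path


def resolve_input(source, steps_by_id):
    """Turn a step's input (plain path or step reference) into a Path."""
    if isinstance(source, str):
        return Path(source)  # a plain file path, used as-is
    producer = steps_by_id[source["step"]]
    if "file" in source:
        # Assumption: named files live in the producing step's output_dir.
        return Path(producer["output_dir"]) / source["file"]
    # Assumption: otherwise the reference means the producing step's output.
    return Path(producer["output"])


steps_by_id = {
    "split": {"operation": "split_pdf", "output_dir": ".", "output_prefix": "report_part_"},
}
reference = {"step": "split", "file": "report_part_2.pdf"}
print(resolve_input(reference, steps_by_id))
print(resolve_input("report.pdf", steps_by_id))
```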

Recipe Settings

| Setting | Default | Description |
| --- | --- | --- |
| `temp_dir` | `./.recipe-tmp` | Directory for intermediate files |
| `overwrite` | `false` | Overwrite existing output files |
| `cleanup_temp` | `false` | Remove temp directory after completion |

Input values (e.g. passwords) can be sourced from environment variables or prompted at runtime:

inputs:
  password:
    env: PDF_PASSWORD
    prompt: "Enter output PDF password"
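
One plausible way to resolve such an input is sketched below: use the environment variable when it is set, otherwise fall back to the prompt. The lookup order and the function name resolve_input_value are assumptions, not confirmed behavior of the recipe runner.

```python
import os
from getpass import getpass


def resolve_input_value(spec, environ=os.environ, ask=getpass):
    """Resolve one `inputs` entry from the environment, else by prompting."""
    env_name = spec.get("env")
    if env_name and environ.get(env_name):
        return environ[env_name]
    if "prompt" in spec:
        # getpass hides what is typed, which suits passwords.
        return ask(spec["prompt"] + ": ")
    raise KeyError("input has no set environment variable and no prompt")


spec = {"env": "PDF_PASSWORD", "prompt": "Enter output PDF password"}

# Demonstration with a fake environment and a fake prompt, so nothing blocks.
from_env = resolve_input_value(spec, environ={"PDF_PASSWORD": "s3cret"})
from_prompt = resolve_input_value(spec, environ={}, ask=lambda message: "typed")
print(from_env, from_prompt)
```

In real use you would call resolve_input_value(spec) with the defaults, so the value comes from the actual environment or an interactive prompt.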

See examples/recipes/ for more example recipe files.

About

Author: CodeWriter21

GitHub: MPCodeWriter21/PDF-Helper

Donations

Your donations are very welcome: nowpayments.io

You can also consider donating a Star to the repo.

License

This project is licensed under the MIT License.

See the LICENSE file for details.
