OpenTechIL/markitdown-for-ai

MarkItDown4AI

Convert PDF, DOCX, PPTX, XLSX, and HTML to Markdown — in one command, zero Python required.

Built on Microsoft's MarkItDown. Packaged as a Docker/Podman container and distributed as an AI agent skill that works with Claude Code, OpenCode, Codex, Cursor, Windsurf, and 40+ more.

docker run --rm -i ghcr.io/opentechil/markitdown-for-ai < report.pdf

Why use this?

Most AI agents can't read binary documents; they need plain text. MarkItDown4AI gives every agent and every pipeline a single, consistent way to extract structured Markdown from any supported document format.

  • No Python, no installs — runs entirely in Docker or Podman
  • Preserves structure — tables, headings, lists, and formatting survive conversion
  • AI-native — install once as a skill; every supported agent automatically knows how to use it
  • CI/CD ready — pipe it into any shell script, GitHub Action, or automation workflow
  • Multi-arch — native amd64 and arm64 images (Apple Silicon, AWS Graviton, x86 servers)

Quick Start

One-liner

docker run --rm -i ghcr.io/opentechil/markitdown-for-ai < document.pdf

Podman works as a drop-in replacement:

podman run --rm -i ghcr.io/opentechil/markitdown-for-ai < document.pdf
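Because the two commands differ only in the binary name, a script can pick whichever runtime is installed. The following is a minimal sketch, not part of the project; the function names pick_runtime and to_markdown are hypothetical.

```shell
# Print the first container runtime found: docker, then podman.
pick_runtime() {
  for candidate in docker podman; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "error: neither docker nor podman found" >&2
  return 1
}

# to_markdown (hypothetical name): convert stdin to Markdown on stdout.
to_markdown() {
  runtime=$(pick_runtime) || return 1
  "$runtime" run --rm -i ghcr.io/opentechil/markitdown-for-ai
}

# Usage: to_markdown < document.pdf > document.md
```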

AI Agent Skills

Install the document-to-markdown skill so your AI agent automatically knows how to convert documents whenever you ask.

Via npx (recommended — works with all supported agents)

npx skills add OpenTechIL/markitdown-for-ai

Detects your agent automatically and installs to the correct location.

Claude Code

# Install globally for Claude Code
bash <(curl -fsSL https://raw.githubusercontent.com/OpenTechIL/markitdown-for-ai/main/install-skill.sh) --ai claude

Installs to ~/.claude/skills/document-to-markdown/.

Then in any Claude Code session, just say:

"Summarize this PDF" — and Claude will automatically convert and read it.

OpenCode

# Install globally for OpenCode
bash <(curl -fsSL https://raw.githubusercontent.com/OpenTechIL/markitdown-for-ai/main/install-skill.sh) --ai opencode

# Or install to a specific project
bash <(curl -fsSL https://raw.githubusercontent.com/OpenTechIL/markitdown-for-ai/main/install-skill.sh) --local

Global installs to ~/.config/opencode/skills/document-to-markdown/.
Local installs to .opencode/skills/document-to-markdown/ in the current project.

All locations at once

bash <(curl -fsSL https://raw.githubusercontent.com/OpenTechIL/markitdown-for-ai/main/install-skill.sh)

Installs to every supported location:

  • ~/.config/opencode/skills/ (OpenCode global)
  • ~/.claude/skills/ (Claude Code)
  • ~/.agents/skills/ (Codex / shared agents)

Supported Formats

Format      Extension
PDF         .pdf
Word        .docx
PowerPoint  .pptx
Excel       .xlsx
HTML        .html

Usage Examples

Pipe a file

cat report.docx | docker run --rm -i ghcr.io/opentechil/markitdown-for-ai

Mount and convert by filename

docker run --rm -v "$(pwd):/data" -w /data ghcr.io/opentechil/markitdown-for-ai slides.pptx
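The command above only sees files under the current directory. For a file elsewhere on disk, one option is to mount that file's own directory and pass just the file name. This is a sketch under that assumption; convert_path is a hypothetical name, and the :ro suffix (a standard Docker volume option) mounts the directory read-only.

```shell
# convert_path (hypothetical name): convert a file given by any path.
# Mounts the file's directory read-only at /data and passes only the file name.
convert_path() {
  src_dir=$(cd "$(dirname "$1")" && pwd) || return 1
  src_name=$(basename "$1")
  docker run --rm -v "$src_dir:/data:ro" -w /data \
    ghcr.io/opentechil/markitdown-for-ai "$src_name"
}

# Usage: convert_path ~/Documents/slides.pptx > slides.md
```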

Save output to a file

docker run --rm -i ghcr.io/opentechil/markitdown-for-ai < input.xlsx > output.md
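Note that the shell truncates output.md before the container starts, so a failed conversion leaves an empty file behind. A more defensive variant writes to a temporary file first and renames it only on success. The sketch below illustrates the idea; convert_to is a hypothetical name.

```shell
# convert_to (hypothetical name): write DEST only if the conversion succeeds
# and produces output; otherwise leave any existing DEST untouched.
convert_to() {
  src=$1
  dest=$2
  tmp="$dest.tmp.$$"
  if docker run --rm -i ghcr.io/opentechil/markitdown-for-ai < "$src" > "$tmp" \
     && [ -s "$tmp" ]; then
    mv "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage: convert_to input.xlsx output.md
```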

Batch convert a directory

for f in *.pdf; do
  [ -e "$f" ] || continue  # skip the unexpanded pattern when no PDFs match
  docker run --rm -i ghcr.io/opentechil/markitdown-for-ai < "$f" > "${f%.pdf}.md"
done
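The loop above handles PDFs only. To cover every extension listed under Supported Formats in one pass, the same pattern can be generalized roughly as follows. This is an illustrative sketch; md_name, is_supported, and convert_dir are hypothetical names, and the extension match is case-sensitive (a file named REPORT.PDF would be skipped).

```shell
# Map an input path to its Markdown output path: report.docx -> report.md
md_name() {
  printf '%s.md\n' "${1%.*}"
}

# Succeed only for the extensions listed under Supported Formats.
is_supported() {
  case "$1" in
    *.pdf|*.docx|*.pptx|*.xlsx|*.html) return 0 ;;
    *) return 1 ;;
  esac
}

# convert_dir (hypothetical name): convert every supported file in a directory.
convert_dir() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    is_supported "$f" || continue
    docker run --rm -i ghcr.io/opentechil/markitdown-for-ai < "$f" > "$(md_name "$f")"
  done
}

# Usage: convert_dir ./docs
```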

Scrape an HTML page

curl -fsSL "https://example.com" | docker run --rm -i ghcr.io/opentechil/markitdown-for-ai

RAG / embedding pipeline

docker run --rm -i ghcr.io/opentechil/markitdown-for-ai < document.pdf | my-embed-cli ingest

Multi-Platform Support

Architecture  Targets
amd64         x86_64 servers, most desktops
arm64         Apple Silicon, AWS Graviton, ARM servers

Runs on Linux, macOS (Docker Desktop or Podman), and Windows (Docker Desktop).
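The runtime normally selects the image variant that matches the host, so no flag is needed. When a specific variant is wanted, Docker's --platform flag can request it explicitly. The helper below is a small sketch that maps uname -m output to a platform string; platform_for is a hypothetical name, and running a non-native platform additionally requires emulation support on the host.

```shell
# platform_for (hypothetical name): map a `uname -m` machine name to the
# matching image platform string.
platform_for() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   docker run --rm -i --platform "$(platform_for "$(uname -m)")" \
#     ghcr.io/opentechil/markitdown-for-ai < document.pdf
```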


Security

  • Runs as a non-root user (appuser) inside the container
  • Multi-stage build — no build tools in the runtime image
  • No network access during document conversion
  • Self-contained: only the MarkItDown library and its declared extras (pdf, docx, pptx, xlsx)
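These properties can also be enforced from the outside when processing untrusted documents. The sketch below adds two standard docker run options, --network none and --cap-drop ALL. Whether the image behaves identically under these restrictions is an assumption that has not been verified here, and run_isolated is a hypothetical name.

```shell
# run_isolated (hypothetical name): convert stdin with networking disabled
# and all Linux capabilities dropped at the runtime level.
# Assumption: the image needs neither network access nor extra capabilities.
run_isolated() {
  docker run --rm -i --network none --cap-drop ALL \
    ghcr.io/opentechil/markitdown-for-ai
}

# Usage: run_isolated < untrusted.pdf > untrusted.md
```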

Development

Build locally

docker build -t markitdown-for-ai .

Test locally

docker run --rm -i markitdown-for-ai < test.pdf

CI/CD

GitHub Actions builds and publishes multi-arch images to GHCR on every push to main. Releases are tagged automatically.


Contributing

Contributions are welcome. Please follow Conventional Commits and update CHANGELOG.md under [Unreleased] with every change. See AGENTS.md for full contributor guidance.


License

Apache License 2.0 — see LICENSE for details.

About

MarkItDown is a Python tool that converts various document formats to Markdown. This Docker image provides a portable, easy-to-use containerized version that works out of the box.
