Demo2Tutorial: From Human Experience to Multimodal Software Tutorials
Zechen Bai,
Zhiheng Chen,
Yiqi Lin,
Kevin Qinghong Lin,
Difei Gao,
Xiangwu Guo,
Xin Wang,
Mike Zheng Shou
Show Lab, National University of Singapore
ShowHow is the minimal public release of Demo2Tutorial: a lightweight tool for recording desktop workflows and turning them into polished step-by-step multimodal tutorials.
Raw screen recordings are useful demonstrations, but they are often long, passive, and difficult to follow. ShowHow turns a recorded desktop workflow into a structured tutorial with concise instructions, selected keyframes, and visual guidance.
This repository keeps the paper-facing identity as Demo2Tutorial, while the public software tool and package are named ShowHow:
- repository: Demo2Tutorial
- Python package: showhow
- local tool / product name: ShowHow
Features:
- Browser-based recording UI
- Local macOS recorder for desktop workflows
- Multimodal tutorial generation from recorded sessions
- Editable HTML/PDF export for generated tutorials
- PyPI package for simple installation
Roadmap:
- Windows support after a dedicated public validation pass
- Robust multi-monitor support after a dedicated validation pass
- MCP integration as a polished advanced workflow
- Skill-oriented integrations and broader agent-facing tooling
pip install showhow
python -m showhow.cli web --host 127.0.0.1 --port 18090
Then open the following address (the browser usually opens it automatically):
http://127.0.0.1:18090
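If the page does not load, it can help to confirm that something is actually listening on the chosen host and port. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical and not part of ShowHow.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    host, port = "127.0.0.1", 18090  # values taken from the command above
    state = "reachable" if is_port_open(host, port) else "not reachable"
    print(f"http://{host}:{port} is {state}")
```

A "not reachable" result usually means the web command is not running or was started with a different `--port`.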
git clone https://github.com/showlab/Demo2Tutorial.git
cd Demo2Tutorial
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .
python -m showhow.cli web --host 127.0.0.1 --port 18090
Requirements:
- Python 3.10+
- ffmpeg
- OpenAI API access for generation
- macOS (Windows compatibility has not been thoroughly tested)
- This release is currently recommended for single-display use only.
- If multiple monitors are connected, recording, action capture, and frame alignment may be unreliable.
- For the best results, disconnect external monitors and perform the task on one screen.
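The software requirements above (Python 3.10+, ffmpeg on the PATH, an OpenAI API key) can be checked before the first recording. The sketch below is an illustration only and is not a substitute for the built-in `doctor` command; the function name `check_requirements` and its parameters are hypothetical.

```python
import os
import shutil
import sys


def check_requirements(version_info=None, which=shutil.which, env=None) -> dict:
    """Report which of the documented requirements appear to be satisfied.

    The parameters exist only so the check can be exercised with fake values.
    """
    version_info = sys.version_info if version_info is None else version_info
    env = os.environ if env is None else env
    return {
        "python_3_10_plus": tuple(version_info[:2]) >= (3, 10),
        "ffmpeg_on_path": which("ffmpeg") is not None,
        # The key can also be entered in the web UI, so a missing
        # environment variable is not necessarily a problem.
        "openai_api_key_set": bool(env.get("OPENAI_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{'ok' if ok else 'missing':8} {name}")
```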
You can provide the OpenAI API key in either of two ways:
- enter it directly in the web UI (the key stays on your machine)
- or export it in your shell
export OPENAI_API_KEY=your_key_here
For recording on macOS, you may need to grant the following permissions to your terminal application (Terminal or iTerm2):
- Screen Recording
- Accessibility
- Input Monitoring
If the recorder does not work as expected, run:
python -m showhow.cli doctor
The recommended workflow is the web UI:
python -m showhow.cli web
Then:
- open the local page in your browser
- enter the API key if needed
- start recording
- perform the task
- stop recording
- generate the tutorial
Command-line usage:
python -m showhow.cli record --topic "demo_flow" --generate --model gpt-4o

python -m showhow.cli start --topic "demo_flow"
python -m showhow.cli rec-status
python -m showhow.cli stop
python -m showhow.cli generate --session-id <SESSION_ID>
By default, recordings are saved under:
~/Downloads/record_save
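The step-by-step CLI sequence above (start, stop, generate) could be driven from a Python script. The sketch below only assembles and runs the commands exactly as documented; the helper names are hypothetical, and because this README does not say how the session ID is reported, the caller has to supply it.

```python
import subprocess
import sys

# Use the current interpreter so the commands run in the same environment.
CLI = [sys.executable, "-m", "showhow.cli"]


def start_command(topic: str) -> list[str]:
    """Build the documented `start --topic` command."""
    return CLI + ["start", "--topic", topic]


def stop_command() -> list[str]:
    """Build the documented `stop` command."""
    return CLI + ["stop"]


def generate_command(session_id: str) -> list[str]:
    """Build the documented `generate --session-id` command."""
    return CLI + ["generate", "--session-id", session_id]


def run(cmd: list[str]) -> None:
    """Echo a command, run it, and raise if it exits with a non-zero status."""
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)


# Example (requires showhow to be installed):
#   run(start_command("demo_flow"))
#   ... perform the task on screen ...
#   run(stop_command())
#   run(generate_command("<SESSION_ID>"))
```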
Each session may produce artifacts such as:
- events.jsonl
- metadata.json
- session video
- parsed trace
- tutorial draft
- rendered tutorial assets
- tutorial.html
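To see at a glance which of the named artifacts a session contains, the record root can be scanned with a few lines of Python. This is a sketch under two assumptions that this README does not confirm: each session lives in its own subdirectory of the record root, and the named files sit somewhere inside that subdirectory. The function name `summarize_sessions` is hypothetical.

```python
from pathlib import Path

# File names mentioned in the artifact list above.
EXPECTED_FILES = ("events.jsonl", "metadata.json", "tutorial.html")


def summarize_sessions(root: Path) -> dict[str, dict[str, bool]]:
    """Map each session directory name to the presence of each expected file."""
    summary: dict[str, dict[str, bool]] = {}
    if not root.is_dir():
        return summary
    for session_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[session_dir.name] = {
            # Search recursively, since the exact layout is an assumption.
            name: next(session_dir.rglob(name), None) is not None
            for name in EXPECTED_FILES
        }
    return summary


if __name__ == "__main__":
    record_root = Path.home() / "Downloads" / "record_save"  # documented default
    for session, files in summarize_sessions(record_root).items():
        missing = [name for name, found in files.items() if not found]
        print(session, "complete" if not missing else f"missing: {missing}")
```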
Install ffmpeg and ensure it is available on your PATH.
Run:
python -m showhow.cli doctor
Then check:
- OS permissions
- recorder host/port availability
- ffmpeg availability
Make sure OPENAI_API_KEY is set correctly, or enter it in the web UI.
Check whether the recording session produced:
- a valid event log
- a valid video file
- a valid session directory under the record root
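One way to check the event log from the list above is to confirm that every line parses as JSON. The sketch below assumes `events.jsonl` follows the usual JSON Lines convention of one JSON value per line; it does not check the event schema, which this README does not describe. The function name `check_event_log` is hypothetical.

```python
import json
from pathlib import Path


def check_event_log(path: Path) -> tuple[int, list[int]]:
    """Return (count of parseable records, 1-based numbers of lines that failed)."""
    valid = 0
    bad_lines: list[int] = []
    with path.open(encoding="utf-8") as handle:
        for lineno, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError:
                bad_lines.append(lineno)
            else:
                valid += 1
    return valid, bad_lines
```

An empty log (zero valid records) or a non-empty list of bad lines suggests the recording was interrupted or the file was truncated.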
- Currently optimized for local single-user usage
- Default generation depends on API-backed captioning and planning
- Desktop recording behavior depends on OS permissions
- Some advanced composition features may require optional dependencies
If you find this project useful, please cite the paper:
@inproceedings{bai2026demo2tutorial,
title={Demo2Tutorial: From Human Experience to Multimodal Software Tutorials},
author={Bai, Zechen and Chen, Zhiheng and Lin, Yiqi and Lin, Kevin Qinghong and Gao, Difei and Guo, Xiangwu and Wang, Xin and Shou, Mike Zheng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={29588--29597},
year={2026}
}
This project is released under the MIT License. See LICENSE.



