Daily news transcript → podcast pipeline. Generates an MP3 from a text transcript using a Samantha voice clone (Qwen3-TTS via mlx-audio), updates an RSS 2.0 / Podcasting 2.0 feed, syncs to Cloudflare R2, and sends a Telegram status message.
uv sync # one-time: install deps
uv run mycast new-podcast "My Daily News" # one-time: create output/feed.xml
# drop a transcript at ./incoming/2026-04-29.txt
uv run mycast run # tts -> feed -> sync -> notify

For one-time setup (Python env, mlx-audio, ffmpeg, rclone, direnv, optional auto-watcher), see SETUP.md.
incoming/<date>.txt ──tts──▶ output/<date>.mp3
                             output/<date>.txt (copied)
                             output/<date>.vtt ──┐
                                                 ├──feed──▶ output/feed.xml
                                                 │
output/* ──sync──▶ R2
         ──notify──▶ Telegram
Each step is independently runnable and idempotent.
All commands run via uv run mycast <command>. A summary:
| Command | Purpose |
|---|---|
| `run [--all] [--force]` | Full pipeline: tts → feed → sync → notify (default: only the latest incoming file) |
| `tts [files...] [--all] [--force]` | Generate MP3 audio from `incoming/*.txt` (default: only the latest, skips already-processed) |
| `feed` | Refresh `output/feed.xml` with every `output/*.mp3` |
| `sync [--dry-run]` | Push `./output/` to R2 via `rclone copy` |
| `notify <message>` | Send a one-off Telegram message |
| `new-podcast <title> [-d ...] [-o ...] [--force]` | Create a new `feed.xml` template |
Add -v / --verbose for debug logging. Logs always go to logs/mycast.log.
uv run mycast run # only the latest incoming/*.txt
uv run mycast run --all # every incoming/*.txt (skips already-processed)
uv run mycast run --force # reprocess the latest even if its mp3 exists
uv run mycast run --all --force # reprocess everything

- tts: By default, looks at the lexicographically-last `incoming/*.txt` (which, given the `YYYY-MM-DD` naming, is the newest date), generates `output/<stem>.mp3` if it doesn't already exist, copies the transcript to `output/<stem>.txt`. `--all` processes every file in `incoming/`; `--force` re-runs even if the output mp3 already exists.
- feed: Rebuilds RSS items in `output/feed.xml` for every `output/*.mp3`. Same-date entries are replaced, so re-running is safe.
- sync: `rclone copy ./output r2:mycast` (configurable via `MYCAST_R2_REMOTE`).
- notify: Sends one Telegram message summarizing each step's outcome (success or failure).
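The file-selection rule described for the tts step can be sketched as follows. This is an illustration only, not the project's actual code: the function name `pick_transcripts` and its parameters are hypothetical.

```python
from pathlib import Path


def pick_transcripts(incoming: Path, output: Path,
                     process_all: bool = False, force: bool = False) -> list:
    """Return the transcripts a tts run would process, oldest first.

    Hypothetical helper; mirrors the default, --all and --force behaviour.
    """
    # Sorting by name is sorting by date, because stems are YYYY-MM-DD.
    candidates = sorted(incoming.glob("*.txt"))
    if not process_all:
        candidates = candidates[-1:]  # lexicographically last = newest
    if force:
        return candidates
    # Without --force, skip anything whose MP3 already exists.
    return [p for p in candidates if not (output / f"{p.stem}.mp3").exists()]
```

With `force=False` and an existing MP3 for the newest date, the default call returns an empty list, which is what makes a repeated run a no-op.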
A flock (.mycast.lock) prevents concurrent runs from oversubscribing the GPU. If a run is already in progress, the new invocation exits immediately.
Exit code is 0 on full success, 1 if any step failed (the Telegram message indicates which).
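A non-blocking lock of the kind described above could look roughly like this on macOS or Linux. It is a sketch under assumptions: the helper name `try_lock` is hypothetical, and the real implementation may differ.

```python
import fcntl
import os
from typing import Optional


def try_lock(path: str = ".mycast.lock") -> Optional[int]:
    """Return an open fd holding an exclusive lock, or None if it is taken.

    Hypothetical helper illustrating a flock-based single-run guard.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd  # keep it open for the lifetime of the run
```

A caller would exit right away when `try_lock()` returns `None`; the lock is released when the descriptor is closed or the process ends.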
uv run mycast tts # only the latest incoming/*.txt
uv run mycast tts --all # every unprocessed file in incoming/
uv run mycast tts --force # re-run the latest even if mp3 exists
uv run mycast tts incoming/2026-04-29.txt # process specific file(s)

The tts step calls the mlx_audio Python API directly (`mlx_audio.tts.utils.load_model`) rather than shelling out. The model is loaded once per process and reused across all input files in a single invocation.
The transcript is chunked into ~400-character pieces (sentence-aligned, with --- separator lines stripped) before being fed to the TTS model. Each chunk gets a fresh in-context-learning (ICL) voice-clone prefill from custom-voices/<voice>.{wav,txt}. Audio segments are concatenated and written as a single output/<stem>.mp3 via mlx_audio.audio_io.write (which uses ffmpeg internally).
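To make the chunking step concrete, here is a minimal sketch of sentence-aligned chunking with `---` lines stripped. The function name `chunk_transcript` and the naive sentence splitter are assumptions for illustration; the project's own splitter may handle abbreviations and edge cases differently.

```python
import re


def chunk_transcript(text: str, max_chars: int = 400) -> list:
    """Split text into sentence-aligned chunks of at most max_chars.

    Hypothetical helper; a single over-long sentence is kept whole.
    """
    # Drop separator lines that consist only of '---'.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip() != "---"]
    body = " ".join(ln for ln in lines if ln)
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", body)

    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate  # still fits, or first sentence of a chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be synthesized separately and the audio segments concatenated in order.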
The transcript is also copied to output/<stem>.txt so the R2 sync includes it alongside the audio.
Tunables: MYCAST_MAX_TOKENS (per-chunk codec budget, default 4096), MYCAST_CHUNK_CHARS (max characters per chunk, default 400).
uv run mycast feed

The feed step walks `output/*.mp3`, parses the date from each filename (`YYYY-MM-DD`), reads the matching `output/<stem>.txt`, and writes or replaces the RSS `<item>` for that date. It also generates `output/<stem>.vtt` (WebVTT timestamps distributed proportionally by sentence length across the audio duration) and links it via `<podcast:transcript>`.
The episode description is whatever appears before the first --- line in the transcript.
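The proportional WebVTT timing mentioned above can be illustrated like this. The names `build_vtt` and `_stamp` are hypothetical, the sentence splitter is deliberately naive, and the sketch assumes the text has already had its `---` lines removed.

```python
import re


def _stamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def build_vtt(text: str, duration: float) -> str:
    """Give each sentence a cue whose length is proportional to its character count."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    total = sum(len(s) for s in sentences)
    out = ["WEBVTT", ""]
    consumed = 0
    for sentence in sentences:
        start = duration * consumed / total
        consumed += len(sentence)
        end = duration * consumed / total
        out += [f"{_stamp(start)} --> {_stamp(end)}", sentence, ""]
    return "\n".join(out)
```

Because the timing is estimated from character counts rather than aligned to the audio, cues are approximate: a sentence read slowly will drift against its cue.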
uv run mycast sync
uv run mycast sync --dry-run # preview without uploading

The sync step shells out to `rclone copy ./output $MYCAST_R2_REMOTE`. Because `rclone copy` skips files that are already identical on the destination, only new or changed files are transferred.
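A sketch of how the sync step might assemble and run the rclone command is shown below. The function names are hypothetical; only the `rclone copy` invocation, the `--dry-run` flag, and the `MYCAST_R2_REMOTE` default come from this document.

```python
import os
import subprocess


def rclone_command(dry_run: bool = False) -> list:
    """Build the argument list for the sync step (hypothetical helper)."""
    remote = os.environ.get("MYCAST_R2_REMOTE", "r2:mycast")
    cmd = ["rclone", "copy", "./output", remote]
    if dry_run:
        cmd.append("--dry-run")  # rclone reports what it would copy
    return cmd


def sync(dry_run: bool = False) -> int:
    """Run rclone and return its exit code (0 means success)."""
    return subprocess.run(rclone_command(dry_run)).returncode
```

Passing the arguments as a list avoids shell quoting issues if the remote name ever contains spaces.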
uv run mycast notify "feed updated manually"

Sends a plain-text message to the chat configured by `TELEGRAM_CHAT_ID`. Useful in scripts.
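For reference, sending a plain-text message through the Telegram Bot API's `sendMessage` method needs only the standard library. This is a sketch, not the project's implementation; `build_request` and `notify` are hypothetical names.

```python
import json
import os
import urllib.request


def build_request(message: str) -> urllib.request.Request:
    """Build the sendMessage POST request (hypothetical helper)."""
    token = os.environ["TELEGRAM_BOT_TOKEN"]
    chat_id = os.environ["TELEGRAM_CHAT_ID"]
    # No parse_mode is set, so Telegram treats the text as plain text.
    payload = json.dumps({"chat_id": chat_id, "text": message}).encode()
    return urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


def notify(message: str) -> bool:
    """Send the message and return True if Telegram reports success."""
    with urllib.request.urlopen(build_request(message), timeout=10) as resp:
        return bool(json.load(resp).get("ok"))
```

Splitting request construction from sending keeps the first half testable without network access.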
uv run mycast new-podcast "My Daily News" -d "Personal news roundup, read aloud."
uv run mycast new-podcast "Other" -o other-feed.xml --force

One-time setup. Default output is `output/feed.xml`. After creating, edit the file to fill in podcast details (link, image, category, etc.).
All configuration is done via environment variables, loaded from .envrc by direnv. Copy .envrc.example to .envrc, fill in values, and run direnv allow.
| Variable | Required | Default | Purpose |
|---|---|---|---|
| `TELEGRAM_BOT_TOKEN` | yes (for notify) | (none) | Bot token from BotFather |
| `TELEGRAM_CHAT_ID` | yes (for notify) | (none) | Target chat ID |
| `MYCAST_BASE_URL` | no | `https://mycast.hekuli.com/` | Public URL prefix used in feed.xml enclosure URLs |
| `MYCAST_R2_REMOTE` | no | `r2:mycast` | rclone remote:bucket for sync |
| `MYCAST_VOICE` | no | `samantha` | Voice clone reference (`custom-voices/<name>.{wav,txt}`) |
| `MYCAST_TTS_BACKEND` | no | `qwen3` | TTS engine: `qwen3` (Qwen3-TTS, ICL voice cloning) or `chatterbox` (Resemble Chatterbox, caches speaker conditionals once for cross-chunk consistency) |
| `MYCAST_MODEL` | no | depends on backend | mlx-audio model id (default: `Qwen3-TTS-12Hz-1.7B-Base-bf16` for qwen3, `chatterbox-fp16` for chatterbox) |
| `MYCAST_MAX_TOKENS` | no | `4096` | Per-chunk codec-token budget for TTS (12.5 Hz, so 4096 ≈ 5.5 min per chunk) |
| `MYCAST_CHUNK_CHARS` | no | `400` | Max characters per text chunk fed to the TTS model |
| `MYCAST_SPEED` | no | `0.9` (qwen3) / `1.0` (chatterbox) | Playback speed multiplier (ffmpeg atempo, preserves pitch). 1.0 = no change |
| `MYCAST_EXAGGERATION` | no | `0.5` | Chatterbox-only: emotion/prosody intensity (0=flat, 0.5=natural, 1=very expressive). Ignored by qwen3 |
| `MYCAST_CFG_WEIGHT` | no | `0.5` | Chatterbox-only: classifier-free guidance weight for voice cloning fidelity. Ignored by qwen3 |
| `MYCAST_TEMPERATURE` | no | `0.8` | Sampling temperature. Lower = consistent/flat, higher = expressive/drift |
| `MYCAST_TOP_P` | no | `0.9` | Nucleus sampling cutoff |
| `MYCAST_NORMALIZE_RMS` | no | `0.1` | Per-chunk loudness target (RMS) for equalizing volume across chunks |
| `MYCAST_LANG_CODE` | no | `auto` | TTS language hint. `auto` runs per-chunk detection (English vs German). Force a specific code with `english`, `german`, `french`, `italian`, `portuguese`, `spanish`, `russian`, `chinese`, `japanese`, `korean` |
| `MYCAST_SEED` | no | `42` | Random seed reset before each chunk's TTS call. Pins voice consistency across chunks; sweep different seeds to find one whose voice you like best |
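As an illustration of how these variables and defaults fit together, the sketch below reads a few of them from the environment and reproduces the token-budget arithmetic from the `MYCAST_MAX_TOKENS` row. The function names and the returned dictionary layout are hypothetical, not the project's actual configuration code.

```python
import os

CODEC_HZ = 12.5  # codec tokens per second, as stated in the table


def load_settings(env=os.environ) -> dict:
    """Read a subset of the tunables, applying the documented defaults."""
    backend = env.get("MYCAST_TTS_BACKEND", "qwen3")
    # The default speed depends on the backend.
    default_speed = "0.9" if backend == "qwen3" else "1.0"
    return {
        "backend": backend,
        "voice": env.get("MYCAST_VOICE", "samantha"),
        "max_tokens": int(env.get("MYCAST_MAX_TOKENS", "4096")),
        "chunk_chars": int(env.get("MYCAST_CHUNK_CHARS", "400")),
        "speed": float(env.get("MYCAST_SPEED", default_speed)),
        "seed": int(env.get("MYCAST_SEED", "42")),
    }


def budget_seconds(max_tokens: int) -> float:
    """Upper bound on audio length per chunk for a given token budget."""
    return max_tokens / CODEC_HZ
```

With the default budget, `budget_seconds(4096)` is 327.68 seconds, or about 5.5 minutes, which is far more than a 400-character chunk needs.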
Voice clones are stored as custom-voices/<name>.wav + custom-voices/<name>.txt pairs. The .txt file must be an exact transcript of what's spoken in the .wav (used by Qwen3-TTS; Chatterbox ignores the transcript). The default voice samantha is shipped in this repo.
To add a new voice: drop `custom-voices/myvoice.wav` (~10s of clean speech) and `custom-voices/myvoice.txt` (its exact transcript), then export `MYCAST_VOICE=myvoice`. Both backends read from the same `custom-voices/` pair (Chatterbox uses only the `.wav`), so switching `MYCAST_TTS_BACKEND` between `qwen3` and `chatterbox` doesn't require any other changes.
Switch backends by setting MYCAST_TTS_BACKEND:
MYCAST_TTS_BACKEND=qwen3 uv run mycast tts incoming/2026-04-29.txt # default
MYCAST_TTS_BACKEND=chatterbox uv run mycast tts incoming/2026-04-29.txt

Quick comparison:

| | Qwen3-TTS Base (`qwen3`) | Chatterbox (`chatterbox`) |
|---|---|---|
| Speaker conditioning | rebuilt every chunk (drift-prone) | cached once at load (consistent) |
| Languages | english, german, french, italian, portuguese, spanish, russian, chinese, japanese, korean | en, de, fr, es, it, pt, ru, zh, ja, ko + ar/da/el/fi/he/hi/ms/nl/no/pl/sv/sw/tr |
| Reference transcript needed | yes (`.txt` exact transcript) | no (only `.wav`) |
| Per-chunk language tag | yes (`lang_code` arg) | yes (`lang_code` arg, 2-letter codes) |
| Built-in loudness normalization | no | no (Turbo variant has it; classic doesn't) |
| Natural-language instruct | no | no |
Both honor the same [GERMAN]...[/GERMAN] markup in input transcripts and the same env-var tunables (seed, temperature, top_p, max_tokens, chunk_chars, speed, normalize_rms).
For longer stretches of German content (a quoted statement, a German headline, a paragraph), wrap them in [GERMAN] ... [/GERMAN] markers in the transcript. Both backends parse these and route the content to the model's German language code, fixing the American-accented German pronunciation that otherwise occurs with English-cloned voices. Single German names embedded inline don't need markers.
The minister released a statement yesterday.
[GERMAN]
Die Lage ist ernst, aber wir haben einen klaren Plan für die kommenden Wochen.
[/GERMAN]
Translated, it says the situation is serious but they have a clear plan.
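Parsing the markers can be pictured as splitting the transcript into language-tagged segments. The sketch below is an assumption about the approach, not the project's parser: `split_by_language` is a hypothetical name, and it uses the long-form codes (`english`, `german`) from the `MYCAST_LANG_CODE` list.

```python
import re

_GERMAN = re.compile(r"\[GERMAN\](.*?)\[/GERMAN\]", re.DOTALL)


def split_by_language(text: str, default: str = "english") -> list:
    """Return (lang_code, text) segments in document order."""
    segments = []
    pos = 0
    for match in _GERMAN.finditer(text):
        # Text before the marker keeps the default language.
        before = text[pos:match.start()].strip()
        if before:
            segments.append((default, before))
        inner = match.group(1).strip()
        if inner:
            segments.append(("german", inner))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((default, tail))
    return segments
```

Applied to the example above, this yields three segments: English, German, English, each of which could then be passed to the model with its own language code.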
A macOS LaunchAgent can watch ./incoming/ and run mycast run automatically whenever a new transcript appears. See SETUP.md → Step 5 for installation.
- Re-running anything is always safe: TTS skips processed files; feed replaces entries; rclone diffs.
- Output filenames must contain a `YYYY-MM-DD` date; the feed step parses it from the basename.
- Audio output is MP3 directly from mlx-audio (which uses ffmpeg under the hood).
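The filename requirement in the notes above amounts to a check like the following. This is a sketch only; `episode_date` is a hypothetical name and the real feed step may be stricter or looser about where the date appears.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def episode_date(path: str) -> Optional[date]:
    """Extract a YYYY-MM-DD date from the basename, or None if absent/invalid."""
    match = _DATE.search(Path(path).name)  # directories are ignored
    if match is None:
        return None
    try:
        return date(*map(int, match.groups()))
    except ValueError:  # digits present but not a real date, e.g. 2026-13-40
        return None
```

A file for which this returns `None` would have no date to key its feed entry on, which is why the naming convention matters.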