mycast

Daily news transcript → podcast pipeline. Generates an MP3 from a text transcript using a Samantha voice clone (Qwen3-TTS via mlx-audio), updates an RSS 2.0 / Podcasting 2.0 feed, syncs to Cloudflare R2, and sends a Telegram status message.

Quick start

uv sync                    # one-time: install deps
uv run mycast new-podcast "My Daily News"   # one-time: create output/feed.xml
# drop a transcript at ./incoming/2026-04-29.txt
uv run mycast run          # tts -> feed -> sync -> notify

For one-time setup (Python env, mlx-audio, ffmpeg, rclone, direnv, optional auto-watcher), see SETUP.md.

Pipeline

incoming/<date>.txt  ──tts──▶  output/<date>.mp3
                              output/<date>.txt   (copied)
                              output/<date>.vtt   ──┐
                                                    ├──feed──▶  output/feed.xml
                                                    │
                              output/*            ──sync──▶  R2
                                                  ──notify──▶  Telegram

Each step is independently runnable and idempotent.

Commands

All commands run via uv run mycast <command>. A summary:

| Command | Purpose |
| --- | --- |
| `run [--all] [--force]` | Full pipeline: tts → feed → sync → notify (default: only the latest incoming file) |
| `tts [files...] [--all] [--force]` | Generate MP3 audio from `incoming/*.txt` (default: only the latest, skips already-processed) |
| `feed` | Refresh `output/feed.xml` with every `output/*.mp3` |
| `sync [--dry-run]` | Push `./output/` to R2 via `rclone copy` |
| `notify <message>` | Send a one-off Telegram message |
| `new-podcast <title> [-d ...] [-o ...] [--force]` | Create a new `feed.xml` template |

Add -v / --verbose for debug logging. Logs always go to logs/mycast.log.

`mycast run` — full orchestration

uv run mycast run                # only the latest incoming/*.txt
uv run mycast run --all          # every incoming/*.txt (skips already-processed)
uv run mycast run --force        # reprocess the latest even if its mp3 exists
uv run mycast run --all --force  # reprocess everything

tts: By default, looks at the lexicographically last incoming/*.txt (which, given the YYYY-MM-DD naming, is the newest date), generates output/<stem>.mp3 if it doesn't already exist, and copies the transcript to output/<stem>.txt. --all processes every file in incoming/; --force re-runs even if the output mp3 already exists.
feed: Rebuilds RSS items in output/feed.xml for every output/*.mp3. Same-date entries are replaced, so re-running is safe.
sync: rclone copy ./output r2:mycast (configurable via MYCAST_R2_REMOTE).
notify: Sends one Telegram message summarizing each step's outcome (success or failure).
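
The file-selection rules for the tts step can be sketched as follows. This is a minimal illustration, not the code in mycast.py: the function name `select_inputs` and its parameters are hypothetical, and it assumes "already processed" means that `output/<stem>.mp3` exists.

```python
from pathlib import Path


def select_inputs(incoming: Path, output: Path,
                  all_files: bool = False, force: bool = False) -> list[Path]:
    """Pick transcripts to process (hypothetical helper mirroring --all / --force)."""
    # YYYY-MM-DD file names sort chronologically when sorted as plain strings.
    candidates = sorted(incoming.glob("*.txt"))
    if not all_files:
        candidates = candidates[-1:]  # lexicographically last == newest date
    if force:
        return candidates
    # Assumption: a transcript counts as processed once its mp3 exists.
    return [p for p in candidates if not (output / f"{p.stem}.mp3").exists()]
```

With `force=False`, a second call after a successful run returns an empty list, which is what makes re-running cheap.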

A flock (.mycast.lock) prevents concurrent runs from oversubscribing the GPU. If a run is already in progress, the new invocation exits immediately.
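
For readers unfamiliar with the pattern, a non-blocking exclusive lock looks roughly like this in Python. It is a sketch using the standard `fcntl` module (POSIX only); the helper name `acquire_lock` is hypothetical and the real implementation may differ.

```python
import fcntl


def acquire_lock(path: str = ".mycast.lock"):
    """Return an open, exclusively locked file, or None if the lock is already held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # The caller must keep this object alive: closing the file releases the lock.
    return handle
```

A caller would exit early when `acquire_lock()` returns `None` and otherwise hold the returned handle for the duration of the run. The operating system drops the lock when the process exits, so a crashed run does not leave a stale lock behind.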

Exit code is 0 on full success, 1 if any step failed (the Telegram message indicates which).

`mycast tts` — generate audio only

uv run mycast tts                                # only the latest incoming/*.txt
uv run mycast tts --all                          # every unprocessed file in incoming/
uv run mycast tts --force                        # re-run the latest even if mp3 exists
uv run mycast tts incoming/2026-04-29.txt        # process specific file(s)

Calls the mlx_audio Python API directly (mlx_audio.tts.utils.load_model) — no shell-out. The model is loaded once per process and reused across all input files in a single invocation.

The transcript is chunked into ~400-character pieces (sentence-aligned, with --- separator lines stripped) before being fed to the TTS model. Each chunk gets a fresh in-context-learning (ICL) voice-clone prefill from custom-voices/<voice>.{wav,txt}. Audio segments are concatenated and written as a single output/<stem>.mp3 via mlx_audio.audio_io.write (which uses ffmpeg internally).
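
The chunking step described above can be approximated as follows. This is a simplified sketch, not the project's code: `chunk_transcript` is a hypothetical name, the sentence splitter is a naive regex, and a single sentence longer than the limit is kept whole rather than split.

```python
import re


def chunk_transcript(text: str, max_chars: int = 400) -> list[str]:
    """Group sentences into chunks of at most max_chars characters where possible."""
    # Strip '---' separator lines and blank lines, then join into one string.
    lines = [ln.strip() for ln in text.splitlines()]
    body = " ".join(ln for ln in lines if ln and ln != "---")
    # Naive sentence boundary: whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", body) if s]

    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Keeping chunks sentence-aligned avoids cutting a sentence in the middle, which would otherwise produce an audible seam where two audio segments are concatenated.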

The transcript is also copied to output/<stem>.txt so the R2 sync includes it alongside the audio.

Tunables: MYCAST_MAX_TOKENS (per-chunk codec budget, default 4096), MYCAST_CHUNK_CHARS (max characters per chunk, default 400).

`mycast feed` — update the RSS feed

uv run mycast feed

Walks output/*.mp3, parses the date from the filename (YYYY-MM-DD), reads the matching output/<stem>.txt, and writes/replaces the RSS <item> for that date. Generates output/<stem>.vtt (WebVTT timestamps distributed proportionally by sentence length across the audio duration) and links it via <podcast:transcript>.

The episode description is whatever appears before the first --- line in the transcript.
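
A sketch of that rule, for illustration only. The function name `episode_description` is hypothetical, and falling back to the whole transcript when no --- line exists is an assumption, not documented behavior.

```python
def episode_description(transcript: str) -> str:
    """Return the text that precedes the first line consisting solely of '---'."""
    head = []
    for line in transcript.splitlines():
        if line.strip() == "---":
            break
        head.append(line)
    # Assumption: with no '---' line, the whole transcript becomes the description.
    return "\n".join(head).strip()
```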

`mycast sync` — push to R2

uv run mycast sync
uv run mycast sync --dry-run    # preview without uploading

Shells out to rclone copy ./output $MYCAST_R2_REMOTE. rclone copy skips files that are already identical on the remote, so only new or changed files transfer, and it never deletes anything from the destination.

`mycast notify` — Telegram status message

uv run mycast notify "feed updated manually"

Sends a plain-text message to the chat configured by TELEGRAM_CHAT_ID. Useful in scripts.
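
For reference, sending a message through the Telegram Bot API amounts to one HTTPS POST to the `sendMessage` method. The sketch below uses only the standard library; `build_request` and `send` are hypothetical names, and mycast itself may use a different HTTP client.

```python
import json
import os
import urllib.request


def build_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a POST request for the Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(text: str) -> bool:
    """Send text to the configured chat; True if Telegram reports success."""
    req = build_request(
        os.environ["TELEGRAM_BOT_TOKEN"],
        os.environ["TELEGRAM_CHAT_ID"],
        text,
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return bool(json.load(resp).get("ok", False))
```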

`mycast new-podcast` — create feed.xml

uv run mycast new-podcast "My Daily News" -d "Personal news roundup, read aloud."
uv run mycast new-podcast "Other" -o other-feed.xml --force

One-time setup. Default output is output/feed.xml. After creating, edit the file to fill in podcast details (link, image, category, etc.).

Configuration

All configuration is done via environment variables, loaded from .envrc by direnv. Copy .envrc.example to .envrc, fill in values, and run direnv allow.

| Variable | Required | Default | Purpose |
| --- | --- | --- | --- |
| `TELEGRAM_BOT_TOKEN` | yes (for notify) | none | Bot token from BotFather |
| `TELEGRAM_CHAT_ID` | yes (for notify) | none | Target chat ID |
| `MYCAST_BASE_URL` | no | `https://mycast.hekuli.com/` | Public URL prefix used in `feed.xml` enclosure URLs |
| `MYCAST_R2_REMOTE` | no | `r2:mycast` | rclone `remote:bucket` for `sync` |
| `MYCAST_VOICE` | no | `samantha` | Voice clone reference (`custom-voices/<name>.{wav,txt}`) |
| `MYCAST_TTS_BACKEND` | no | `qwen3` | TTS engine: `qwen3` (Qwen3-TTS, ICL voice cloning) or `chatterbox` (Resemble Chatterbox, caches speaker conditionals once for cross-chunk consistency) |
| `MYCAST_MODEL` | no | depends on backend | mlx-audio model id (default: `Qwen3-TTS-12Hz-1.7B-Base-bf16` for qwen3, `chatterbox-fp16` for chatterbox) |
| `MYCAST_MAX_TOKENS` | no | `4096` | Per-chunk codec-token budget for TTS (12.5 Hz, so 4096 ≈ 5.5 min per chunk) |
| `MYCAST_CHUNK_CHARS` | no | `400` | Max characters per text chunk fed to the TTS model |
| `MYCAST_SPEED` | no | `0.9` (qwen3) / `1.0` (chatterbox) | Playback speed multiplier (ffmpeg atempo, preserves pitch). 1.0 = no change |
| `MYCAST_EXAGGERATION` | no | `0.5` | Chatterbox-only: emotion/prosody intensity (0=flat, 0.5=natural, 1=very expressive). Ignored by qwen3 |
| `MYCAST_CFG_WEIGHT` | no | `0.5` | Chatterbox-only: classifier-free guidance weight for voice cloning fidelity. Ignored by qwen3 |
| `MYCAST_TEMPERATURE` | no | `0.8` | Sampling temperature. Lower = consistent/flat, higher = expressive/drift |
| `MYCAST_TOP_P` | no | `0.9` | Nucleus sampling cutoff |
| `MYCAST_NORMALIZE_RMS` | no | `0.1` | Per-chunk loudness target (RMS) for equalizing volume across chunks |
| `MYCAST_LANG_CODE` | no | `auto` | TTS language hint. `auto` runs per-chunk detection (English vs German). Force a specific code with `english`, `german`, `french`, `italian`, `portuguese`, `spanish`, `russian`, `chinese`, `japanese`, `korean` |
| `MYCAST_SEED` | no | `42` | Random seed reset before each chunk's TTS call. Pins voice consistency across chunks; sweep different seeds to find one whose voice you like best |
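
To show how these variables and defaults fit together, here is a minimal sketch of reading a subset of them from the environment. The `Settings` class, `load_settings`, and `max_chunk_seconds` are hypothetical names for illustration; the variable names, defaults, and the 12.5 Hz figure come from the table above.

```python
import os
from dataclasses import dataclass

CODEC_HZ = 12.5  # codec tokens per second, as quoted for MYCAST_MAX_TOKENS


@dataclass(frozen=True)
class Settings:
    backend: str
    voice: str
    max_tokens: int
    chunk_chars: int
    speed: float
    seed: int


def load_settings(env=None) -> Settings:
    """Read a subset of the MYCAST_* variables, applying the documented defaults."""
    env = os.environ if env is None else env
    backend = env.get("MYCAST_TTS_BACKEND", "qwen3")
    # The speed default depends on the backend: 0.9 for qwen3, 1.0 for chatterbox.
    default_speed = "0.9" if backend == "qwen3" else "1.0"
    return Settings(
        backend=backend,
        voice=env.get("MYCAST_VOICE", "samantha"),
        max_tokens=int(env.get("MYCAST_MAX_TOKENS", "4096")),
        chunk_chars=int(env.get("MYCAST_CHUNK_CHARS", "400")),
        speed=float(env.get("MYCAST_SPEED", default_speed)),
        seed=int(env.get("MYCAST_SEED", "42")),
    )


def max_chunk_seconds(settings: Settings) -> float:
    """Upper bound on audio per chunk implied by the token budget."""
    return settings.max_tokens / CODEC_HZ
```

With the defaults, `max_chunk_seconds` gives 4096 / 12.5 = 327.68 seconds, which is the "≈ 5.5 min per chunk" quoted in the table.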

Voice cloning

Voice clones are stored as custom-voices/<name>.wav + custom-voices/<name>.txt pairs. The .txt file must be an exact transcript of what's spoken in the .wav (used by Qwen3-TTS; Chatterbox ignores the transcript). The default voice samantha is shipped in this repo.

To add a new voice: drop custom-voices/myvoice.wav (~10s of clean speech) and custom-voices/myvoice.txt (its exact transcript), then export MYCAST_VOICE=myvoice. Both backends read from the same custom-voices/ pair (Chatterbox uses only the .wav), so switching MYCAST_TTS_BACKEND between qwen3 and chatterbox doesn't require any other changes.

Comparing TTS backends

Switch backends by setting MYCAST_TTS_BACKEND:

MYCAST_TTS_BACKEND=qwen3      uv run mycast tts incoming/2026-04-29.txt    # default
MYCAST_TTS_BACKEND=chatterbox uv run mycast tts incoming/2026-04-29.txt

Quick comparison:

| Feature | Qwen3-TTS Base (`qwen3`) | Chatterbox (`chatterbox`) |
| --- | --- | --- |
| Speaker conditioning | rebuilt every chunk (drift-prone) | cached once at load (consistent) |
| Languages | english, german, french, italian, portuguese, spanish, russian, chinese, japanese, korean | en, de, fr, es, it, pt, ru, zh, ja, ko + ar/da/el/fi/he/hi/ms/nl/no/pl/sv/sw/tr |
| Reference transcript needed | yes (`.txt` exact transcript) | no (only `.wav`) |
| Per-chunk language tag | yes (`lang_code` arg) | yes (`lang_code` arg, 2-letter codes) |
| Built-in loudness normalization | no | no (Turbo variant has it; classic doesn't) |
| Natural-language `instruct` | no | no |

Both honor the same [GERMAN]...[/GERMAN] markup in input transcripts and the same env-var tunables (seed, temperature, top_p, max_tokens, chunk_chars, speed, normalize_rms).

German content

For longer stretches of German content (a quoted statement, a German headline, a paragraph), wrap them in [GERMAN] ... [/GERMAN] markers in the transcript. Both backends parse these and route the content to the model's German language code, fixing the American-accented German pronunciation that otherwise occurs with English-cloned voices. Single German names embedded inline don't need markers.

The minister released a statement yesterday.

[GERMAN]
Die Lage ist ernst, aber wir haben einen klaren Plan für die kommenden Wochen.
[/GERMAN]

Translated, it says the situation is serious but they have a clear plan.
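
One way such markers could be parsed is sketched below. This is an illustration under assumptions: `split_by_language` is a hypothetical name, the default language is fixed rather than auto-detected, and nested or unclosed markers are not handled.

```python
import re

_GERMAN = re.compile(r"\[GERMAN\](.*?)\[/GERMAN\]", re.DOTALL)


def split_by_language(text: str, default: str = "english") -> list[tuple[str, str]]:
    """Split a transcript into (language, text) segments around [GERMAN] blocks."""
    segments, pos = [], 0
    for match in _GERMAN.finditer(text):
        before = text[pos:match.start()].strip()
        if before:
            segments.append((default, before))
        inner = match.group(1).strip()
        if inner:
            segments.append(("german", inner))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((default, tail))
    return segments
```

Applied to the example above, this yields three segments: the English introduction, the German quote, and the English translation. Each segment could then be chunked and synthesized with its own language code.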

Auto-watcher (optional)

A macOS LaunchAgent can watch ./incoming/ and run mycast run automatically whenever a new transcript appears. See SETUP.md → Step 5 for installation.

Notes

Re-running anything is always safe: TTS skips processed files; feed replaces entries; rclone diffs.
Output filenames must contain a YYYY-MM-DD date — the feed step parses it from the basename.
Audio output is MP3 directly from mlx-audio (which uses ffmpeg under the hood).
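
The filename requirement can be checked with a short helper like the one below. It is a sketch only: `episode_date` is a hypothetical name, and returning `None` for names without a valid date is an assumption about how such files would be treated.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def episode_date(path: str) -> Optional[date]:
    """Extract the first YYYY-MM-DD date from a file's basename, if it is valid."""
    match = _DATE.search(Path(path).stem)
    if match is None:
        return None
    try:
        return date(*(int(part) for part in match.groups()))
    except ValueError:  # digits matched but are not a real calendar date
        return None
```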
