feat(api): expose transcript state via X-Transcript-Status header#37

Merged: syswave-dev merged 1 commit into main from feat/x-transcript-status-header on Jun 22, 2026

@syswave-dev

Collaborator

Closes #36. Follow-up to #33/#34.

Why

PullMD distinguishes YouTube transcript states internally (ok | none | blocked | error) but previously conveyed them only as English placeholder strings in the response body. Programmatic consumers (notably the collector discovery pipeline) had to rely on brittle string matching to tell a transient 429 block from a genuinely absent transcript.

What

  • lib/web.js: surface yt.transcriptStatus on the extraction result.
  • server.js: set X-Transcript-Status response header on GET /api; include transcriptStatus in the SSE result event.
  • Header is only set for sources that carry a status (YouTube); absent otherwise.
  • Transient blocks are never cached, so the actionable blocked case is always served fresh and always carries the header.
  • README: documented in both header references.
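The server-side rules above (set the header only when a status exists, never cache transient results, include the status in the SSE result event) can be sketched as a few small helpers. This is a minimal illustration, not PullMD's actual server.js: it assumes a Node-style response object with `setHeader` and an extraction result shaped like `{ transcriptStatus?: string }`, and the helper names `applyTranscriptStatusHeader`, `isCacheable`, and `formatSseResult` are hypothetical.

```javascript
// Hypothetical sketch, not PullMD's actual code.
// Assumes the extraction result carries `transcriptStatus` only for
// sources that have one (YouTube); other sources leave it undefined.

const TRANSIENT_STATUSES = new Set(["blocked", "error"]);

// Set X-Transcript-Status only when the result carries a status,
// so non-YouTube responses simply omit the header.
function applyTranscriptStatusHeader(res, result) {
  if (result.transcriptStatus) {
    res.setHeader("X-Transcript-Status", result.transcriptStatus);
  }
}

// Transient outcomes (blocked/error) are never cached, so a client that
// hits one always receives a fresh response carrying the header.
// Results without a status (non-YouTube sources) remain cacheable.
function isCacheable(result) {
  return !TRANSIENT_STATUSES.has(result.transcriptStatus);
}

// SSE "result" event whose JSON payload includes transcriptStatus when present
// (JSON.stringify drops the key entirely when it is undefined).
function formatSseResult(result) {
  return `event: result\ndata: ${JSON.stringify(result)}\n\n`;
}

// Usage with a minimal stand-in for a response object (illustrative only).
const headers = {};
const fakeRes = { setHeader: (name, value) => { headers[name] = value; } };
applyTranscriptStatusHeader(fakeRes, { transcriptStatus: "blocked" });
```

Keeping the cache decision keyed on the same status value as the header means the two rules cannot drift apart: any status that would tell a consumer "retry later" is also one that bypasses the cache.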

Consumer semantics

  • ok → real transcript.
  • blocked (HTTP 429) / error → transient, not cached → skip persisting, retry later.
  • none → permanent negative.
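On the consumer side, the semantics above reduce to a switch on the header value. The sketch below is illustrative only: the action names are hypothetical, and it assumes Node 18+ (global `fetch`) and that GET /api takes the target URL as a `url` query parameter, which the PR does not state.

```javascript
// Hypothetical consumer-side sketch (e.g. for a discovery pipeline).
// Maps an X-Transcript-Status value to an action; names are illustrative.
function transcriptAction(status) {
  switch (status) {
    case "ok":
      return "persist";       // real transcript
    case "none":
      return "mark-absent";   // permanent negative
    case "blocked":           // upstream HTTP 429: transient, not cached
    case "error":             // transient, not cached
      return "retry-later";   // skip persisting, try again later
    default:
      return "no-status";     // header absent: source carries no transcript status
  }
}

// ASSUMPTIONS: Node 18+ global fetch, and a `url` query parameter on /api.
async function fetchWithTranscriptStatus(baseUrl, targetUrl) {
  const endpoint = new URL("/api", baseUrl);
  endpoint.searchParams.set("url", targetUrl);
  const res = await fetch(endpoint);
  // Headers.get returns null when the header is absent.
  const status = res.headers.get("X-Transcript-Status");
  return { body: await res.text(), action: transcriptAction(status) };
}
```

Treating a missing header as its own case, rather than as "ok", keeps non-YouTube sources from being misread as having a transcript.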

Tests

764 JS tests (+3): the extraction result surfaces transcriptStatus; GET /api sets the header; the header is omitted for sources without a status.

🤖 Generated with Claude Code

…oses #36)

PullMD distinguishes YouTube transcript states internally
(ok|none|blocked|error) but only conveyed them as English placeholder
strings in the body, forcing programmatic consumers (collector) to rely on
brittle string matching to tell a transient 429 block from a genuinely
absent transcript.

Surface yt.transcriptStatus on the extraction result and set it as the
X-Transcript-Status response header on GET /api (and in the SSE result
event). Header is only set for sources that carry a status (YouTube);
absent otherwise. Transient blocks are never cached, so the header is
always present for the actionable `blocked` case (served fresh).

Documented in README (both header references). Tests: 764 JS (+3).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QzC4vKjJBjuAoD1JzyYAWp
syswave-dev merged commit e9c9df5 into main on Jun 22, 2026
4 checks passed
syswave-dev deleted the feat/x-transcript-status-header branch on June 22, 2026 at 09:12
syswave-dev added a commit that referenced this pull request Jun 25, 2026
…metadata on cache hits (#39)

Strip 1x1 tracking pixels in cleanDom, add the sciencedaily-lead-image
recipe (unwrap #text), and serve the full persisted metadata
(og:image/twitter:image, description, author, …) on /api cache hits in
both frontmatter and format=json.

Batches untagged main commits since v3.1.0: #34 (youtube 429 vs missing
transcript), #35 (bundle yt_transcript.py), #37 (X-Transcript-Status).

Bumps version to 3.2.0.


Linked issue: Expose transcript state via X-Transcript-Status header for programmatic consumers

1 participant