Find Feed: prefer a page-specific feed over a site-wide <head> feed (#5299) #5327
Open
bojanstef wants to merge 1 commit into
…Ranchero-Software#5299)

findFeedsInHTMLPage returned early with the <head> feed whenever one existed, discarding every body-discovered feed. For a /blog request on a site whose <head> advertises a site-wide feed (e.g. relay.fm's all-network feed) while the page links its own feed in the body ("Subscribe via RSS"), the page-specific feed was dropped before bestFeed could consider it.

Before falling back to the <head> feed, validate any body-linked candidate whose URL is on the same host and nested under the requested page's path, and prefer it. Host comparison is www- and case-insensitive (via localeForLowercasing); root requests (no path) keep the existing <head>-feed behavior, so whole-site Find Feed is unchanged.

A unit test covers the path/host matching.

Diagnosed by tracing the discovery path at runtime: both feeds were discovered, but the body feed was discarded by the didFindFeedInHTMLHead early return before bestFeed ran.

Written with LLM assistance and reviewed to meet the project's quality bar; clean module build with no new warnings, swiftlint --strict clean, tests pass.

Generated-by: claude-opus-4-8
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
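To make the before/after behavior in the commit message concrete, here is a minimal Swift sketch of the selection step. It is not the code from this PR: `Candidate`, `selectBefore`, `selectAfter`, and the `isOnPath` closure are hypothetical names, and the sketch skips the download-and-validate step the commit describes for body-linked candidates.

```swift
import Foundation

/// Hypothetical, simplified stand-in for a discovered feed candidate.
struct Candidate: Equatable {
    enum Source { case htmlHead, htmlLink }
    let url: URL
    let source: Source
}

/// Sketch of the behavior described as the bug: if any <head> feed exists,
/// only <head> feeds are returned and body-linked candidates are dropped.
func selectBefore(_ candidates: [Candidate]) -> [Candidate] {
    let headFeeds = candidates.filter { $0.source == .htmlHead }
    return headFeeds.isEmpty ? candidates : headFeeds
}

/// Sketch of the behavior described as the fix: body-linked candidates that
/// sit under the requested page's path win; otherwise fall back to the old rule.
/// `isOnPath` is an assumed predicate taking (feedURL, pageURL).
func selectAfter(_ candidates: [Candidate],
                 pageURL: URL,
                 isOnPath: (URL, URL) -> Bool) -> [Candidate] {
    let onPathBodyFeeds = candidates.filter {
        $0.source == .htmlLink && isOnPath($0.url, pageURL)
    }
    if !onPathBodyFeeds.isEmpty {
        return onPathBodyFeeds
    }
    return selectBefore(candidates)
}
```

With the two relay.fm feeds from this PR as input, `selectBefore` would keep only the `<head>` feed, while `selectAfter` with a page URL of https://www.relay.fm/blog and a suitable path predicate would keep only the body-linked /blog/feed candidate.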
eeeae4b to d6bda7c
Author
@brentsimmons There's AI-generated comments, let me know if I should:

Author
@brentsimmons would it help if I sent a screen recording?

Collaborator
I’m not sure this is actually the fix I want; we might just be trading a fix for this one specific site for breaking other sites. Not sure what to do yet. It may be just that it needs a special case for this particular site.

Author
@brentsimmons fair enough, let me know if you want to brainstorm
Fixes #5299. I asked first on Discourse and got the go-ahead: https://discourse.netnewswire.com/t/first-time-contributing-can-i-fix-issue-5299/299
Problem
Find Feed on https://www.relay.fm/blog selects http://relay.fm/master/feed (the all-network feed) instead of the page's own https://www.relay.fm/blog/feed.

Root cause
The page advertises two feeds:
- http://relay.fm/master/feed: a <link rel="alternate"> in <head> (.HTMLHead)
- https://www.relay.fm/blog/feed: a body "Subscribe via RSS" link (.HTMLLink)

In FeedFinder.findFeedsInHTMLPage, as soon as any <head> feed is found it sets didFindFeedInHTMLHead = true and returns the <head> feeds, so the body candidate sitting in feedSpecifiersToDownload is discarded. bestFeed then receives a single-element set and returns master/feed without ever scoring, so /blog/feed is dropped before it can be considered. (Confirmed by tracing the discovery at runtime: both feeds were discovered; the body feed was discarded by the early return.) This isn't a relay.fm misconfiguration: both feeds are valid and advertised.

Fix
Before falling back to the <head> feed, validate any body-linked candidate whose URL is on the same host and nested under the requested page's path (e.g. /blog/feed for a /blog request) and prefer it. Host comparison is www- and case-insensitive (via localeForLowercasing). Root-domain requests (no path) keep the existing <head>-feed behavior, so whole-site Find Feed is unchanged.

Testing
- testFeedURLIsUnderRequestedPagePath: on-path preference, www/case-insensitive host matching, different-host rejection, root-path exclusion, and the sibling-prefix trap (/blogger is not under /blog).
- swiftlint lint --strict clean; FeedFinder module build has no new warnings; module tests pass.
- relay.fm/blog now resolves to https://www.relay.fm/blog/feed.

Disclosure
Diagnosed and drafted with LLM assistance (Claude Opus 4.8), reviewed to meet the project's quality bar, per CONTRIBUTING.md.
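For readers who want to see the host and path rule from the Fix and Testing sections in code, the following is a rough Swift sketch rather than the implementation in this PR. The function name mirrors the test name above but its signature is a guess, `normalizedHost` is a hypothetical helper, and plain `lowercased()` stands in for the project's `localeForLowercasing`.

```swift
import Foundation

/// Hypothetical helper: lowercases the host and drops a leading "www."
/// so that "WWW.Relay.FM" and "relay.fm" compare equal.
func normalizedHost(_ url: URL) -> String? {
    guard let host = url.host?.lowercased() else { return nil }
    return host.hasPrefix("www.") ? String(host.dropFirst(4)) : host
}

/// Hypothetical stand-in for the check described in the PR: true when
/// `feedURL` is on the same host as `pageURL` and nested under its path.
func feedURLIsUnderRequestedPagePath(feedURL: URL, pageURL: URL) -> Bool {
    guard let feedHost = normalizedHost(feedURL),
          let pageHost = normalizedHost(pageURL),
          feedHost == pageHost else {
        return false
    }

    // Treat "/blog" and "/blog/" as the same requested page.
    var pagePath = pageURL.path
    while pagePath.hasSuffix("/") {
        pagePath.removeLast()
    }

    // Root requests (no path) are excluded, so the <head> feed still wins there.
    guard !pagePath.isEmpty else { return false }

    // Require a "/" boundary so "/blogger/feed" is not treated as under "/blog".
    return feedURL.path.hasPrefix(pagePath + "/")
}

let blogPage = URL(string: "https://www.relay.fm/blog")!
let blogFeed = URL(string: "https://www.relay.fm/blog/feed")!
let masterFeed = URL(string: "http://relay.fm/master/feed")!
print(feedURLIsUnderRequestedPagePath(feedURL: blogFeed, pageURL: blogPage))   // true
print(feedURLIsUnderRequestedPagePath(feedURL: masterFeed, pageURL: blogPage)) // false
```

Matching on `pagePath + "/"` rather than on the bare path is what guards against the sibling-prefix case listed under Testing.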