feat(DocLang): add hyperlink support by vagenas · Pull Request #578 · docling-project/docling-core

vagenas · 2026-04-07T10:12:27Z

No description provided.

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

github-actions · 2026-04-07T10:12:40Z

✅ DCO Check Passed

Thanks @vagenas, all your commits are properly signed off. 🎉

mergify · 2026-04-07T10:13:06Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

Waiting for

#approved-reviews-by >= 2

This rule is failing.

When test data is updated, we require two reviewers

#approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
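The rule can be checked locally before pushing. The following is a minimal sketch that assumes the `~=` operator behaves like a Python regular-expression match; the helper name `is_conventional` is hypothetical and not part of Mergify.

```python
import re

# Pattern copied verbatim from the merge-protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.match(title) is not None


# The title of this pull request passes; a title without a type prefix does not.
print(is_conventional("feat(DocLang): add hyperlink support"))
print(is_conventional("add hyperlink support"))
```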

codecov · 2026-04-07T10:49:00Z

Codecov Report

❌ Patch coverage is 84.52381% with 13 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
docling_core/experimental/doclang.py	84.52%	13 Missing ⚠️

ceberam · 2026-04-08T09:14:42Z

+            hyperlink = AnyUrl(uri_text)
+        except Exception:
+            # Invalid URL, skip
+            return


This will reject any relative hyperlink. Is this behavior intended?
Examples such as test/data/doc/content_all.gt.dclg.xml show that <uri> can also hold a relative hyperlink, so we should be able to deserialize those too.

For instance, deserializing the following text will discard the hyperlink:

<doclang version="1.0.0">
  <text>
    <hyperlink>
      <uri>/blog/</uri>
      hyperlink
    </hyperlink>
  </text>
</doclang>

Also, the roundtrip test in test_roundtrip_hyperlink will fail if we add a relative hyperlink like this:

doc.add_text(
    label=DocItemLabel.TEXT,
    text="simple link",
    hyperlink=Path("/blog/"),
)
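One possible direction, sketched with the standard library only: treat a value with a URL scheme as an absolute link and anything else as a relative path instead of discarding it. The helper `parse_hyperlink` is hypothetical, and returning a plain string for the absolute case stands in for the `AnyUrl` validation in the diff; this is not the code in the PR.

```python
from pathlib import PurePosixPath
from typing import Optional, Union
from urllib.parse import urlsplit


def parse_hyperlink(uri_text: str) -> Optional[Union[str, PurePosixPath]]:
    """Classify <uri> content as an absolute URL or a relative path.

    Hypothetical helper: the absolute branch would be validated with
    AnyUrl in the real deserializer; here it is returned as a string.
    """
    text = uri_text.strip()
    if not text:
        # Nothing to link to.
        return None
    if urlsplit(text).scheme:
        # Has a scheme such as "https", so treat it as an absolute URL.
        return text
    # No scheme: keep it as a relative hyperlink rather than dropping it.
    return PurePosixPath(text)


print(parse_hyperlink("https://example.com"))
print(parse_hyperlink("/blog/"))
```

With a fallback along these lines, the `/blog/` case above would survive deserialization as a path value.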

ceberam · 2026-04-08T09:28:17Z

+        content_elements = [
+            node
+            for node in el.childNodes
+            if isinstance(node, Element)
+            and node.tagName not in {DoclangToken.URI.value, DoclangToken.LOCATION.value, DoclangToken.LAYER.value}
+        ]


This collects only element children, so any text nodes surrounding them are dropped.

For instance, if you deserialize:

<hyperlink> <uri>https://example.com</uri> pre <bold>mid</bold> post </hyperlink>

you will get:

TextItem(text="mid", hyperlink="https://example.com/")

This is the example I used:

from docling_core.experimental.doclang import DoclangDeserializer

xml = """<doclang version="1.0.0">
  <text>
    <hyperlink>
      <uri>https://example.com</uri>
      pre <bold>mid</bold> post
    </hyperlink>
  </text>
</doclang>"""

doc = DoclangDeserializer().deserialize(text=xml)
# Use a separate loop variable so that `doc` is not shadowed.
for item in doc.texts:
    print(item.text)

(you will get just mid, without pre or post)
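The effect can be reproduced outside docling-core with `xml.dom.minidom`. This is a minimal sketch under assumptions: the skipped tag names `location` and `layer` are guesses at the `DoclangToken` values, and both helper functions are hypothetical, not the deserializer's code.

```python
from xml.dom.minidom import Element, Text, parseString

XML = "<hyperlink><uri>https://example.com</uri> pre <bold>mid</bold> post </hyperlink>"
# Assumed tag names; only "uri" is confirmed by the XML in this thread.
SKIPPED_TAGS = {"uri", "location", "layer"}


def _inner_text(node) -> str:
    """Concatenate all text found in a node and its descendants."""
    if isinstance(node, Text):
        return node.data
    return "".join(_inner_text(child) for child in node.childNodes)


def element_only_text(el: Element) -> str:
    """Mirror the filter in the diff: element children only."""
    return "".join(
        _inner_text(node)
        for node in el.childNodes
        if isinstance(node, Element) and node.tagName not in SKIPPED_TAGS
    )


def mixed_content_text(el: Element) -> str:
    """Also keep text nodes that sit between the element children."""
    parts = []
    for node in el.childNodes:
        if isinstance(node, Text):
            parts.append(node.data)
        elif isinstance(node, Element) and node.tagName not in SKIPPED_TAGS:
            parts.append(_inner_text(node))
    # Collapse runs of whitespace left over from the markup.
    return " ".join("".join(parts).split())


root = parseString(XML).documentElement
print(element_only_text(root))   # mid
print(mixed_content_text(root))  # pre mid post
```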

feat(DocLang): add hyperlink support

1b367fd

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

vagenas mentioned this pull request Apr 7, 2026

DocLang backlog #486

Open

52 tasks

vagenas requested review from PeterStaar-IBM, cau-git and ceberam April 7, 2026 10:25

apply CDATA escaping to URIs as needed

7e58e8b

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

cau-git approved these changes Apr 8, 2026

ceberam requested changes Apr 8, 2026

vagenas wants to merge 2 commits into main from add-doclang-hyperlinks

3 participants
