Skip to content

fix metadata arglists for python defaults#285

Open
jsavyasachi wants to merge 4 commits into
clj-python:masterfrom
jsavyasachi:fix-pyarglists-defaults
Open

fix metadata arglists for python defaults#285
jsavyasachi wants to merge 4 commits into
clj-python:masterfrom
jsavyasachi:fix-pyarglists-defaults

Conversation

@jsavyasachi

@jsavyasachi jsavyasachi commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Fixes #284.
Fixes #283.

Changes

  • Preserve values returned in argspec defaults when generating pyarglists.
  • Stringify Python-object defaults, such as <class 'int'>, instead of exposing pointer metadata.
  • Add :arglists metadata for vars created by py/from-import, so Clojure doc output can render the signature before the docstring.
  • Add regression coverage for positional defaults, keyword-only defaults, Python-object defaults, and rendered clojure.repl/doc output.

Verification

env PATH=/private/tmp/libpython-clj-x86-python-shim:$PATH clojure -M:test -n libpython-clj2.metadata-test
env PATH=/private/tmp/libpython-clj-x86-python-shim:$PATH clojure -M:test -n libpython-clj2.python-test
env PATH=/private/tmp/libpython-clj-x86-python-shim:$PATH clojure -M:test

Note: the PATH shim was needed locally because Java is x86_64 while the default Homebrew Python is arm64.

@jsavyasachi jsavyasachi changed the title fix metadata pyarglists defaults fix metadata arglists for python defaults Jun 8, 2026

@cnuernber cnuernber left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

@jjtolton - If you have a minute could you take a look?

@jjtolton

Copy link
Copy Markdown
Contributor

Initial impression, big improvement in dev experience. Doing a little adversarial testing to make sure nothing breaks with weird or opaque types.

@jjtolton

Copy link
Copy Markdown
Contributor

@jsavyasachi Overall I think this is a great direction. However, there are some edge cases to consider.

  1. Things like pytorch models have really nasty reprs, and there's currently no truncation or limit.
image
  1. nested opaques leak raw pointers:
image
  1. if there is a bad repr, the var is silently dropped from the ns and not interned.

  2. if someone writes a broken repr that hangs or is infinitely recursive, that will blow up in the metadata. It's also a weird potential attack vector that exposes an exec while poopulating the metadata, but at that point you are running an unsandboxed repl against untrusted python code there are a lot of other potential attack vectors so I'm not going to worry about that.

Technically #4 is actually a bug in the way libpython-clj handles RecursionErrors when stringifying, the recursive repr should result in a stackoverflow and be a special case of (#3) but it's a landmine currently.

In general, I think this is a good direction -- the only thing missing is some graceful degradation for the above cases that allows a developer to diagnose and debug without the repl experience being unceremoniously degraded by surprisingy metadata handling.

Python adversarial testing file
import functools


class BadStr:
    """repr/str both raise -- triggers the throw path in py-default->jvm."""
    def __repr__(self):
        raise ValueError('boom repr')
    def __str__(self):
        raise ValueError('boom str')


class WeirdStr:
    def __repr__(self):
        return 'x' * 40


class HugeReprModel:
    """Simulates a PyTorch-style model whose repr is enormous (many nested layers)."""
    def __init__(self, n_layers=300):
        self.n_layers = n_layers

    def __repr__(self):
        lines = ["GnarlyNet("]
        for i in range(self.n_layers):
            lines.append(
                f"  (layer{i}): Linear(in_features=4096, out_features=4096, bias=True)")
            lines.append(f"  (act{i}): GELU(approximate='none')")
            lines.append(f"  (drop{i}): Dropout(p=0.1, inplace=False)")
        lines.append(")")
        return "\n".join(lines)

    __str__ = __repr__


class RecursiveRepr:
    """repr recurses without bound -> Python raises RecursionError."""
    def __repr__(self):
        return repr(self)
    __str__ = __repr__


class HangRepr:
    """repr never returns -> str(x) blocks forever while holding the GIL."""
    def __repr__(self):
        while True:
            pass
    __str__ = __repr__


_bad = BadStr()
_weird = WeirdStr()
_sentinel = object()
_partial = functools.partial(int, 0)
_recursive = RecursiveRepr()
_hang = HangRepr()
_huge = HugeReprModel(300)


def f_class(x=int):
    """Default is a Python class object."""
    return x


def f_lambda(x=lambda a: a):
    """Default is a lambda."""
    return x


def f_badstr(x=_bad):
    """Default's str()/repr() raise."""
    return x


def f_weird(x=_weird):
    """Default has a custom repr."""
    return x


def f_sentinel(x=_sentinel):
    """Default is a bare sentinel object()."""
    return x


def f_partial(x=_partial):
    """Default is a functools.partial."""
    return x


def f_nested_opaque(x=(int, str)):
    """Default is a tuple of opaque class objects (nested case)."""
    return x


def f_huge(model=_huge):
    """Default is an object with a massive repr -- the 'detonate the arglists' case."""
    return model


def f_kw_huge(*, model=_huge):
    """Keyword-only variant of the massive-repr default."""
    return model


def f_recursive(x=_recursive):
    """Default's repr recurses infinitely -> RecursionError."""
    return x


def f_hang(x=_hang):
    """Default's repr never returns -> str() hangs forever."""
    return x


def f_mixed(a, b=1, c=int, *, d=_sentinel, e=2):
    """Mixed positional/keyword-only with a couple of opaque defaults."""
    return (a, b, c, d, e)

@jjtolton jjtolton left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments.

@jsavyasachi

jsavyasachi commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the adversarial pass. I ported your file into testcode as the fixture. What's fixed:

  • Huge reprs: truncated at 200 chars, so a big model default can't blow up the arglists.
  • repr/str that raises (BadStr): falls back to <unprintable>, and the var is interned instead of being silently dropped.
  • Nested opaques (f_nested_opaque): this one was a real leak, a tuple of classes was emitting raw Pointer objects. It now renders (<class 'int'>, <class 'str'>).

Two I left out of this PR:

  • f_recursive: reproduced it as a native crash (SIGILL) in the stringify path, below where a Clojure catch can reach. That's the RecursionError handling you flagged, should this be a separate issue we open?
  • f_hang: can't be guarded without a timeout, so out of scope I hope?

Still learning, so happy to hear your thoughts and make further changes if necessary!

py-default->jvm stringified opaque Python defaults with str(py-str x),
which (1) was unbounded for huge reprs, (2) only handled a top-level
opaque so a collection default (e.g. a tuple of classes) leaked raw
pointers, and (3) threw when a default's repr raised, dropping the var
from the namespace.

safe-py-str truncates long reprs and falls back to "<unprintable>" when
repr/str raises; py-default->jvm now falls back to a str() of the whole
default whenever any nested value is opaque.

Recursive- and hanging-repr cases are left out: the former is a native
crash in libpython-clj's stringify path, the latter needs a timeout.
@jsavyasachi jsavyasachi force-pushed the fix-pyarglists-defaults branch from 3127adf to 0952848 Compare June 18, 2026 02:48
@jjtolton

Copy link
Copy Markdown
Contributor

Thanks for the adversarial pass. I ported your file into testcode as the fixture. What's fixed:

  • Huge reprs: truncated at 200 chars, so a big model default can't blow up the arglists.

  • repr/str that raises (BadStr): falls back to <unprintable>, and the var is interned instead of being silently dropped.

  • Nested opaques (f_nested_opaque): this one was a real leak, a tuple of classes was emitting raw Pointer objects. It now renders (<class 'int'>, <class 'str'>).

Two I left out of this PR:

  • f_recursive: reproduced it as a native crash (SIGILL) in the stringify path, below where a Clojure catch can reach. That's the RecursionError handling you flagged, should this be a separate issue we open?

  • f_hang: can't be guarded without a timeout, so out of scope I hope?

Still learning, so happy to hear your thoughts and make further changes if necessary!

Great work and much appreciated! I will take a look today.

I agree the last two are out of scope (unless you wanted an extra challenge) as f_recursive is a bug within libpython-clj and f_hang would require some creative workaround like a short watchdog timer (metadata ingestion should be taking microseconds at most, anything that lasts more than a millisecond should be reaped with a fallback). Ofc I don't actually know how possible that is without killing the repl, so there would be some design work involved.

but that's defense in depth against someone hanging a repr, so I would not call it high priority.

the goal of course is to make sure no one blames libpython-clj for poorly written python code 😇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

libpython-clj2.metadata/pyarglists is not returning 'defaults' add "parameter" to docstring (as for clojure fn)

3 participants