fix metadata arglists for python defaults#285
Conversation
|
Initial impression, big improvement in dev experience. Doing a little adversarial testing to make sure nothing breaks with weird or opaque types. |
|
@jsavyasachi Overall I think this is a great direction. However, there are some edge cases to consider.
Technically #4 is actually a bug in the way libpython-clj handles RecursionErrors when stringifying, the recursive repr should result in a stackoverflow and be a special case of (#3) but it's a landmine currently. In general, I think this is a good direction -- the only thing missing is some graceful degradation for the above cases that allows a developer to diagnose and debug without the repl experience being unceremoniously degraded by surprisingy metadata handling. Python adversarial testing fileimport functools
class BadStr:
"""repr/str both raise -- triggers the throw path in py-default->jvm."""
def __repr__(self):
raise ValueError('boom repr')
def __str__(self):
raise ValueError('boom str')
class WeirdStr:
def __repr__(self):
return 'x' * 40
class HugeReprModel:
"""Simulates a PyTorch-style model whose repr is enormous (many nested layers)."""
def __init__(self, n_layers=300):
self.n_layers = n_layers
def __repr__(self):
lines = ["GnarlyNet("]
for i in range(self.n_layers):
lines.append(
f" (layer{i}): Linear(in_features=4096, out_features=4096, bias=True)")
lines.append(f" (act{i}): GELU(approximate='none')")
lines.append(f" (drop{i}): Dropout(p=0.1, inplace=False)")
lines.append(")")
return "\n".join(lines)
__str__ = __repr__
class RecursiveRepr:
"""repr recurses without bound -> Python raises RecursionError."""
def __repr__(self):
return repr(self)
__str__ = __repr__
class HangRepr:
"""repr never returns -> str(x) blocks forever while holding the GIL."""
def __repr__(self):
while True:
pass
__str__ = __repr__
_bad = BadStr()
_weird = WeirdStr()
_sentinel = object()
_partial = functools.partial(int, 0)
_recursive = RecursiveRepr()
_hang = HangRepr()
_huge = HugeReprModel(300)
def f_class(x=int):
"""Default is a Python class object."""
return x
def f_lambda(x=lambda a: a):
"""Default is a lambda."""
return x
def f_badstr(x=_bad):
"""Default's str()/repr() raise."""
return x
def f_weird(x=_weird):
"""Default has a custom repr."""
return x
def f_sentinel(x=_sentinel):
"""Default is a bare sentinel object()."""
return x
def f_partial(x=_partial):
"""Default is a functools.partial."""
return x
def f_nested_opaque(x=(int, str)):
"""Default is a tuple of opaque class objects (nested case)."""
return x
def f_huge(model=_huge):
"""Default is an object with a massive repr -- the 'detonate the arglists' case."""
return model
def f_kw_huge(*, model=_huge):
"""Keyword-only variant of the massive-repr default."""
return model
def f_recursive(x=_recursive):
"""Default's repr recurses infinitely -> RecursionError."""
return x
def f_hang(x=_hang):
"""Default's repr never returns -> str() hangs forever."""
return x
def f_mixed(a, b=1, c=int, *, d=_sentinel, e=2):
"""Mixed positional/keyword-only with a couple of opaque defaults."""
return (a, b, c, d, e) |
|
Thanks for the adversarial pass. I ported your file into
Two I left out of this PR:
Still learning, so happy to hear your thoughts and make further changes if necessary! |
py-default->jvm stringified opaque Python defaults with str(py-str x), which (1) was unbounded for huge reprs, (2) only handled a top-level opaque so a collection default (e.g. a tuple of classes) leaked raw pointers, and (3) threw when a default's repr raised, dropping the var from the namespace. safe-py-str truncates long reprs and falls back to "<unprintable>" when repr/str raises; py-default->jvm now falls back to a str() of the whole default whenever any nested value is opaque. Recursive- and hanging-repr cases are left out: the former is a native crash in libpython-clj's stringify path, the latter needs a timeout.
3127adf to
0952848
Compare
Great work and much appreciated! I will take a look today. I agree the last two are out of scope (unless you wanted an extra challenge) as but that's defense in depth against someone hanging a repr, so I would not call it high priority. the goal of course is to make sure no one blames libpython-clj for poorly written python code 😇 |


Fixes #284.
Fixes #283.
Changes
<class 'int'>, instead of exposing pointer metadata.:arglistsmetadata for vars created bypy/from-import, so Clojure doc output can render the signature before the docstring.clojure.repl/docoutput.Verification
Note: the PATH shim was needed locally because Java is x86_64 while the default Homebrew Python is arm64.