Python: feat(python): Add local-codeact package with AST validation by eavanvalkenburg · Pull Request #6091 · microsoft/agent-framework

eavanvalkenburg · 2026-05-26T16:09:34Z

Motivation and Context

Foundry hosted agents already run inside a sandboxed container, which makes them
a viable target for the CodeAct pattern without the heavyweight isolation that
Hyperlight or Monty bring. We want a lightweight, in-process variant that lets
agents author and execute Python in the host environment with as many guard
rails as possible, while being explicit that the package is only safe to use
when the surrounding environment is already a sandbox.

This PR adds a new alpha package agent-framework-local-codeact that:

- Mirrors the CodeAct provider surface used by agent-framework-hyperlight and
  agent-framework-monty, so users can swap providers without changing their
  agent code.
- Executes generated Python in a subprocess by default with byte/time limits and
  a workspace tempdir, and offers an opt-in unsafe_in_process mode for trusted
  setups.
- Enforces an AST allow-list for imports, calls, and builtins before any code
  runs, with user-overridable allowed_imports / allowed_builtins (deny-list
  overrides are also available for the rare opt-in case).
- Supports host tools surfaced via call_tool(...) (the model only sees
  execute_code), file mounts, environment variables, and configurable byte and
  time limits.
- Ships a runnable Foundry hosted-agent sample wired the same way as the
  hyperlight_codeact container sample, and a standalone direct-invocation
  sample.
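To make the "subprocess by default with byte/time limits and a workspace tempdir" bullet concrete, here is a minimal sketch. It is not the package's runner: `run_limited`, `timeout`, and `max_bytes` are hypothetical names, and the output cap is applied after capture rather than while streaming.

```python
import subprocess
import sys
import tempfile


def run_limited(code: str, timeout: float = 5.0, max_bytes: int = 1024) -> dict:
    """Run `code` in a child interpreter with a time limit and an output cap (illustrative only)."""
    with tempfile.TemporaryDirectory() as workspace:
        try:
            proc = subprocess.run(
                # -I runs the child in isolated mode (no user site, no PYTHON* env vars).
                [sys.executable, "-I", "-c", code],
                cwd=workspace,  # the child starts inside a throwaway workspace
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills and reaps the child before re-raising.
            return {"ok": False, "error": "timeout", "stdout": "", "stderr": ""}
    return {
        "ok": proc.returncode == 0,
        "error": None,
        # Truncation happens after capture; a real limit would cap while reading.
        "stdout": proc.stdout[:max_bytes].decode("utf-8", errors="replace"),
        "stderr": proc.stderr[:max_bytes].decode("utf-8", errors="replace"),
    }
```

Calling `run_limited("print('x' * 5000)", max_bytes=10)` would return only the first ten bytes of stdout, and a child that sleeps past `timeout` comes back with `error` set to `"timeout"`.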

Description

New package layout (python/packages/local_codeact/):

- LocalCodeActProvider: context provider that injects an execute_code tool
  plus CodeAct instructions; same shape as the Hyperlight/Monty providers.
- LocalExecuteCodeTool: the underlying tool with all configuration knobs
  (host tools, approval mode, workspace, file mounts, env, limits, execution
  mode, allow/deny lists for imports and builtins).
- _validator.py: AST-based allow-list validator run before any execution,
  enforcing imports, builtin usage, and rejecting dynamic-eval constructs.
- _runner.py / _bridge.py: subprocess runner and JSON bridge protocol used
  to invoke the embedded Python runner.
- _files.py: file-mount handling with size caps and read/read-write modes.
- ProcessExecutionLimits: timeout and per-stream/per-file byte limits.
- Samples: samples/foundry_hosted_agent.py (Foundry Responses host) and
  samples/local_execute_code.py (direct tool invocation), plus
  samples/README.md.
- Tests under tests/local_codeact/ covering the provider, tool, validator,
  runner, file-mount handling, and the Windows tempdir-cleanup race.
- Package registered in python/PACKAGE_STATUS.md as alpha. No lazy-loading in
  core yet (per alpha guidance); samples live inside the package.
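As a rough illustration of the kind of check _validator.py performs on imports, the sketch below walks the AST and reports modules outside an allow-list. It is an assumption-laden example, not the package's validator: `find_disallowed_imports` and the contents of `ALLOWED_IMPORTS` are hypothetical.

```python
import ast

ALLOWED_IMPORTS = {"math", "json"}  # hypothetical allow-list for this sketch


def find_disallowed_imports(source: str, allowed: set[str] = ALLOWED_IMPORTS) -> list[str]:
    """Return top-level module names imported by `source` that are not allow-listed."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # A relative "from . import x" has module None; treat it as not allowed.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" is judged by its root, "os"
            if root not in allowed:
                violations.append(root)
    return violations
```

Because the check runs on the parsed source, nothing is executed before a violation is reported, which matches the "before any execution" ordering described above.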

Notes:

- Default execution mode is subprocess with an explicit Python executable.
- The allow-list is the source of truth; the deny-list is opt-in. User overrides
  for both are supported on LocalCodeActProvider and LocalExecuteCodeTool.
- Windows tempdir cleanup races are handled with both
  tempfile.TemporaryDirectory(ignore_cleanup_errors=True) and a broad
  contextlib.suppress(Exception) around the final cleanup.

Contribution Checklist

- The code builds clean without any errors or warnings
- The PR follows the Contribution Guidelines
- All unit tests pass, and I have added new tests where possible
- Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

moonbox3 · 2026-05-26T16:13:14Z

Python Test Coverage Report

File	Stmts	Miss	Cover	Missing
packages/local_codeact/agent_framework_local_codeact
_bridge.py	196	54	72%	26, 30, 37–42, 54, 63, 73–75, 82, 133, 146–147, 150–152, 154, 163, 168–171, 175, 180, 183, 190, 200–201, 211–214, 218, 222–224, 243–245, 253–254, 261–262, 276–279, 281–282, 293
_execute_code_tool.py	224	38	83%	56, 58, 85, 87, 90, 103, 111, 116, 118, 120, 128–129, 131, 147, 150, 153–156, 158, 163–164, 168, 234, 252–254, 291, 304–307, 311, 316, 321, 402–403, 446
_files.py	133	37	72%	23, 27, 29, 47, 55–59, 80, 107–108, 114–115, 122, 124, 130–131, 139, 143–148, 164, 171–172, 176, 179–180, 182–183, 185–186, 189–190
_instructions.py	38	15	60%	16, 35, 41–45, 47–50, 55, 110–111, 113
_provider.py	34	8	76%	85, 89, 93, 97, 101, 105, 109, 113
_runner.py	123	96	21%	21–24, 27, 30–37, 40, 44, 48, 52–60, 64–70, 72, 74–75, 89–91, 95–96, 100–113, 117–118, 120–121, 131–133, 142–143, 147–151, 154, 156–157, 159, 164–166, 168–171, 173, 184–197, 206, 210
_types.py	22	0	100%
_validator.py	76	10	86%	279–280, 294–295, 311–312, 316, 318, 345, 383
TOTAL	37686	4610	87%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
7409	34 💤	0 ❌	0 🔥	1m 57s ⏱️

Copilot

Pull request overview

Adds a new alpha Python workspace package, agent-framework-local-codeact, intended to enable CodeAct-style execution of model-generated Python in externally sandboxed environments (e.g., Foundry hosted agents), with subprocess execution, file-mount capture, and AST-based validation.

Changes:

Registers agent-framework-local-codeact in the Python workspace (uv/pyproject) and marks it as alpha in PACKAGE_STATUS.md.
Introduces LocalExecuteCodeTool / LocalCodeActProvider with subprocess runner + IPC bridge, file-mount capture helpers, and dynamic instructions.
Adds unit tests plus usage samples and package documentation.

Reviewed changes

Copilot reviewed 20 out of 22 changed files in this pull request and generated 7 comments.

File	Description
python/uv.lock	Adds the new editable workspace member and lock metadata.
python/pyproject.toml	Registers the new workspace package.
python/PACKAGE_STATUS.md	Marks `agent-framework-local-codeact` as alpha.
python/packages/local_codeact/pyproject.toml	New package definition, tooling config, and test tasks.
python/packages/local_codeact/README.md	Package docs, security posture, and configuration surface.
python/packages/local_codeact/AGENTS.md	Package architecture and contributor notes.
python/packages/local_codeact/LICENSE	MIT license for the new package.
python/packages/local_codeact/agent_framework_local_codeact/__init__.py	Public API exports for the package.
python/packages/local_codeact/agent_framework_local_codeact/_types.py	Public types for execution mode, mounts, and limits.
python/packages/local_codeact/agent_framework_local_codeact/_validator.py	AST-based code validation layer.
python/packages/local_codeact/agent_framework_local_codeact/_bridge.py	Parent-side subprocess bridge + tool dispatch.
python/packages/local_codeact/agent_framework_local_codeact/_runner.py	Child-process runner implementing the JSON-lines protocol.
python/packages/local_codeact/agent_framework_local_codeact/_files.py	Mount normalization + symlink-safe file capture.
python/packages/local_codeact/agent_framework_local_codeact/_instructions.py	Dynamic CodeAct instructions and tool descriptions.
python/packages/local_codeact/agent_framework_local_codeact/_execute_code_tool.py	Main `execute_code` tool orchestration and output shaping.
python/packages/local_codeact/agent_framework_local_codeact/_provider.py	Context provider that injects the run-scoped tool + instructions.
python/packages/local_codeact/tests/local_codeact/test_validator.py	Validator allow/block behavior tests.
python/packages/local_codeact/tests/local_codeact/test_local_codeact.py	Tool/provider behavior, subprocess execution, mounts, and limits tests.
python/packages/local_codeact/samples/README.md	Sample index and run instructions.
python/packages/local_codeact/samples/local_execute_code.py	Local usage sample for direct tool invocation.
python/packages/local_codeact/samples/foundry_hosted_agent.py	Foundry hosted-agent wiring sample.
python/packages/local_codeact/agent_framework_local_codeact/py.typed	Marks the package as typed.

github-actions

Automated Code Review

Reviewers: 4 | Confidence: 88%

✓ Correctness

After thorough examination of the local-codeact package implementation, I found no correctness bugs. The code demonstrates excellent engineering practices: proper error handling with early validation, safe resource cleanup using try-finally blocks and context managers, correct subprocess management with timeout handling, secure AST validation with comprehensive allow/block lists, and proper IPC serialization with JSON-safe conversions. All test assertions correctly match the implementation behavior. The Windows environment variable handling (SYSTEMROOT, COMSPEC, PATHEXT) is intentional and necessary for subprocess creation. The validator's permissive approach to user-defined functions is documented and tested. Edge cases like subprocess death, tool call failures, timeout during execution, and symlink handling are all properly managed.

✓ Security Reliability

The local CodeAct package provides defense-in-depth controls for executing LLM-generated Python code, with AST validation, subprocess isolation, and explicit environment control. The implementation is generally sound for its stated purpose (use in external sandboxes like Foundry). However, there are three reliability concerns: (1) the AST validator allows 'open' in ALLOWED_BUILTINS while blocking it in BLOCKED_BUILTINS, creating conflicting policy; (2) subprocess environment building on Windows includes parent environment keys that could leak sensitive data; (3) the validator allows delattr/setattr which could modify object internals unsafely. The package correctly disclaims being a security sandbox and documents required external isolation.

✓ Test Coverage

The test suite provides solid coverage of core functionality (subprocess execution, tool calling, validation, file capture, environment isolation). However, several edge cases and error paths lack coverage: (1) invalid input validation for constructors (empty/invalid paths, negative limits), (2) error handling for subprocess failures (invalid Python executable, runner script errors, malformed bridge responses), (3) boundary conditions for limits (exact limit sizes, total capture limits), (4) file mount edge cases (duplicate mounts, overlapping paths, permission errors), (5) race conditions in async tool calls, and (6) error recovery paths in the bridge protocol. The existing tests are well-structured and verify the happy paths thoroughly.

✓ Design Approach

The design approach is sound for the stated goal of adding AST-validated local code execution for Foundry hosted agents. The validation correctly runs before all execution paths, the subprocess bridge properly serializes concurrent tool calls via async locks, symlink handling prevents directory traversal, and the custom allow/block list semantics are clearly documented. All test cases in the diff are consistent with the implementation.

Automated review by eavanvalkenburg's agents

- Remove 'open', 'getattr', 'setattr', 'hasattr' from ALLOWED_BUILTINS (bypass risk)
- Add these to BLOCKED_BUILTINS with explanatory comments
- Propagate AST validation settings to create_run_tool snapshot
- Terminate subprocess before raising on error messages
- Move module docstrings to file start in samples
- Remove pointless string statements from samples
- Document allowed_builtins behavior in visit_Call

Fixes all 8 review comments in PR microsoft#6091

eavanvalkenburg · 2026-05-27T07:15:13Z

Review Comments Addressed

All 8 review comments have been addressed in commit a38ea7c:

Security Fixes

✅ Removed dangerous builtins from ALLOWED_BUILTINS: open, getattr, setattr, hasattr removed as they bypass AST attribute restrictions
✅ Added to BLOCKED_BUILTINS: getattr, setattr, hasattr, delattr now explicitly blocked with explanatory comments
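Why getattr and friends count as a bypass risk can be shown with a small sketch: an AST check that looks at attribute syntax never sees a name that is assembled as a string at run time. `attribute_names` is a hypothetical helper written for this illustration, not part of the package.

```python
import ast


def attribute_names(source: str) -> set[str]:
    """Collect every attribute name that appears syntactically as `obj.name`."""
    return {n.attr for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Attribute)}


direct = "().__class__"
indirect = "getattr((), '__cl' + 'ass__')"

print(attribute_names(direct))    # prints {'__class__'}
print(attribute_names(indirect))  # prints set()
```

Both expressions evaluate to the same object, but only the first one exposes the attribute name to a syntactic check, which is why blocking the builtin itself is the simpler defense.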

Implementation Fixes

✅ create_run_tool propagation: Now propagates allowed_imports, blocked_imports, allowed_builtins, blocked_builtins to run-scoped tool
✅ Subprocess leak fix: Added await self._stop_process(process) before raising on error messages
✅ allowed_builtins behavior documented: Added docstring explaining that we only enforce block-list (not allow-list) for builtins to permit user-defined functions and registered tools
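The subprocess leak fix follows a common pattern: stop the child first, then raise. The sketch below assumes a line-based protocol in which the child reports failures with an `error:` prefix; that prefix, `read_reply`, `stop_process`, and `demo` are all hypothetical and only illustrate the ordering.

```python
import asyncio
import contextlib
import sys


async def stop_process(process: asyncio.subprocess.Process) -> None:
    """Kill the child if it is still running, then reap it."""
    if process.returncode is None:
        with contextlib.suppress(ProcessLookupError):
            process.kill()
    await process.wait()


async def read_reply(process: asyncio.subprocess.Process) -> str:
    """Read one line from the child; on an error line, stop the child before raising."""
    line = (await process.stdout.readline()).decode().strip()
    if line.startswith("error:"):
        await stop_process(process)  # without this, the child would outlive the exception
        raise RuntimeError(line)
    return line


async def demo() -> int:
    """Spawn a child that reports an error and then lingers; return its exit code."""
    child_code = "import time; print('error: boom', flush=True); time.sleep(60)"
    process = await asyncio.create_subprocess_exec(
        sys.executable, "-c", child_code, stdout=asyncio.subprocess.PIPE
    )
    try:
        await read_reply(process)
    except RuntimeError:
        pass
    return process.returncode
```

After `asyncio.run(demo())` returns, the child has already exited, so the caller that catches the error does not also have to remember to clean up the process.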

Sample Fixes

✅ Module docstrings moved: Moved module docstrings to file start in both samples
✅ Pointless strings removed: Removed trailing sample output string statements that triggered B018

All checks passing locally:

uv run poe check-packages -P local_codeact ✅
uv run poe mypy -P local_codeact ✅
uv run poe test -P local_codeact ✅ (46 tests)

eavanvalkenburg · 2026-05-27T07:16:54Z

Python 3.10 Compatibility Fixed

Fixed timeout test that was failing on Python 3.10 due to different TimeoutError string representation.

Commit: 4760db8

Verified on:

Python 3.10.15 ✅
Python 3.12.7 ✅

All 46 tests now pass on both versions.

Add agent-framework-local-codeact alpha package for running LLM-generated Python code in Foundry hosted agents and other sandboxed environments.

Key features:

- Subprocess execution by default (isolated process)
- Optional unsafe in-process mode for debugging
- AST-based allow-list code validation
- Customizable allowed/blocked imports and builtins
- Host tool bridge with framed JSON-lines IPC
- File mount system with capture and limits
- .NET portability features (python_executable, runner_script)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
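For readers unfamiliar with JSON-lines IPC, the idea is one JSON object per line, so the reader can split on newlines without a length prefix. The sketch below is a generic illustration only; the message fields (`type`, `name`, `args`, `value`) and the helper names are assumptions and do not describe the package's actual wire format or its framing.

```python
import io
import json


def write_message(stream, message: dict) -> None:
    """Write one message as a single line of JSON."""
    # json.dumps escapes newlines inside strings, so each message stays on one line.
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")
    stream.flush()


def read_messages(stream):
    """Yield one decoded message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Round trip through an in-memory stream standing in for the child's pipe.
pipe = io.StringIO()
write_message(pipe, {"type": "tool_call", "name": "compute", "args": {"text": "a\nb"}})
write_message(pipe, {"type": "result", "value": 3})
pipe.seek(0)
messages = list(read_messages(pipe))
```

Here `messages` holds the two dictionaries in the order they were written, with the embedded newline in `text` preserved.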

Python 3.10's TimeoutError has a different string representation than 3.11+. Update test to check for 'TimeoutError' instead of specific message content. Verified on Python 3.10.15 and 3.12.7.
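A version-independent assertion of this kind can be sketched as below. The helper `run_with_timeout` and the message format are hypothetical; the point is only that checking the exception's type name avoids depending on its message text.

```python
import asyncio


async def run_with_timeout(seconds: float) -> str:
    """Return a description of the timeout error, built from the exception type."""
    try:
        await asyncio.wait_for(asyncio.sleep(10), timeout=seconds)
    except asyncio.TimeoutError as exc:  # alias of the builtin TimeoutError on 3.11+
        return f"{type(exc).__name__}: {exc}"
    return "completed"


message = asyncio.run(run_with_timeout(0.01))
# The class is named "TimeoutError" on both 3.10 and 3.11+, whatever the message says.
assert "TimeoutError" in message
```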

- _validator.py: visit_Call now enforces ALLOWED_BUILTINS for names that match real Python builtins, while still treating unknown names as user-defined functions/registered tools. This makes the allowed_builtins parameter behave as a real allow-list.
- _bridge.py / _runner.py: add explicit '# nosec' markers next to the existing '# noqa: S102/S404' so bandit accepts the intentional subprocess import and exec() calls (this package's whole purpose).
- test_validator.py: add tests for unknown-builtin rejection, user-defined function acceptance, and custom allow-list expansion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
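The visit_Call rule described in this commit (enforce the allow-list only for names that are real builtins, let everything else through) can be approximated with the sketch below. It is a simplified stand-in, not the validator's code: `disallowed_builtin_calls` and the contents of `ALLOWED_BUILTINS` are assumptions.

```python
import ast
import builtins

ALLOWED_BUILTINS = {"len", "print", "range", "sum"}  # hypothetical subset for this sketch


def disallowed_builtin_calls(source: str, allowed: set[str] = ALLOWED_BUILTINS) -> list[str]:
    """Return names of called builtins that are not allow-listed."""
    rejected = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            name = node.func.id
            # Names that are not real builtins are assumed to be user-defined
            # functions or registered tools, so they pass through.
            if hasattr(builtins, name) and name not in allowed:
                rejected.append(name)
    return rejected
```

With this shape, passing a larger `allowed` set (for example `ALLOWED_BUILTINS | {"open"}`) expands what is accepted, which is what makes the parameter behave as an allow-list. One limitation of the sketch is that a user function that shadows a builtin name is judged as the builtin.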

On Windows a freshly-killed subprocess can briefly hold the temporary workspace directory open. Swallow OSError from temp_dir.cleanup() so the caller still receives the proper error Content from the run and so the timeout test passes on Windows. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Ruff SIM105 prefers contextlib.suppress over try/except/pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The previous OSError-only suppression missed the RecursionError that Python's TemporaryDirectory cleanup can raise on Windows when a freshly killed subprocess still holds a handle to the workspace. Pass ignore_cleanup_errors=True (Python 3.10+) so the platform stops retrying rmtree, and broaden the outer suppression so unexpected cleanup errors do not mask the actual run result. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Model the sample after the hyperlight_codeact container sample: register compute and fetch_data as sandbox-only host tools on LocalCodeActProvider, wire a FoundryChatClient-backed agent, and serve via ResponsesHostServer. Update samples README with the new run/request instructions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add a note to the sample docstring and the samples README directing readers to python/samples/04-hosting/foundry-hosted-agents/responses for the surrounding Foundry hosted-agent environment, Dockerfile, manifest, and deployment workflow. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions

Automated Code Review

Reviewers: 2 | Confidence: 77%

✓ Test Coverage

No actionable issues found in this dimension.

✓ Design Approach

The validator currently documents that os is restricted to os.environ and os.path, but the implementation only blocks a deny-list of dangerous os.* members. That leaves many other os capabilities available even though the package contract says they are not, so I would request changes on that design mismatch.
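The mismatch is easiest to see by running the same snippet through both policies. In the sketch below the deny-list contents are a hypothetical example (the review does not list the actual members), while the allow-list mirrors the documented "os.environ and os.path" contract; the helper names are also assumptions.

```python
import ast

DENIED_OS_ATTRS = {"system", "popen", "remove"}  # hypothetical deny-list, for contrast only
ALLOWED_OS_ATTRS = {"environ", "path"}  # the documented contract


def os_attrs_used(source: str) -> set[str]:
    """Collect `attr` for every syntactic `os.attr` access (aliases are not tracked)."""
    return {
        node.attr
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id == "os"
    }


def passes_deny_list(source: str) -> bool:
    """Accept unless a denied attribute is used."""
    return not (os_attrs_used(source) & DENIED_OS_ATTRS)


def passes_allow_list(source: str) -> bool:
    """Accept only if every attribute used is allow-listed."""
    return os_attrs_used(source) <= ALLOWED_OS_ATTRS


snippet = "import os\nprint(os.listdir('.'))"
```

For `snippet`, the deny-list check accepts the code because `listdir` was never enumerated, while the allow-list check rejects it, which is the behavior the documented contract implies.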

Automated review by eavanvalkenburg's agents

Address Copilot review: the validator's deny-list for os.* attributes was more permissive than the documented contract ('os.environ and os.path only'), so attributes like os.listdir, os.open, and os.getcwd slipped through. Replace the deny-list with an allow-list of {environ, path} threaded through validator -> tool -> provider via a new allowed_os_attrs parameter.

Harden virtual mount-path handling so a mount cannot be tricked into surfacing protected host data:

- resolve_existing_directory rejects symlinked mount roots so a mount whose host_path is itself a symlink cannot expose another directory.
- iter_real_files skips hardlinks (st_nlink > 1) and requires every entry's resolved path to stay under the mount root, defeating ln-based hardlink-into-mount and junction-style escapes.

Update README to document the virtual-mount-paths-are-labels contract, the os.* allow-list, and the capture-time defenses. Add tests covering os.listdir/os.open/os.getcwd rejection, allowed_os_attrs override, hardlink skipping, and symlinked-mount-root rejection.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
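The capture-time defenses named in this commit (skip hardlinks via st_nlink, require resolved paths to stay under the mount root) can be sketched as follows. `iter_capturable_files` is a hypothetical stand-in and should not be read as the body of iter_real_files.

```python
import os
from pathlib import Path


def iter_capturable_files(mount_root: Path):
    """Yield files under `mount_root`, skipping symlinks, hardlinks, and escapes."""
    root = mount_root.resolve()
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            entry = Path(dirpath) / filename
            info = entry.lstat()  # lstat inspects the entry itself, not a link target
            if entry.is_symlink() or info.st_nlink > 1:
                continue  # a symlink, or a file that has another name elsewhere
            if not entry.resolve().is_relative_to(root):  # Python 3.9+
                continue  # resolved location falls outside the mount
            yield entry
```

Skipping every file with more than one link is deliberately conservative: it also drops harmless hardlinks created inside the mount, trading some convenience for not having to decide where the other name lives.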

Copilot AI review requested due to automatic review settings May 26, 2026 16:09

Copilot started reviewing on behalf of eavanvalkenburg May 26, 2026 16:09

moonbox3 added documentation Improvements or additions to documentation python labels May 26, 2026

github-actions Bot changed the title ~~feat(python): Add local-codeact package with AST validation~~ Python: feat(python): Add local-codeact package with AST validation May 26, 2026

Copilot AI reviewed May 26, 2026

github-actions Bot reviewed May 26, 2026

Comment thread python/packages/local_codeact/agent_framework_local_codeact/_validator.py

eavanvalkenburg mentioned this pull request May 27, 2026

.NET: feat(dotnet): Add LocalCodeAct package for local Python execution #6105

Open

4 tasks

moonbox3 added the .NET label May 27, 2026

eavanvalkenburg temporarily deployed to integration May 27, 2026 12:23 — with GitHub Actions Inactive

eavanvalkenburg had a problem deploying to integration May 27, 2026 12:23 — with GitHub Actions Failure

github-actions Bot changed the title ~~Python: feat(python): Add local-codeact package with AST validation~~ .NET: Python: feat(python): Add local-codeact package with AST validation May 27, 2026

eavanvalkenburg had a problem deploying to integration May 27, 2026 12:36 — with GitHub Actions Failure

eavanvalkenburg temporarily deployed to integration May 27, 2026 12:36 — with GitHub Actions Inactive

eavanvalkenburg temporarily deployed to integration May 27, 2026 13:02 — with GitHub Actions Inactive

eavanvalkenburg temporarily deployed to integration May 27, 2026 13:17 — with GitHub Actions Inactive

eavanvalkenburg force-pushed the feature-local-codeact-python branch from f708ca4 to 9c91299 Compare May 27, 2026 14:08

eavanvalkenburg changed the title ~~.NET: Python: feat(python): Add local-codeact package with AST validation~~ Python: feat(python): Add local-codeact package with AST validation May 28, 2026

eavanvalkenburg marked this pull request as ready for review May 28, 2026 07:34

eavanvalkenburg and others added 4 commits May 28, 2026 09:34

fix: Python 3.10 compatibility for timeout test

3f0946f

eavanvalkenburg and others added 6 commits May 28, 2026 09:34

Use contextlib.suppress for Windows cleanup race

a331ab8

Fix relative link to foundry-hosted-agents responses folder

9f40659

eavanvalkenburg force-pushed the feature-local-codeact-python branch from ad53370 to 9f40659 Compare May 28, 2026 07:35

github-actions Bot reviewed May 28, 2026

Comment thread python/packages/local_codeact/agent_framework_local_codeact/_validator.py Outdated

eavanvalkenburg enabled auto-merge May 28, 2026 09:00

eavanvalkenburg wants to merge 11 commits into microsoft:main from eavanvalkenburg:feature-local-codeact-python
