Deploy Self-Improving Agents in Sandbox Runtime to Mix Private and Public Data

NVIDIA releases open-source NemoClaw stack for sandbox-isolated agents

NVIDIA published a working example of Hermes Agent running inside OpenShell, a sandbox that enforces network policy, manages credentials, and persists agent state across container rebuilds. The deployment pattern connects the agent to Slack and Outlook for internal messaging, GitHub and NVIDIA forums for external data, and Nemotron-3-Super-120B for reasoning.

The key mechanic is skill persistence. When a user teaches the agent a response format in conversation (e.g., "give me exactly 5 top issues and 3 discussions"), Hermes writes that format to a SKILL.md file. The skill survives snapshots, teardowns, and rebuilds. A new user or conversation can invoke the same skill without re-teaching it. Skills ship in a folder that the gateway picks up automatically.

The sandbox enforces two critical constraints. First, credentials are never passed to the agent itself. Slack and Outlook tokens are held by the proxy; authentication happens on the request boundary. Second, network policy is declarative code, not a prompt. A policy.yaml file lists every allowed host, port, HTTP method, and binary path. If Hermes tries to POST to an unlisted endpoint, the proxy returns 403 and the agent sees a tool error. External data (GitHub issues, forum posts) reaches the agent only through a host-side ETL that pre-fetches and mirrors it, giving the agent read-only access.

The example ships with observability baked in. Traces are recorded in Agent Trajectory Format and pulled via scripts/download-traces.sh. Setting PHOENIX_COLLECTOR_ENDPOINT in the environment enables live streaming to an Arize Phoenix collector for interactive debugging without leaving the sandbox.

Security model beats credentials-in-memory for real deployments

Most agent tutorials assume a trusted environment or ask teams to "be careful" with secrets. This one doesn't. By making the runtime (not the model, not the prompt) responsible for credential management and network isolation, the design survives a compromised model or a jailbroken agentic loop. That's the difference between a PoC and a production deployment.

The skill-persistence pattern also flips the economics of agent improvement. Today, agents trained on examples require either retraining (expensive, slow) or prompt engineering (fragile, per-conversation). Here, teaching happens conversationally, and the learned format is code, not memory. It travels with the agent and works across users. That matters because it means the agent gets smarter without code review or DevOps involvement.

The architecture itself is instructive: a model layer (Nemotron), a harness layer (Hermes Agent), and a runtime layer (OpenShell). This separation means you can swap the model (vLLM, NIM, or NVIDIA inference API) without touching the harness, and swap the harness without touching the runtime. It's explicit design for portability.

Start with policy.yaml before you teach the agent anything

The NemoClaw repo includes a full working example with Docker setup, environment templates, and bring-up scripts. Clone it, fill in .env with your Slack or Outlook credentials and an NVIDIA API key, and bash scripts/bring-up.sh will bootstrap the sandbox.

Before you ask the agent to do anything, read the policy.yaml reference in the OpenShell documentation and draft your own policies. Think through which data sources the agent needs read access to, which external APIs it may call (if any), and which binaries inside the sandbox should be allowed to make network requests. Policy is cheap to iterate on now; it's expensive to retrofit after you've taught the agent three months of custom skills.

The snapshot-restore cycle keeps learned state durable. When you rebuild the agent (new code, new model, new config), run scripts/snapshot.sh before tear-down and scripts/restore.sh after bring-up. The credential filter in the snapshot excludes .env, *token*, and *secret* files, so snapshots are safe to version-control or share.

Deploy Self-Improving Agents in Sandbox Runtime to Mix Private and Public Data

Our Take

Why it matters

Do this week

NVIDIA releases open-source NemoClaw stack for sandbox-isolated agents

Security model beats credentials-in-memory for real deployments

Start with policy.yaml before you teach the agent anything

One daily brief. Every story gets a hype verdict.

Related stories

The 30-Day AI-Native Challenge: a free/freemium roadmap to real AI skills

Your AI compliance gap is wider than your governance framework

Compliance teams ditch spreadsheets for unified EDD software