Our Take
This is infrastructure, not a capability leap: it solves the real problem (agents can't synthesize multi-source enterprise data reliably), but the answer is "run a separate server" rather than improving agents themselves.
Why it matters
Teams building agents for regulated industries (finance, healthcare, defense) need a way to ground agents in sensitive internal data without exposing that data to the agent harness. AI-Q takes on the research work that agents delegate to it while keeping source documents inside the controlled environment. The practical win is data sovereignty plus auditability in one architectural pattern.
Do this week
DevOps: Deploy AI-Q (Docker Compose or Helm) in your data environment, then test MCP authentication patterns against your actual source systems so your agent pilots can start next sprint.
NVIDIA packages research pipeline as reusable agent skill
NVIDIA released AI-Q as an open-source blueprint that exposes a dedicated deep research backend as a portable "skill" that general-purpose agent frameworks can delegate to. Agent harnesses like Claude Code, Codex, and LangChain Deep Agents can now submit research tasks (multi-document synthesis, decision briefs, long-horizon analysis) to a running AI-Q server and receive structured reports with source attribution, rather than attempting synthesis themselves.
The skill ships with install scripts for three major harness platforms. Claude Code loads repo-local skills from `.claude/skills/`; Codex from a configured skills directory; OpenCode from `~/.config/opencode/skills/`. Once installed, a request such as "research the regulatory landscape across our internal policy docs and produce a memo" routes through the skill, which submits a job to the AI-Q server, polls for completion, and returns a cited output.
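The submit, poll, return flow can be sketched in Python. This is a minimal illustration, not the skill's actual code: `run_research`, the `submit`/`get_status` callables, the job states (`running`, `succeeded`, `failed`), and the report shape are all assumptions, and `FakeResearchServer` is a hypothetical in-memory stand-in for a real AI-Q server so the sketch runs offline.

```python
import time
from typing import Callable


class ResearchJobFailed(RuntimeError):
    """Raised when the server reports that a research job failed."""


def run_research(
    submit: Callable[[str], str],
    get_status: Callable[[str], dict],
    prompt: str,
    poll_interval_s: float = 2.0,
    timeout_s: float = 600.0,
) -> dict:
    """Submit a prompt, poll until the job finishes, and return its report.

    `submit` and `get_status` are hypothetical stand-ins for the HTTP calls
    a real skill would make against an AI-Q server.
    """
    job_id = submit(prompt)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status["state"] == "succeeded":
            return status["report"]  # only the cited output comes back
        if status["state"] == "failed":
            raise ResearchJobFailed(status.get("error", "unknown error"))
        time.sleep(poll_interval_s)  # still running: wait, then poll again
    raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")


class FakeResearchServer:
    """In-memory stand-in: reports 'running' twice, then a cited report."""

    def __init__(self) -> None:
        self.polls = 0

    def submit(self, prompt: str) -> str:
        return "job-1"

    def get_status(self, job_id: str) -> dict:
        self.polls += 1
        if self.polls < 3:
            return {"state": "running"}
        return {
            "state": "succeeded",
            "report": {"text": "Draft memo ...", "citations": ["policy-042"]},
        }


server = FakeResearchServer()
report = run_research(server.submit, server.get_status,
                      "Summarize our data-retention policies", poll_interval_s=0.0)
print(report["citations"])  # ['policy-042']
```

A production client would likely add backoff between polls and a way to cancel jobs; both are omitted here to keep the sketch short.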
The second part of the release adds first-class Model Context Protocol (MCP) support so AI-Q can authenticate against enterprise data sources without standing up a parallel retrieval stack. Three authentication patterns are documented: unauthenticated MCP servers (simplest case), service-account MCP auth (preferred for CI and shared enterprise sources), and forwarding the signed-in AI-Q user's bearer token (when downstream APIs already trust that identity). Tokens are captured at job-submit time and restored inside async workers, so long-running research jobs preserve user identity context. Token refresh mid-job is not yet supported; jobs that exceed the access token's time-to-live will fail on auth-required calls.
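One way to implement "capture at submit, restore in the worker" in Python is with `contextvars`, since each asyncio task runs in its own copy of the context. The sketch below is hypothetical and is not AI-Q's implementation: `AuthMode`, `CapturedToken`, `auth_headers`, `submit_job`, and the placeholder `svc-token` are invented names. It also shows the practical effect of missing refresh: once the captured token's expiry passes, forwarded-token calls fail instead of renewing.

```python
import asyncio
import contextvars
import time
from dataclasses import dataclass
from enum import Enum


class AuthMode(Enum):
    NONE = "none"                        # unauthenticated MCP server
    SERVICE_ACCOUNT = "service_account"  # shared credential (CI, shared sources)
    FORWARD_USER = "forward_user"        # signed-in user's bearer token


@dataclass(frozen=True)
class CapturedToken:
    value: str
    expires_at: float  # epoch seconds, derived from the token's TTL


# Per-task slot for the submitting user's token inside async workers.
current_user_token = contextvars.ContextVar("current_user_token", default=None)


def auth_headers(mode: AuthMode, service_token: str = "svc-token") -> dict:
    """Headers for one outbound MCP call (placeholder tokens, not real ones)."""
    if mode is AuthMode.NONE:
        return {}
    if mode is AuthMode.SERVICE_ACCOUNT:
        return {"Authorization": f"Bearer {service_token}"}
    token = current_user_token.get()
    if token is None:
        raise PermissionError("no user token in this job's context")
    if time.time() >= token.expires_at:
        # No mid-job refresh: an expired token makes the call fail.
        raise PermissionError("user token expired during the job")
    return {"Authorization": f"Bearer {token.value}"}


async def research_worker(captured: CapturedToken, mode: AuthMode) -> dict:
    # Restore the identity captured at submit time; this set() applies to
    # the worker task's copied context and does not leak to the caller.
    current_user_token.set(captured)
    await asyncio.sleep(0)  # stand-in for long-running retrieval/synthesis
    return auth_headers(mode)


def submit_job(user_token: str, ttl_s: float, mode: AuthMode) -> dict:
    # Capture the token (and its expiry) when the job is submitted.
    captured = CapturedToken(user_token, time.time() + ttl_s)
    return asyncio.run(research_worker(captured, mode))


print(submit_job("alice-token", ttl_s=3600, mode=AuthMode.FORWARD_USER))
# {'Authorization': 'Bearer alice-token'}
```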
Data sovereignty and auditability matter more than architectural elegance
Agent harnesses are built for orchestration, not research. When agents attempt multi-source synthesis without a dedicated backend, they tend to produce inconsistent results on tasks that require enterprise data, long-horizon planning, or accurate citations. More critically, the harness itself then needs direct access to sensitive source documents, which is unacceptable in regulated industries.
AI-Q inverts the risk: the research pipeline runs where the data is, reads enterprise data, performs retrieval and synthesis, and emits only the cited output. Raw documents never leave the controlled environment. This is the concrete win for teams in healthcare, financial services, government, and defense. The agent harness sees a single high-level capability and never touches the underlying sources.
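A minimal sketch of that boundary, assuming a hypothetical internal state object: raw document bodies stay in `RetrievedDoc`, and `outbound_payload` builds the only object returned to the harness, carrying the report text plus citation metadata (IDs and titles) but never document bodies. The class names, fields, and payload shape are illustrative assumptions, not AI-Q's schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RetrievedDoc:
    doc_id: str
    title: str
    body: str  # raw content: must stay inside the controlled environment


@dataclass
class ResearchState:
    """Internal pipeline state (hypothetical); never sent to the harness."""
    report_text: str
    retrieved: List[RetrievedDoc] = field(default_factory=list)
    cited_ids: List[str] = field(default_factory=list)


def outbound_payload(state: ResearchState) -> dict:
    """Build the only object allowed to leave the data environment."""
    by_id = {doc.doc_id: doc for doc in state.retrieved}
    citations = []
    for doc_id in state.cited_ids:
        if doc_id not in by_id:
            # A citation must point at something the pipeline actually retrieved.
            raise ValueError(f"cited {doc_id!r} was never retrieved")
        citations.append({"doc_id": doc_id, "title": by_id[doc_id].title})
    # No doc.body appears anywhere in the returned structure.
    return {"report": state.report_text, "citations": citations}


state = ResearchState(
    report_text="Customer records are retained for seven years [pol-7].",
    retrieved=[
        RetrievedDoc("pol-7", "Records Retention Policy", "CONFIDENTIAL full text ..."),
        RetrievedDoc("pol-9", "Vendor Access Policy", "CONFIDENTIAL full text ..."),
    ],
    cited_ids=["pol-7"],
)
print(outbound_payload(state))
```

This only guarantees that whole document bodies are not attached; passages quoted inside the report text itself would need their own review.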
Auditability ships as a pipeline feature, not a compliance retrofit. AI-Q reports include source attribution, and the underlying NeMo Agent Toolkit emits OpenTelemetry traces. Compliance teams can inspect which sources were retrieved, how they were used, and how the final cited answer was produced.
Teams can also choose their model path: Nemotron reasoning models handle planning and synthesis, while a router sends tasks that need additional capability to frontier models. The open models can run on-premises as NVIDIA NIM, and teams can disable the frontier models entirely to meet strict compliance requirements. The same evaluation harnesses used for internal benchmarking (FreshQA, Deep Research Bench, DeepSearchQA) ship with the blueprint, so teams can measure quality on their own data.
Start with deployment, not experimentation
AI-Q runs on Docker Compose or Helm, meaning the same blueprint works on a developer laptop, an on-premises Kubernetes cluster, or an air-gapped data center. For teams in regulated industries, the deployment choice is the architectural choice: pick the environment where your data lives, spin up the server there, and expose it to your agent harness through the AI-Q skill.
Begin by spinning up the AI-Q server in your data environment (Docker Compose or Helm from the GitHub repository). Then expose your enterprise data sources to AI-Q as MCP servers, starting with the authentication pattern that matches your existing access controls (service account for shared sources, bearer token forwarding if your API gateway already trusts the AI-Q user). Test MCP connectivity against your actual source systems, not dummy data.
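The selection rule in that step can be written down as a small, reviewable function when you inventory sources. This is a hypothetical planning aid, not part of AI-Q: `SourceProfile`, its fields, and `choose_auth` are assumptions about what you know about each source. Sources that need auth but fit neither documented pattern raise an error so someone reviews them, rather than silently receiving a default.

```python
from dataclasses import dataclass
from enum import Enum


class AuthPattern(Enum):
    UNAUTHENTICATED = "unauthenticated"
    SERVICE_ACCOUNT = "service_account"
    FORWARD_USER_TOKEN = "forward_user_token"


@dataclass(frozen=True)
class SourceProfile:
    """What you know about one enterprise source exposed as an MCP server."""
    name: str
    requires_auth: bool
    shared: bool                   # shared team source, or used from CI
    gateway_trusts_aiq_user: bool  # downstream API already accepts that identity


def choose_auth(source: SourceProfile) -> AuthPattern:
    if not source.requires_auth:
        return AuthPattern.UNAUTHENTICATED
    if source.shared:
        return AuthPattern.SERVICE_ACCOUNT     # preferred for shared sources
    if source.gateway_trusts_aiq_user:
        return AuthPattern.FORWARD_USER_TOKEN  # reuse the signed-in user's identity
    raise ValueError(f"{source.name}: no documented pattern fits; review access controls")


sources = [
    SourceProfile("public-docs", requires_auth=False, shared=True, gateway_trusts_aiq_user=False),
    SourceProfile("policy-wiki", requires_auth=True, shared=True, gateway_trusts_aiq_user=False),
    SourceProfile("crm-api", requires_auth=True, shared=False, gateway_trusts_aiq_user=True),
]
for s in sources:
    print(s.name, choose_auth(s).value)
```

For sources that end up on forwarded tokens, also compare expected job duration against the token's lifetime, since mid-job token refresh is not yet supported.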
Once the server is running, install the skill into your agent framework (three commands per harness type). Verify that your agent can submit a research task and receive a report with citations. Only then wire in your first enterprise use case. The evaluation harnesses ship with the blueprint; run them against your own data to establish a baseline for research quality before you declare the pipeline production-ready.
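Before wiring in a real use case, the "report with citations" check can be automated as a smoke test. The sketch assumes a hypothetical report shape, `{"text": str, "citations": [{"id": str, "source": str}]}`, with inline markers such as `[1]`; adapt it to whatever the skill actually returns. It is a structural check only and says nothing about research quality, which is what the bundled evaluation harnesses are for.

```python
import re


def check_cited_report(report: dict) -> list:
    """Return a list of problems; an empty list means the smoke test passed.

    Assumes a hypothetical report shape with inline markers like [1] that
    should each resolve to an entry in report["citations"].
    """
    problems = []
    text = report.get("text", "")
    citations = report.get("citations", [])
    if not text.strip():
        problems.append("empty report text")
    if not citations:
        problems.append("no citations returned")
    known = {c.get("id") for c in citations}
    # dict.fromkeys de-duplicates markers while keeping first-seen order.
    for marker in dict.fromkeys(re.findall(r"\[(\w+)\]", text)):
        if marker not in known:
            problems.append(f"marker [{marker}] has no matching citation")
    for c in citations:
        if not c.get("source"):
            problems.append(f"citation {c.get('id')} has no source")
    return problems


good = {
    "text": "Retention is seven years [1].",
    "citations": [{"id": "1", "source": "policies/retention.pdf"}],
}
print(check_cited_report(good))  # []
```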
Dell has validated AI-Q on its infrastructure and published a reference architecture for on-premises multi-agent research workflows in regulated industries like financial services, public sector, and manufacturing.