Tool brief · July 3, 2026

Kimi K2.7 Code in GitHub Copilot: worth switching the model picker for?

For developers

The tool

Kimi K2.7 Code (in GitHub Copilot)

What it is

Kimi K2.7 Code is Moonshot AI's open-weight coding model, and as of July 1 it is a selectable option in GitHub Copilot's model picker. It is the first open-weight model offered there, and GitHub hosts it on Microsoft Azure rather than calling out to Moonshot's own API. Architecturally it is a 1T-parameter mixture-of-experts (MoE) model with mandatory thinking mode, a 256K context window, and a claimed 30% fewer reasoning tokens than K2.6.

The next-work-session test

You're mid-way through wiring an evals harness — say, a small SDK that runs a suite of prompts through your agent loop, scores tool-call trajectories, and writes JSONL results. It's the kind of task where you're not asking for cleverness, you're asking for a lot of methodical scaffolding: retry logic, schema types, a runner, a report.
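For a sense of the scaffolding involved, here is a minimal Python sketch of such a harness: schema types, retry with backoff, a runner, and a JSONL writer. Every name in it (ToolCall, EvalResult, with_retry, score, run_suite) is hypothetical, the agent is any callable you supply, and the scoring rule (exact match on tool-name order) is an assumption chosen for brevity rather than a recommended metric.

```python
import json
import time
from dataclasses import dataclass, asdict, field
from typing import Callable


@dataclass
class ToolCall:
    # Hypothetical schema: one tool invocation made by the agent.
    name: str
    args: dict


@dataclass
class EvalResult:
    # Hypothetical schema: one row in the JSONL results file.
    prompt_id: str
    passed: bool
    attempts: int
    trajectory: list = field(default_factory=list)


def with_retry(fn: Callable[[], list], max_attempts: int = 3, base_delay: float = 0.0):
    """Call fn until it succeeds; return (value, attempts_used)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(), attempt
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))


def score(trajectory: list, expected_tools: list) -> bool:
    """Assumed scoring rule: tool names must match the expected order exactly."""
    return [call.name for call in trajectory] == expected_tools


def run_suite(cases: list, agent: Callable[[str], list], out_path: str) -> list:
    """Run every case through the agent and write one JSON object per line."""
    results = []
    with open(out_path, "w", encoding="utf-8") as out:
        for case in cases:
            trajectory, attempts = with_retry(lambda: agent(case["prompt"]))
            result = EvalResult(
                prompt_id=case["id"],
                passed=score(trajectory, case["expected_tools"]),
                attempts=attempts,
                trajectory=trajectory,
            )
            out.write(json.dumps(asdict(result)) + "\n")
            results.append(result)
    return results
```

None of this is clever, which is the point: it is the volume of methodical code the test is meant to exercise.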

Switch the Copilot picker to Kimi K2.7 Code and run one agent-mode session against a scoped folder. The change you're testing: does the "30% fewer reasoning tokens" claim actually show up as faster turns and fewer credits burned on scaffolding work, versus your usual Sonnet or GPT-5 pick? That's the test. Not "is it smarter." Just: is it a viable default for the boring 60% of agent-loop work.
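One way to keep that comparison honest is to log each session by hand and summarize per model. The sketch below assumes you record turn latencies and credits yourself; the record fields (model, turn_seconds, credits) and the function name are hypothetical, not something Copilot exports in this shape.

```python
from statistics import median


def summarize(sessions: list) -> dict:
    """Return median turn latency and total credits for each model."""
    by_model = {}
    for session in sessions:
        by_model.setdefault(session["model"], []).append(session)
    return {
        model: {
            # Pool every turn from every session for this model.
            "median_turn_seconds": median(
                t for s in runs for t in s["turn_seconds"]
            ),
            "credits": sum(s["credits"] for s in runs),
        }
        for model, runs in by_model.items()
    }
```

Run the same scoped task once per model and compare the two summary rows. A single session is a small sample, so treat any difference as a hint rather than a measurement.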

Pricing

What you pay depends on your plan. GitHub hosts Kimi K2.7 Code on Microsoft Azure and bills it at provider list pricing under usage-based billing. For post-June-1 accounts, that means AI Credits: Copilot usage is now billed through GitHub AI Credits, where 1 credit equals $0.01, and GitHub's docs price each model by input, cached input, and output tokens. Rollout is gated: Kimi K2.7 Code is beginning to roll out to Copilot Pro, Pro+, and Max plans.

Raw model economics (Moonshot's list): API pricing is $0.95 per 1M input tokens and $4.00 per 1M output tokens. That is the reference point, not necessarily what Copilot passes through. GitHub hasn't published a plain per-request multiplier for K2.7 in its official docs at the time of writing; third-party trackers report rough parity with a pinned 1.0× rate, but treat that as unverified until it appears on GitHub's own pricing page. See GitHub Copilot plans & pricing for the current model list.
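To turn those list prices into a rough session estimate, the arithmetic can be sketched in Python as below. It assumes Copilot passes Moonshot's list prices through unchanged, which is not confirmed, and it ignores cached-input pricing because no cached rate is quoted here. The function and constant names are hypothetical.

```python
from decimal import Decimal

# Moonshot list prices quoted above; assumed (not confirmed) to pass through.
INPUT_USD_PER_MTOK = Decimal("0.95")
OUTPUT_USD_PER_MTOK = Decimal("4.00")
USD_PER_CREDIT = Decimal("0.01")  # 1 GitHub AI Credit = $0.01


def estimate(input_tokens: int, output_tokens: int):
    """Return (usd, credits) for one session at list prices."""
    usd = (
        Decimal(input_tokens) * INPUT_USD_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_USD_PER_MTOK
    ) / Decimal(1_000_000)
    return usd, usd / USD_PER_CREDIT


usd, credits = estimate(2_000_000, 500_000)
print(f"${usd:.2f} = {credits:.0f} credits")  # $3.90 = 390 credits
```

In that example, 2M input tokens cost $1.90 and 500K output tokens cost $2.00, so output dominates despite being a quarter of the volume. That is why a cut in reasoning tokens matters for the bill.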

What we'd actually use it for

Long, mechanical agent runs where the token budget matters more than the ceiling of the model. Test scaffolding. SDK glue code. Fixture generation. Refactors across a package where you want the loop to grind for twenty minutes without you flinching at the credit meter. The 256K context is genuinely useful for eval work — feeding in a whole trajectory log and asking for failure-mode clustering, for example.

The pitch we're not buying: that this replaces your frontier pick for design work or hard debugging. Use it where volume matters.

Limits

Four real ones.

The benchmarks are self-reported. As of mid-June 2026, there are no independent third-party numbers for K2.7 Code on the standard public suites: no SWE-bench Verified, SWE-bench Pro, Terminal-Bench, LiveCodeBench, GPQA Diamond, AIME, or MMLU-Pro. That absence does not mean the model is weak; it means the public evidence is Moonshot's own. If your team runs its own evals, run them.

Long-context recall degrades before the stated window ends. Recall starts to degrade past ~180K (similar to other 256K models). Don't design your agent loop assuming you can pack the full 256K and get uniform attention across it.
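In practice that means budgeting against a soft limit rather than the advertised window. The Python sketch below keeps the most recent chunks that fit under roughly 180K tokens. The 4-characters-per-token estimate is a crude assumption standing in for a real tokenizer, and the function names are hypothetical.

```python
SOFT_LIMIT_TOKENS = 180_000  # approximate point where recall starts to degrade
CHARS_PER_TOKEN = 4          # crude assumption; swap in a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def pack_context(chunks: list, limit: int = SOFT_LIMIT_TOKENS):
    """Keep the newest chunks that fit under limit, in original order."""
    kept = []
    used = 0
    for chunk in reversed(chunks):  # walk from newest to oldest
        cost = estimate_tokens(chunk)
        if used + cost > limit:
            break  # everything older is dropped
        kept.append(chunk)
        used += cost
    kept.reverse()
    return kept, used
```

Dropping the oldest material first is only one policy. For trajectory logs you may prefer to summarize older turns instead of discarding them.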

Admin friction for orgs. Business and Enterprise tenants need policies flipped on before developers see the option — this isn't a silent addition, and if you're on a managed plan you may not see K2.7 in your picker at all until an admin acts.

Provenance considerations. Open weights are open, but the hosted model still runs through GitHub/Azure. If your compliance posture cares about model origin (not just where it's hosted), that's a conversation to have before pinning K2.7 as a default.

Try it if

You're burning credits on long agent-mode sessions and want a cheaper default for scaffolding-heavy work.
You're doing eval or SDK work where tool-call trajectories are long and you want a model tuned for agent loops.
You want to run your own bake-off before frontier-model bills renew.
You're on Copilot Pro, Pro+, or Max and already see it in the picker.

Skip it if

You rely on published third-party benchmarks to justify model choices to a review board — the data isn't there yet.
Your work sits above 180K tokens of active context and you need reliable recall across all of it.
You're on Business/Enterprise and can't get the admin policy flipped this quarter.
Your current model choice is bottlenecked by reasoning quality, not throughput. K2.7's win is efficiency, not a new ceiling.

Source: github.blog