Power costs 40% of AI factory OpEx. Here's how to cut it.

Power becomes the binding constraint on token economics

Power costs account for roughly 40% of operating expenses in large-scale AI inference operations (per NVIDIA). Unlike compute or memory, power is capped by regional grid capacity. This forces a new optimization metric: tokens per watt, which directly maps to cost per token and, by extension, margin per token sold.
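To make the mapping from tokens per watt to cost per token concrete, here is a minimal Python sketch. The function names and every input figure (throughput, IT power, PUE, electricity price) are illustrative assumptions, not numbers from NVIDIA, and the calculation covers electricity only, not hardware or staffing.

```python
def tokens_per_watt(tokens_per_second, it_power_kw, pue):
    """Tokens per second per watt of facility power (equivalently, tokens per joule)."""
    facility_watts = it_power_kw * pue * 1000
    return tokens_per_second / facility_watts


def energy_cost_per_million_tokens(tokens_per_second, it_power_kw, pue, price_per_kwh):
    """Electricity cost to generate one million tokens."""
    facility_power_kw = it_power_kw * pue           # IT load plus cooling/overhead
    cost_per_hour = facility_power_kw * price_per_kwh
    tokens_per_hour = tokens_per_second * 3600
    return cost_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical rack: 100k tokens/s, 120 kW IT load, PUE 1.2, $0.08 per kWh.
    print(tokens_per_watt(100_000, 120, 1.2))
    print(energy_cost_per_million_tokens(100_000, 120, 1.2, 0.08))
```

Doubling tokens per watt at the same power draw halves the electricity cost per token in this model, which is why the metric maps so directly onto margin.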

NVIDIA's engineering blog outlines three categories of levers:

Hardware: Direct-to-chip liquid cooling at 45°C inlet temperature to improve power usage effectiveness (PUE, where lower is better). The GB200 NVL72 system includes in-rack power smoothing to flatten current spikes and enable denser GPU deployment within the same power envelope.
Software orchestration: NVIDIA DSX, a facility-scale platform that performs real-time power reallocation, dynamic workload scheduling, and recovery of "stranded" power at the rack level. DSX MaxLPS operates within the data center; DSX Flex connects to grid signals.
Model and precision selection: Mixture-of-experts (MoE) models activate only a subset of parameters per token, lowering per-token compute cost relative to dense models. Lower-precision formats like NVFP4 deliver more throughput per watt than FP8 at equivalent accuracy (per NVIDIA benchmarks).

Across six GPU architecture generations, NVIDIA claims inference throughput per megawatt has improved 1,000,000x. The company also cites research from the ML.ENERGY Initiative at the University of Michigan showing that coordinated GPU clock tuning during training (lowering clock speeds on GPUs that are off the critical path while critical-path GPUs run at full speed) reduces total training energy by up to 25% without extending wall-clock time.

The efficiency problem is real; attribution is murkier

The efficiency problem is genuine. At megawatt to gigawatt scale, even single-digit percentage gains in tokens-per-watt unlock millions in margin or new capacity without buying new hardware. Inference directly drives revenue, so maximizing inference throughput per watt is a natural priority.
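A rough back-of-the-envelope calculation shows why small gains matter at a fixed power cap. This is a sketch under stated assumptions: the facility size, baseline tokens per joule, and token price below are hypothetical placeholders, and the model assumes the facility is fully utilized and that every extra token is sold.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def extra_tokens_per_year(power_cap_mw, baseline_tokens_per_joule, efficiency_gain):
    """Additional tokens per year from an efficiency gain at a fixed power cap."""
    watts = power_cap_mw * 1_000_000
    baseline_tokens_per_second = watts * baseline_tokens_per_joule
    return baseline_tokens_per_second * efficiency_gain * SECONDS_PER_YEAR


def extra_revenue_per_year(extra_tokens, price_per_million_tokens):
    """Revenue from the additional tokens, assuming all of them are sold."""
    return extra_tokens / 1_000_000 * price_per_million_tokens


if __name__ == "__main__":
    # Hypothetical: 100 MW cap, 0.7 tokens/joule baseline, 5% gain, $0.50 per 1M tokens.
    extra = extra_tokens_per_year(100, 0.7, 0.05)
    print(extra, extra_revenue_per_year(extra, 0.50))
```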

Where the story gets slippery is in how the claims are mixed together. The 1,000,000x improvement across six generations reflects Moore's Law and GPU architecture evolution, not a single product or technique. The 45°C liquid cooling, power smoothing, and dynamic reallocation are infrastructure wins. The MoE architecture and precision tuning are model selection wins. An operator cannot pull one lever and expect the gains from all three categories.

The ML.ENERGY training work is independent academic research, but the MoE and precision claims rely on NVIDIA's own benchmarks without third-party reproduction. The DeepSeek-R1 example shows that MoE can outperform dense models on intelligence-per-token, but this is an architectural advantage, not an NVIDIA-specific one.

Separate the infrastructure wins from the model wins

If your data center is power-constrained and you have capital, liquid cooling and dynamic power allocation (DSX) are real ROI levers that apply to any workload. If you are software-focused, precision tuning (NVFP4 vs FP8) on your inference engine (TensorRT-LLM) is a starting point, but it requires benchmarking on your exact workload mix. MoE selection is a model choice, not a facility choice, and trades inference latency for efficiency.
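One way to structure that benchmarking is to record tokens generated, energy consumed, and task accuracy for each precision on your own workload, then accept the lower-precision configuration only if it stays within an accuracy budget. The sketch below assumes you have already collected those measurements; the `Run` type, the function names, and all numbers are hypothetical and do not come from NVIDIA or TensorRT-LLM.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str        # label for the configuration, e.g. a precision format
    tokens: int      # tokens generated over the benchmark
    joules: float    # energy measured over the same window
    accuracy: float  # task accuracy on your own evaluation set


def tokens_per_joule(run):
    return run.tokens / run.joules


def pick_config(baseline, candidate, max_accuracy_drop):
    """Keep the baseline unless the candidate is within the accuracy budget
    and delivers more tokens per joule."""
    if baseline.accuracy - candidate.accuracy > max_accuracy_drop:
        return baseline
    return max(baseline, candidate, key=tokens_per_joule)


if __name__ == "__main__":
    # Placeholder measurements, not real benchmark results.
    fp8 = Run("fp8", tokens=1_000_000, joules=2_000_000.0, accuracy=0.800)
    nvfp4 = Run("nvfp4", tokens=1_000_000, joules=1_500_000.0, accuracy=0.795)
    print(pick_config(fp8, nvfp4, max_accuracy_drop=0.01).name)
```

Gating on accuracy first keeps a throughput-per-watt win from quietly degrading output quality.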

NVIDIA DSX is described as an "open platform" but is explicitly tied to NVIDIA compute and OEM partners. Operators locked into alternative accelerators or multi-vendor deployments will not see the full stack of gains claimed.

The most actionable claim is the training speed tuning: if you are running Megatron-LM training at scale, profiling your critical path and intentionally lowering clock speed on non-critical GPUs can cut energy use by 10–25% with no wall-clock penalty (per ML.ENERGY / NVIDIA research). This is a software knob, not a hardware buy.
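The core idea can be illustrated with a toy model. This is not the ML.ENERGY algorithm, only a simplified sketch: it assumes each pipeline stage's time scales inversely with clock speed (compute-bound work), that the slowest stage defines the critical path, and that the list of allowed clocks includes the maximum clock. The function name and the stage timings are hypothetical.

```python
def pick_clocks(stage_ms_at_max, max_mhz, allowed_mhz):
    """For each stage, pick the lowest allowed clock at which the stage still
    finishes no later than the slowest (critical-path) stage."""
    critical_ms = max(stage_ms_at_max)
    chosen = []
    for ms in stage_ms_at_max:
        # Assumption: stage time scales as 1/clock.
        feasible = [f for f in allowed_mhz if ms * max_mhz / f <= critical_ms]
        chosen.append(min(feasible))
    return chosen


if __name__ == "__main__":
    # Hypothetical per-stage times (ms) measured at the maximum clock of 1980 MHz.
    stages = [100, 80, 90, 100]
    clocks = [1200, 1400, 1600, 1800, 1980]
    print(pick_clocks(stages, 1980, clocks))
```

Stages with slack run slower and draw less power, while the critical-path stages keep their full clock, so iteration time is unchanged in this model. Actual savings depend on the hardware's power-versus-frequency curve and have to be measured.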
