News & analysis, rated.
Breaking AI developments, in-depth guides, real-world case studies, and analysis — each one rated so you know what matters.
FP8 quantization cuts CLIP model size 34-50%, latency drops 1.4x on Ada GPUs
NVIDIA's TensorRT and ModelOpt toolchain converts FP8-quantized checkpoints into production engines. Real benchmarks on RTX 6000 Ada show image encoder latency falling from 166ms to 120ms. How to export, compile, and profile your own.
NVIDIA DGX Spark adds lifecycle management for air-gapped AI fleets
NVIDIA's Enterprise Manageability framework for DGX Spark systems covers provisioning, monitoring, and retirement over SSH without agents. Ships with six operational phases and works with existing IT tools.
AI Data Centers Need Battery Storage to Match Power Demand
NVIDIA outlines why battery energy storage systems are now essential infrastructure for AI factories running power-dense workloads. Learn what makes BESS design different from traditional data centers.
DiffusionGemma hits 1,000 tokens/sec on H100, cuts real-time AI latency
Google DeepMind's parallel text generation model runs on NVIDIA GPUs with up to 1,000 tokens per second. Developers get Day 0 support across local and production hardware—here's how to deploy.
Cohere's 30B Coding Model Matches 120B Rivals on Agentic Tasks
North Mini Code, a sparse mixture-of-experts model with 3B active parameters, scores 33.4 on Artificial Analysis' Coding Index, outperforming much larger open-source models. Available now under Apache 2.0.
Seven ASR models tested on bilingual speech — Scribe V2 and Gemini 3 Flash lead
ServiceNow benchmarked frontier ASR systems on code-switched speech across four language pairs. ElevenLabs Scribe V2 topped transcription accuracy; Gemini 3 Flash excelled at preserving meaning for downstream tasks.
Read PyTorch traces to spot where your model wastes GPU time
Hugging Face shows how to profile nn.Linear and MLPs using PyTorch's built-in tracer. Learn why compile helps stacked ops but not single layers, and how to read kernel names.
Google DeepMind picks 15 European robotics startups for 3-month AI accelerator
Google DeepMind is backing 15 early-stage robotics companies across Europe with mentorship, AI models, and technical support. The cohort tackles healthcare, manufacturing, waste sorting, and ocean monitoring.
Google's Gemma 4 12B Runs Multimodal AI on Your Laptop
DeepMind shipped a 12-billion-parameter model with audio and vision support that fits in 16GB of RAM. No separate encoders, lower latency, and open-source weights: here's what works and what remains unproven.
Gemini 3.5 Live Translate hits 70 languages with seconds of latency
Google's new speech-to-speech model detects languages automatically and preserves speaker tone in near real-time. Rolling out to Google Meet, Translate, and developer APIs this month.
DeepMind funds $10M multi-agent AI safety research push
Google DeepMind and four partners launch a global research call to study how millions of independent AI agents will interact safely. Proposals due August 8, 2026.
DeepMind DiffusionGemma hits 4x faster text generation on GPUs
DeepMind's 26B Gemma model generates 256 tokens in parallel instead of one-by-one, reaching 1,000+ tokens/sec on H100s. Built for local inference and interactive editing, but quality trails standard Gemma 4.