News & Analysis · 2684 stories

News & analysis, rated.

Breaking AI developments, in-depth guides, real-world case studies, and analysis — each one rated so you know what matters.

Incremental
June 11, 2026 · 2 min

FP8 quantization cuts CLIP model size 34-50%, latency drops 1.4x on Ada GPUs

NVIDIA's TensorRT and ModelOpt toolchain converts FP8-quantized checkpoints into production engines. Real benchmarks on RTX 6000 Ada show image encoder latency falling from 166ms to 120ms. How to export, compile, and profile your own.
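
As a quick sanity check on the headline figure, the speedup implied by the two quoted latencies can be worked out directly. This is a minimal sketch using only the 166ms and 120ms numbers from the teaser; the function names are illustrative and do not come from NVIDIA's tooling.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the 'after' latency is than 'before'."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("latencies must be positive")
    return before_ms / after_ms


def latency_reduction_pct(before_ms: float, after_ms: float) -> float:
    """Return the percentage drop in latency from 'before' to 'after'."""
    if before_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return 100.0 * (before_ms - after_ms) / before_ms


if __name__ == "__main__":
    # Latencies quoted in the teaser for the image encoder on RTX 6000 Ada.
    before, after = 166.0, 120.0
    print(f"speedup: {speedup(before, after):.2f}x")                  # speedup: 1.38x
    print(f"reduction: {latency_reduction_pct(before, after):.1f}%")  # reduction: 27.7%
```

The ratio of about 1.38x rounds to the 1.4x in the headline, and corresponds to roughly a 28% drop in latency.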

Incremental
June 11, 2026 · 3 min

NVIDIA DGX Spark adds lifecycle management for air-gapped AI fleets

NVIDIA's Enterprise Manageability framework for DGX Spark systems covers provisioning, monitoring, and retirement over SSH without agents. Ships with six operational phases and works with existing IT tools.

Verified
June 11, 2026 · 3 min

AI Data Centers Need Battery Storage to Match Power Demand

NVIDIA outlines why battery energy storage systems are now essential infrastructure for AI factories running power-dense workloads. Learn what makes BESS design different from traditional data centers.

Incremental
June 11, 2026 · 3 min

DiffusionGemma hits 1,000 tokens/sec on H100, cuts real-time AI latency

Google DeepMind's parallel text generation model runs on NVIDIA GPUs at up to 1,000 tokens per second. Developers get Day 0 support across local and production hardware. Here's how to deploy.

Incremental
June 11, 2026 · 3 min

Cohere's 30B Coding Model Matches 120B Rivals on Agentic Tasks

North Mini Code, a sparse mixture-of-experts model with 3B active parameters, scores 33.4 on Artificial Analysis' Coding Index, outperforming much larger open-source models. Available now under Apache 2.0.

Incremental
June 11, 2026 · 3 min

Seven ASR models tested on bilingual speech — Scribe V2 and Gemini 3 Flash lead

ServiceNow benchmarked frontier ASR systems on code-switched speech across four language pairs. ElevenLabs Scribe V2 topped transcription accuracy; Gemini 3 Flash excelled at preserving meaning for downstream tasks.

Verified
June 11, 2026 · 3 min

Read PyTorch traces to spot where your model wastes GPU time

Hugging Face shows how to profile nn.Linear and MLPs using PyTorch's built-in tracer. Learn why compile helps stacked ops but not single layers, and how to read kernel names.

Verified
June 11, 2026 · 2 min

Google DeepMind picks 15 European robotics startups for 3-month AI accelerator

Google DeepMind is backing 15 early-stage robotics companies across Europe with mentorship, AI models, and technical support. The cohort tackles healthcare, manufacturing, waste sorting, and ocean monitoring.

Incremental
June 11, 2026 · 3 min

Google's Gemma 4 12B Runs Multimodal AI on Your Laptop

DeepMind shipped a 12-billion-parameter model with audio and vision support that fits in 16GB of RAM. No separate encoders, lower latency, and open-source weights: here's what works and what remains unproven.

Incremental
June 11, 2026 · 3 min

Gemini 3.5 Live Translate hits 70 languages with seconds of latency

Google's new speech-to-speech model detects languages automatically and preserves speaker tone in near real-time. Rolling out to Google Meet, Translate, and developer APIs this month.

Verified
June 11, 2026 · 2 min

DeepMind funds $10M multi-agent AI safety research push

Google DeepMind and four partners launch a global research call to study how millions of independent AI agents will interact safely. Proposals due August 8, 2026.

Incremental
June 11, 2026 · 3 min

DeepMind DiffusionGemma hits 4x faster text generation on GPUs

DeepMind's 26B Gemma model generates 256 tokens in parallel instead of one by one, reaching 1,000+ tokens/sec on H100s. Built for local inference and interactive editing, but quality trails standard Gemma 4.