NVIDIA DAQIRI Streams 100+ Gbps Detector Data Straight to GPU

NVIDIA Ships Kernel-Bypass Streaming Library for Detector Data

NVIDIA has released DAQIRI (Data Acquisition for Integrated Real-time Instruments), a networking library that routes high-bandwidth detector and sensor streams directly into GPU memory, bypassing the Linux kernel network stack. The library handles UDP and RoCE v2 traffic at line rates above 100 Gbps by building on the Data Plane Development Kit (DPDK) and GPU Direct Memory Access (DMA), so the remaining latency is essentially the PCIe transfer itself.

The core mechanism is kernel bypass: incoming network packets land directly in GPU ring buffers instead of kernel network stack queues. DAQIRI automatically reorders packets into contiguous GPU tensors, converts data types (for example, int4 wire format to fp16 GPU format) during this step, and hands the application a single pointer to a GPU-resident batch ready for inference or filtering.
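To make the reorder-and-convert step concrete, here is a minimal CPU-side sketch in plain Python. It is not DAQIRI code, and the library performs this work on the GPU. The packet representation (a sequence number plus a payload), the high-nibble-first int4 layout, and the function names are all assumptions made for illustration.

```python
import struct

def unpack_int4(payload):
    """Split each byte into two signed 4-bit samples.

    Assumes the high nibble comes first; a real wire format may differ.
    """
    samples = []
    for byte in payload:
        for nibble in (byte >> 4, byte & 0x0F):
            # Two's-complement sign extension: 8..15 map to -8..-1.
            samples.append(nibble - 16 if nibble >= 8 else nibble)
    return samples

def reorder_and_convert(packets):
    """Turn out-of-order (sequence_number, payload) pairs into one fp16 buffer."""
    ordered = sorted(packets, key=lambda p: p[0])  # restore arrival order by sequence number
    samples = []
    for _, payload in ordered:
        samples.extend(unpack_int4(payload))
    # 'e' is IEEE 754 half precision; the result is one contiguous byte buffer.
    return struct.pack(f"<{len(samples)}e", *samples)

# Two packets arriving out of order, one byte (two int4 samples) each.
packets = [(1, bytes([0x7F])), (0, bytes([0x12]))]
batch = reorder_and_convert(packets)
values = struct.unpack(f"<{len(batch) // 2}e", batch)
print(values)
```

The point of the sketch is the shape of the output: whatever order the packets arrived in, the consumer sees one contiguous buffer of half-precision values.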

Configuration is YAML-driven. Developers specify NIC address, GPU affinity, flow rules (UDP port filtering), and reorder parameters. The application code then becomes a loop: call get_rx_burst(), receive a GPU tensor, run inference, return the buffer. NVIDIA provides C++ and Python APIs and sample configurations.
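The receive loop described above can be sketched against a mock receiver. Only the name get_rx_burst() comes from NVIDIA's description; its real signature is not given here, so the MockReceiver class, the free_burst() method, the run_inference() placeholder, and every configuration key below are hypothetical stand-ins rather than DAQIRI's actual API or schema.

```python
from collections import deque

# Hypothetical settings mirroring the categories named above (NIC, GPU, flow, reorder).
# These keys are illustrative only, not DAQIRI's real configuration schema.
CONFIG = {
    "nic_address": "0000:3b:00.0",
    "gpu_id": 0,
    "flow": {"udp_port": 4096},
    "reorder": {"batch_size": 4},
}

class MockReceiver:
    """Stand-in for the library's receive queue so the loop can run anywhere."""

    def __init__(self, config, bursts):
        self.config = config
        self._pending = deque(bursts)
        self.returned = 0

    def get_rx_burst(self):
        # Return the next ready batch, or None when nothing is queued.
        return self._pending.popleft() if self._pending else None

    def free_burst(self, burst):
        # Hypothetical name for handing a buffer back for reuse.
        self.returned += 1

def run_inference(batch):
    # Placeholder "model": flag batches whose mean exceeds a threshold.
    return sum(batch) / len(batch) > 0.5

def process(receiver):
    flagged = []
    while True:
        burst = receiver.get_rx_burst()
        if burst is None:
            break
        flagged.append(run_inference(burst))
        receiver.free_burst(burst)  # always return the buffer after use
    return flagged

rx = MockReceiver(CONFIG, [[0.1, 0.2, 0.3, 0.2], [0.9, 0.8, 0.7, 1.0]])
results = process(rx)
print(results, rx.returned)
```

In a real deployment the batch would be a GPU-resident tensor and the loop would run continuously; the mock stops when its queue is empty so the example terminates.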

CERN Is Using It to Search Data It Currently Throws Away

The High-Luminosity LHC upgrade will increase collision rates by 10x. Even with improved selection hardware, ATLAS will still reject more than 99% of collisions in real time due to storage bandwidth limits. Today, those rejected events are lost.

CERN openlab, the University of Chicago, and UCL are exploring A-GHOST, which uses DAQIRI to stream the discarded collision data to a GPU farm adjacent to the detector. There, convolutional autoencoders, temporal CNNs, and transformer models can run on the full stream to identify anomalies or rare signals the hardware trigger missed. This is R&D: prototypes are being tested with the FPGA-based hardware boards planned for HL-LHC deployment.

The insight is structural: where kernel overhead is the bottleneck, software-defined streaming can recover analysis capability that hardware constraints alone would forfeit. For CERN, that means potentially recovering rare physics events buried in rejected data.

Where This Applies and What to Watch

DAQIRI is relevant to any instrument or sensor pipeline generating 10+ Gbps sustained. Examples include LCLS-II (1 MHz photon pulse rate), industrial CT scanners, and high-bandwidth software-defined radios. The payoff is most acute when:

Storage or downstream processing is the bottleneck, not data generation.
Real-time filtering or inference at ingestion can reduce data volume or surface actionable events immediately.
You already have GPU compute co-located with the instrument.

Check whether your data path already uses DPDK or another kernel-bypass mechanism. If so, DAQIRI's value is reducing boilerplate and integration work, not removing a fundamental bottleneck. If you are still using standard Linux sockets, measure kernel CPU overhead on your actual traffic; DAQIRI's cost reduction is proportional to that overhead.
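One rough way to measure kernel CPU overhead on a Linux host is to sample the aggregate cpu line of /proc/stat before and after a capture window and compare the time spent in kernel-side categories. The sketch below treats system, irq, and softirq time as "kernel" work, which is an assumption about what counts as overhead. The function names are illustrative, and the counter values in the example are made up.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat.

    The first eight counters are: user nice system idle iowait irq softirq steal.
    """
    return [int(field) for field in line.split()[1:9]]

def kernel_share(before, after):
    """Fraction of CPU time spent in system + irq + softirq between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = [y - x for x, y in zip(b, a)]
    total = sum(delta)
    kernel = delta[2] + delta[5] + delta[6]  # system, irq, softirq
    return kernel / total if total else 0.0

def read_cpu_line(path="/proc/stat"):
    """Read the live counters (Linux only); call once before and once after the window."""
    with open(path) as f:
        return f.readline()

# Made-up samples taken at the start and end of a capture window.
before = "cpu  100 0 50 800 0 10 40 0 0 0"
after = "cpu  200 0 150 1400 0 30 120 0 0 0"
share = kernel_share(before, after)
print(f"kernel share: {share:.1%}")
```

Run the measurement while the instrument is producing its real traffic pattern; a low kernel share suggests bypassing the kernel will buy little, while a high one marks the headroom a kernel-bypass path could reclaim.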

CERN's A-GHOST is exploratory, not a production deployment signal. Watch for ATLAS integration during the HL-LHC commissioning phase (expected 2029+) to assess whether GPU-side inference on rejected collisions produces novel physics.
