Our Take
DAQIRI solves a real plumbing problem (kernel bottleneck on detector streams) but is infrastructure, not a capability leap; CERN's A-GHOST use case is exploratory R&D, not deployment.
Why it matters
Scientific instruments now generate data faster than traditional acquire-store-analyze pipelines can handle. Researchers who can filter and run inference at ingestion time can recover signal that real-time rejection would otherwise discard.
Do this week
If you operate high-bandwidth detectors or sensors (>10 Gbps), audit whether your current data path goes through the kernel network stack or bypasses it; if the former, evaluate DAQIRI against other kernel-bypass options, such as a direct DPDK integration, before your next hardware procurement.
NVIDIA Ships Kernel-Bypass Streaming Library for Detector Data
NVIDIA released DAQIRI (Data Acquisition for Integrated Real-time Instruments), a networking library that routes high-bandwidth detector and sensor streams directly to GPU memory without transiting the Linux kernel. The library handles UDP and RoCE v2 traffic at 100+ Gbps line rate by leveraging the Data Plane Development Kit (DPDK) and GPU Direct Memory Access (DMA), leaving PCIe transit as the main remaining source of latency.
The core mechanism is kernel bypass: incoming network packets land directly in GPU ring buffers instead of kernel network stack queues. DAQIRI automatically reorders packets into contiguous GPU tensors, converts data types (for example, int4 wire format to fp16 GPU format) during this step, and hands the application a single pointer to a GPU-resident batch ready for inference or filtering.
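The reorder-and-convert step can be illustrated on the CPU with NumPy. This is a minimal sketch, not DAQIRI code: the (sequence number, payload) packet layout, the nibble order, and the function names `unpack_int4` and `assemble_batch` are assumptions made for illustration, and the real library does this work in GPU memory.

```python
import numpy as np


def unpack_int4(payload: np.ndarray) -> np.ndarray:
    """Unpack bytes holding two signed 4-bit samples each into float16.

    Assumption: the high nibble is the first sample, in two's complement.
    """
    high = (payload >> 4).astype(np.int8)
    low = (payload & 0x0F).astype(np.int8)
    # Interleave so the output order is high, low, high, low, ...
    nibbles = np.stack([high, low], axis=-1).reshape(-1)
    # Sign-extend: raw values 8..15 represent -8..-1.
    signed = np.where(nibbles > 7, nibbles - 16, nibbles)
    return signed.astype(np.float16)


def assemble_batch(packets, packets_per_batch, payload_bytes):
    """Place out-of-order packets into one contiguous buffer by sequence number."""
    raw = np.zeros((packets_per_batch, payload_bytes), dtype=np.uint8)
    seen = np.zeros(packets_per_batch, dtype=bool)
    for seq, payload in packets:
        slot = seq % packets_per_batch
        raw[slot] = np.frombuffer(payload, dtype=np.uint8)
        seen[slot] = True
    return unpack_int4(raw.reshape(-1)), seen


# Packet 1 arrives before packet 0; the batch still comes out in order.
packets = [(1, bytes([0x7F, 0x80])), (0, bytes([0x12, 0xF0]))]
batch, seen = assemble_batch(packets, packets_per_batch=2, payload_bytes=2)
# batch holds [1, 2, -1, 0, 7, -1, -8, 0] as float16
```

The `seen` mask records which slots were filled, which is one simple way to detect dropped packets before the batch is handed to a model.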
Configuration is YAML-driven. Developers specify NIC address, GPU affinity, flow rules (UDP port filtering), and reorder parameters. The application code then becomes a loop: call get_rx_burst(), receive a GPU tensor, run inference, return the buffer. NVIDIA provides C++ and Python APIs and sample configurations.
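The shape of that loop can be sketched with a stand-in receiver. Only the name `get_rx_burst()` comes from the description above; the `FakeReceiver` class, the `release()` method, and the `run_pipeline` helper are hypothetical and do not reflect DAQIRI's actual signatures.

```python
from collections import deque

import numpy as np


class FakeReceiver:
    """Hypothetical stand-in for a receive queue; not the DAQIRI API."""

    def __init__(self, batches):
        self._pending = deque(batches)
        self.returned = 0

    def get_rx_burst(self):
        # Return the next ready batch, or None when the stream is drained.
        return self._pending.popleft() if self._pending else None

    def release(self, batch):
        # Hypothetical name for "return the buffer" so it can be reused.
        self.returned += 1


def run_pipeline(rx, infer, threshold):
    """Receive, score, and release batches; return indices of flagged batches."""
    flagged = []
    index = 0
    while (batch := rx.get_rx_burst()) is not None:
        try:
            if infer(batch) > threshold:
                flagged.append(index)
        finally:
            rx.release(batch)  # always hand the buffer back, even on error
        index += 1
    return flagged


rx = FakeReceiver([np.zeros(4), np.full(4, 5.0), np.ones(4)])
flagged = run_pipeline(rx, infer=lambda b: float(b.max()), threshold=2.0)
```

Releasing the buffer in a `finally` clause reflects the general constraint of ring-buffer designs: a buffer that is never returned eventually starves the receive path.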
CERN Is Using It to Search Data It Currently Throws Away
The High-Luminosity LHC upgrade will increase collision rates by 10x. Even with improved selection hardware, ATLAS will still reject more than 99% of collisions in real time due to storage bandwidth limits. Today, those rejected events are lost.
CERN openlab, the University of Chicago, and UCL are exploring A-GHOST, which uses DAQIRI to stream the discarded collision data to a GPU farm adjacent to the detector. There, convolutional auto-encoders, temporal CNNs, and transformer models can run on the full stream to identify anomalies or rare signals the hardware trigger missed. This is R&D: prototypes are being tested with the FPGA-based hardware boards planned for HL-LHC deployment.
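Auto-encoder anomaly detection generally works by scoring each event on how poorly the model reconstructs it. The sketch below shows that idea with a deliberately simpler stand-in: a linear auto-encoder fitted by PCA on synthetic data, not the convolutional models or the collision data A-GHOST uses. All names and data here are illustrative assumptions.

```python
import numpy as np


def fit_linear_autoencoder(normal_events, latent_dim):
    """Fit a PCA basis, the optimal linear auto-encoder under squared error."""
    mean = normal_events.mean(axis=0)
    _, _, vt = np.linalg.svd(normal_events - mean, full_matrices=False)
    return mean, vt[:latent_dim]


def anomaly_scores(events, mean, basis):
    """Mean squared reconstruction error per event; higher means more unusual."""
    centered = events - mean
    reconstructed = centered @ basis.T @ basis  # encode, then decode
    return np.mean((centered - reconstructed) ** 2, axis=1)


rng = np.random.default_rng(0)
# Synthetic "ordinary" events: 8 features driven by 2 hidden factors plus noise.
mixing = rng.normal(size=(2, 8))
ordinary = rng.normal(size=(500, 2)) @ mixing + 0.01 * rng.normal(size=(500, 8))
# Synthetic "unusual" events that do not follow the 2-factor structure.
unusual = 3.0 * rng.normal(size=(5, 8))

mean, basis = fit_linear_autoencoder(ordinary, latent_dim=2)
ordinary_scores = anomaly_scores(ordinary, mean, basis)
unusual_scores = anomaly_scores(unusual, mean, basis)
```

In a streaming setting the fitted model would be applied to each incoming batch, and only events whose score exceeds a chosen threshold would be kept.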
The insight is structural: where kernel overhead is the bottleneck, software-defined streaming can recover analysis capability that would otherwise be forfeited. For CERN, that could mean recovering rare physics events buried in rejected data.
Where This Applies and What to Watch
DAQIRI is relevant to any instrument or sensor pipeline generating 10+ Gbps sustained. Examples include LCLS-II (1 MHz photon pulse rate), industrial CT scanners, and high-bandwidth software-defined radios. The payoff is most acute when:
- Storage or downstream processing is the bottleneck, not data generation.
- Real-time filtering or inference at ingestion can reduce data volume or surface actionable events immediately.
- You already have GPU compute co-located with the instrument.
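To check whether a pipeline reaches the 10 Gbps range, multiply the event rate by the event size. The sketch below does that arithmetic; the 2 KB event size is an assumed figure for illustration, not a published number for any instrument.

```python
def sustained_gbps(events_per_second: float, bytes_per_event: float) -> float:
    """Sustained data rate in gigabits per second (1 Gbps = 1e9 bit/s)."""
    return events_per_second * bytes_per_event * 8 / 1e9


# Illustrative only: a 1 MHz event rate with an assumed 2 KB (2048 bytes) per event.
rate = sustained_gbps(1e6, 2048)
print(f"{rate:.1f} Gbps")  # prints "16.4 Gbps"
```

At a 1 MHz event rate, any event larger than 1250 bytes already exceeds 10 Gbps.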
Check whether your data path already uses DPDK or another kernel-bypass mechanism. If so, DAQIRI's value is reducing boilerplate and integration work, not removing a fundamental bottleneck. If you are still using standard Linux sockets, measure kernel CPU overhead on your actual traffic; the CPU time DAQIRI can save is bounded by that overhead.
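One rough way to measure kernel CPU overhead on Linux is to compare two snapshots of `/proc/stat` taken while representative traffic is flowing. The sketch below is an assumption about how you might do this, not a DAQIRI tool. It reports the share of CPU time spent in system, irq, and softirq mode, which is system-wide and includes kernel work unrelated to networking.

```python
def parse_cpu_line(stat_text: str) -> list[int]:
    """Return the aggregate 'cpu' counters from /proc/stat content."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Field order: user nice system idle iowait irq softirq steal
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate cpu line found")


def kernel_share(before: str, after: str) -> float:
    """Fraction of CPU time in system, irq, and softirq between two snapshots."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = [x - y for x, y in zip(a, b)]
    total = sum(delta)
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return (delta[2] + delta[5] + delta[6]) / total


# Synthetic snapshots so the example runs anywhere.
before = "cpu  100 0 50 800 10 5 35 0 0 0\ncpu0 100 0 50 800 10 5 35 0 0 0"
after = "cpu  200 0 100 1500 20 15 165 0 0 0\ncpu0 200 0 100 1500 20 15 165 0 0 0"
share = kernel_share(before, after)  # 0.19, i.e. 19% of CPU time
```

On a real host, read the file twice a few seconds apart, for example with `pathlib.Path("/proc/stat").read_text()`, and compare the result with an idle baseline to isolate the cost of the receive path.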
CERN's A-GHOST is exploratory, not a production deployment signal. Watch for ATLAS integration during the HL-LHC commissioning phase (expected 2029+) to assess whether GPU-side inference on rejected collisions produces novel physics.