Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
Updated Sep 12, 2018 · Cuda
NUMA-aware multi-CPU multi-GPU data transfer benchmarks
This script collects information about the NVLink and PCI bus traffic of NVIDIA GPUs. Results are published as Prometheus metrics via a websocket.
Ares: Multi-Cluster Kubernetes Scheduler with GPU Topology Optimization (Intra-Node, Inter-Node, Inter-Cluster) and Exactly-Once Execution Semantics
GPU-native agent-swarm orchestration for the NVIDIA AI stack — NeMo, NIM, Triton, DCGM, NGC, NIXL, OpenShell. Spawn GPU-pinned agent teams across DGX/HGX nodes with NVLink-aware scheduling, task DAGs, adaptive scheduling, and full observability.
Close-to-metal C/CUDA lab for RL inference fast paths: persistent GPU workers, hugepage KV arenas, cacheline-aware command rings, and async reward handoff. Goal: remove page faults, malloc/free, scheduler wakeups, CPU round-trips, and KV migration from the per-token path.
C++ command-line tool for managing NVIDIA Fabric Manager partitions. Supports non-interactive mode and advanced partition operations.
Open hardware desktop AI node: 4× Tesla V100, 128GB HBM2, PCIe/NVLink topology and V-Core liquid/air cooling.
V-AXION-512: Dual-Tier Post-Entropic Framework. I. PROTOCOL: SR-512, GS-512 & G-STORM-512 for O(1) deterministic state recovery. II. ECOSYSTEM: NEPTUNE-PHX, PHX-BUSLINK, KALMAN-ANCHOR, PHX-GENESIS, DIRECT-FABRIC & AETERNA-FLUX for Sigma-H energy harvesting. — Juho Artturi Hemminki (2026)
What to consider when running AI Inference at scale on Kubernetes
A hybrid testbed for evaluating top open-source LLMs (like gpt-oss-20b and Llama 3.3) on local, cloud GPUs, and AWS Inferentia2/Trainium instances, focusing on vLLM optimization, capacity management, kernel bypass, hardware-software co-design, as well as supporting infrastructure such as NCCL, RDMA, NVMeoF.
Comprehensive NCA-AIIO exam prep: study notes, diagrams, screenshots, and field experience for the NVIDIA Certified Associate: AI Infrastructure and Operations certification.
NCCL collective benchmarks on an 8×H100 NVSwitch host — busbw vs link budget, NVLS/Ring/Tree, small-message latency floors (eager vs CUDA Graph vs symmetric memory), and the TP-decode comms ceiling they imply. Includes a quiet-box rerun methodology for attribution.
Open-source stencil-aware multi-GPU Conjugate Gradient solver on 8× A100 NVLink. 2.07× SpMV vs cuSPARSE · 1.44× above NVIDIA AmgX · 93.5% strong scaling efficiency. Profiled with Nsight Systems & Nsight Compute.
Provide open-source access to detailed AI hardware specs, benchmarks, and infrastructure data for informed decision-making and analysis.
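The NCCL benchmark entry above compares busbw against a link budget. As a rough illustration of what that figure means, the sketch below converts a measured all-reduce time into bus bandwidth using the convention documented by nccl-tests (algorithm bandwidth scaled by 2(n-1)/n). The function name and the sample numbers are hypothetical and are not taken from any repository listed here.

```python
def allreduce_busbw(bytes_per_rank: int, seconds: float, n_ranks: int) -> float:
    """Bus bandwidth in GB/s for an all-reduce, following the nccl-tests convention.

    algbw = message size / elapsed time
    busbw = algbw * 2 * (n - 1) / n
    """
    if n_ranks < 2:
        raise ValueError("an all-reduce needs at least two ranks")
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    algbw = bytes_per_rank / seconds / 1e9  # GB/s, decimal gigabytes
    return algbw * 2 * (n_ranks - 1) / n_ranks


if __name__ == "__main__":
    # Hypothetical measurement: 1 GB per rank reduced in 10 ms across 8 GPUs.
    print(f"{allreduce_busbw(10**9, 0.01, 8):.1f} GB/s")
```

The scaling factor makes busbw comparable to per-link hardware bandwidth regardless of rank count, which is why it is the number usually set against a link budget.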