Pinned
-
federated-learning-lab (Public)
From-scratch federated learning: FedAvg / FedProx / SCAFFOLD, DP-SGD & secure aggregation, plus FedPer / Byzantine-robust / FedAdam / FedLoRA. 33/33 tests, literature-cross-validated, with honest n…
Python
-
nccl-collectives-bench (Public)
NCCL collective benchmarks on an 8×H100 NVSwitch host: busbw vs link budget, NVLS/Ring/Tree, small-message latency floors (eager vs CUDA Graph vs symmetric memory), and the TP-decode comms ceiling…
Python
-
nim-agent-blueprint (Public)
Agentic RAG hallucination evaluation on adversarial SQuAD 2.0 (N=200). Nine gate methods compared (self/cross-family/70B judges, PoLL, CoT, MiniCheck, semantic entropy): grounding beats capacity, …
Python
-
trtllm-triton-serving (Public)
TensorRT-LLM vs vLLM controlled head-to-head on H100: 12 studies including a knob-by-knob waterfall reproducing NVIDIA's published 27.7k tok/s (100.3%) and attributing the gap to real serving, plu…
Python
-
blackwell-tensorcore-kernels (Public)
Hand-written CUDA Tensor Core GEMM kernels on Blackwell (sm_120) and Hopper (sm_90): raw mma.sync reaching 106% of the cuBLAS-TC kernel on sm_120, CUTLASS 3.x wgmma at 85.5% of nvjet on H100, and …
Cuda
-
physgate (Public)
Validate LLM-generated robot plans in GPU physics simulation: best-of-N plan selection against the highest-quality verifier (Isaac Lab + ROS2 + MCP). Research prototype.
Python


