Rust Jobs for Rustaceans 

RDMA Engineer - Supercomputing
xAI
Active - posted 19 days ago
On-site
United States
Senior
AI
$180K/yr - $440K/yr
Job Description
Key Responsibilities:
- Develop and tune RDMA-based communication systems leveraging NVIDIA GPUs and Mellanox NICs (InfiniBand, RoCE) for ultra-fast data transfer between nodes.
- Implement and optimize GPUDirect RDMA to enable direct memory access between GPUs and network interfaces, minimizing CPU overhead.
- Integrate RDMA solutions with Kubernetes-based workloads, ensuring seamless operation across distributed compute and storage systems.
- Collaborate with AI researchers and infrastructure teams to accelerate data pipelines and collective communications using NCCL and MPI.
- Troubleshoot and resolve performance bottlenecks in high-throughput, low-latency networking environments.
Requirements:
- Hands-on experience with NVIDIA RDMA technologies (e.g., GPUDirect RDMA, RoCE, InfiniBand) in HPC or AI supercomputing environments.
- Proficiency in programming with Rust, C, or C++ for low-level networking and system optimization.
- Familiarity with NVIDIA’s networking stack, including Mellanox drivers, libraries (e.g., libibverbs), and kernel modules (e.g., nv_peer_memory).
- Experience optimizing distributed systems with MPI, NCCL, or similar frameworks for GPU-accelerated workloads.
- Knowledge of Kubernetes networking and integrating RDMA into containerized environments.
- Bonus: Background in AI/ML training workflows and their networking demands (e.g., large-scale parameter synchronization).
Tech Stack
- NVIDIA GPUs and Mellanox networking (InfiniBand, RoCE)
- RDMA technologies and protocols (e.g., GPUDirect RDMA, RoCEv2)
- Kubernetes
- Rust and C/C++
- MPI (Message Passing Interface) and NCCL (NVIDIA Collective Communications Library)