ML Systems/Infrastructure Engineer

Oriole Networks

Greater London

On-site

GBP 80,000 - 100,000

Full time

3 days ago

Job summary

A tech-focused network company in Greater London is seeking a talented ML Systems/Infrastructure Engineer. The role focuses on co-optimizing the AI/ML software stack with advanced network hardware. Candidates should be proficient in C++ and Python, with extensive experience in GPU programming and communication protocols. Responsibilities include designing GPU communication kernels, debugging applications, and collaborating on system architecture. Competitive compensation and engaging projects await.

Qualifications

  • Strong track record in high-performance computing or machine learning projects.
  • Deep knowledge of GPU memory hierarchies and kernel optimization.
  • Hands-on experience in deploying and optimizing workloads in production.

Responsibilities

  • Design and optimize custom GPU communication kernels.
  • Develop distributed communication frameworks for deep learning models.
  • Profile and debug GPU applications.
  • Integrate optimized kernels with next-generation hardware and software.
  • Contribute to system-level architecture decisions for GPU clusters.

Skills

C++
Python
GPU programming with CUDA
Debugging GPU kernels
Communication libraries and protocols
HPC networking protocols/libraries
Distributed deep learning frameworks
Large-scale deployment

Tools

cuda-gdb
cuda-memcheck
Nsight Systems
Docker
Kubernetes
SLURM
OpenMPI
GPU drivers

Job description

Oriole is seeking a talented ML Systems/Infrastructure Engineer to help co‑optimize our AI/ML software stack with cutting‑edge network hardware. You’ll be a key contributor to a high‑impact, agile team focused on integrating middleware communication libraries and modelling the performance of large‑scale AI/ML workloads.

Key Responsibilities
  • Design and optimize custom GPU communication kernels to enhance performance and scalability across multi‑node environments (a minimal CUDA sketch follows this list).
  • Develop and maintain distributed communication frameworks for large‑scale deep learning models, ensuring efficient parallelization and optimal resource utilization.
  • Profile, benchmark, and debug GPU applications to identify and resolve bottlenecks in communication and computation pipelines.
  • Collaborate closely with hardware and software teams to integrate optimized kernels with Oriole’s next‑generation network hardware and software stack.
  • Contribute to system‑level architecture decisions for large‑scale GPU clusters, focusing on communication efficiency, fault tolerance, and novel architectures for advanced optical network infrastructure.
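
As a flavour of the first responsibility above, here is a minimal, hypothetical sketch of a "push"-style communication kernel: with CUDA peer access enabled, a kernel running on GPU 0 stores directly into a buffer resident on GPU 1 over NVLink/PCIe. The kernel name, buffer names, and sizes are illustrative assumptions, not anything from this posting, and production code would probe cudaDeviceCanAccessPeer and check every CUDA call.

    // Hypothetical sketch: device-side P2P "push" from GPU 0 into GPU 1's memory.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void pushToPeer(const float* src, float* peerDst, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) peerDst[i] = src[i];  // store lands directly in the peer GPU's memory
    }

    int main() {
        const int n = 1 << 20;
        float *src, *dst;

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);     // let GPU 0 dereference GPU 1 pointers
        cudaMalloc(&src, n * sizeof(float));

        cudaSetDevice(1);
        cudaMalloc(&dst, n * sizeof(float));  // destination buffer lives on GPU 1

        cudaSetDevice(0);
        pushToPeer<<<(n + 255) / 256, 256>>>(src, dst, n);
        cudaDeviceSynchronize();
        printf("peer push: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 0;
    }

Real communication kernels layer chunking, flow control, and overlap with compute on top of this primitive; libraries such as NCCL and NVSHMEM implement the same idea at scale.
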
Required Skills & Experience
  • Proficient in C++ and Python, with a strong track record in high‑performance computing or machine learning projects.
  • Expertise in GPU programming with CUDA, including deep knowledge of GPU memory hierarchies and kernel optimization.
  • Hands‑on experience debugging GPU kernels with tools such as cuda-gdb, cuda-memcheck, and Nsight Systems, down to the PTX and SASS level.
  • Strong understanding of communication libraries and protocols, including NCCL, NVSHMEM, OpenMPI, UCX, or custom collective communication implementations (see the NCCL sketch after this list).
  • Familiarity with HPC networking protocols/libraries such as RoCE, InfiniBand, libibverbs, and libfabric.
  • Experience with frameworks for distributed deep learning and MoE models, including PyTorch Distributed, vLLM, or DeepEP.
  • Solid understanding of deploying and optimizing large‑scale distributed deep learning workloads in production environments, including Linux, Kubernetes, SLURM, OpenMPI, GPU drivers, Docker, and CI/CD automation.
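
To illustrate the communication-library and deployment bullets above, here is a hedged sketch of the canonical MPI-bootstrapped NCCL all-reduce, the collective behind data-parallel gradient averaging. It assumes one GPU per MPI rank on a single node (a multi-node launch would select the device by local rank), and buffer size and error handling are simplified.

    // Hypothetical sketch: NCCL all-reduce bootstrapped over MPI, one GPU per rank.
    #include <mpi.h>
    #include <nccl.h>
    #include <cuda_runtime.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, nRanks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nRanks);

        cudaSetDevice(rank);                  // assumes one GPU per rank on one node
        ncclUniqueId id;
        if (rank == 0) ncclGetUniqueId(&id);  // rank 0 creates the id, then shares it
        MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

        ncclComm_t comm;
        ncclCommInitRank(&comm, nRanks, id, rank);

        const int n = 1 << 20;
        float* buf;
        cudaMalloc(&buf, n * sizeof(float));
        cudaStream_t stream;
        cudaStreamCreate(&stream);

        // In-place sum across all ranks: the pattern used to average gradients.
        ncclAllReduce(buf, buf, n, ncclFloat, ncclSum, comm, stream);
        cudaStreamSynchronize(stream);

        ncclCommDestroy(comm);
        MPI_Finalize();
        return 0;
    }

A program like this would typically be built with nvcc against Open MPI and NCCL and launched via mpirun, or wrapped in a Slurm batch script on a cluster, which is the production context the final bullet describes.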