Position Overview:
We are looking for a highly skilled engineer to design and optimize the GPU/AI infrastructure behind our Perception & Planning stack, covering object detection, segmentation, depth estimation, and trajectory planning.
This is a deeply technical role: you will push the limits of GPU efficiency, distributed training, and real-time inference, turning state-of-the-art research into production-ready systems.
Responsibilities:
- Architect and optimize large-scale training pipelines using advanced techniques (FSDP/ZeRO-DP, tensor/pipeline parallelism, activation checkpointing, CPU/NVMe offloading, FlashAttention, mixed precision/bfloat16, communication/computation overlap); see the first sketch after this list.
- Profile end-to-end pipelines (data loading → GPU kernels → inference) and eliminate bottlenecks using tools such as torch.profiler, Nsight Systems, Nsight Compute, TensorBoard Profiler, and low-level tooling (perf, NVTX annotations, NCCL tracing); see the second sketch after this list.
- Implement performance‑critical components in CUDA/C++ (custom kernels, TensorRT plugins, efficient memory layouts).
- Tune GPU utilization, the memory hierarchy (HBM, L2 cache, shared memory), and communication efficiency (PCIe/NVLink/NCCL) to maximize throughput and minimize latency.
- Drive model conversion and deployment workflows (ONNX/TensorRT, mixed precision, quantization) to meet strict real-time FPS requirements.
- Lead distributed training scaling and orchestration (multi‑node DDP/FSDP, NCCL tuning, experiment automation).
- Build reliability and observability into systems with low‑overhead logging, metrics, and health monitoring.
- Maintain benchmarks, profiling reports, and best‑practice documentation to guide the team.
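To give candidates a concrete flavor of the work, here is a minimal bfloat16 mixed-precision training step in PyTorch. It is an illustrative sketch only: the model, optimizer, and data below are placeholders, not part of our actual stack.

```python
import torch
import torch.nn as nn

# Placeholder model, optimizer, and batch; illustrative only.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(64, 512, device="cuda")
targets = torch.randint(0, 10, (64,), device="cuda")

optimizer.zero_grad(set_to_none=True)
# Run the forward pass and loss in bfloat16 to cut memory traffic and use
# tensor cores; bf16 keeps fp32's exponent range, so no GradScaler is
# needed (unlike fp16).
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = criterion(model(inputs), targets)
loss.backward()   # gradients are produced for the fp32 master weights
optimizer.step()
```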
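And a minimal sketch of the profiling loop, using torch.profiler with a wait/warmup/active schedule; `train_step` and the `./prof` output directory are placeholder names.

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

def train_step():
    # Stand-in workload; in practice this is a full training iteration.
    a = torch.randn(2048, 2048, device="cuda", requires_grad=True)
    (a @ a).sum().backward()

# Skip 1 step, warm up for 1, then record 3 active steps so profiler
# overhead stays out of the steady-state measurement.
with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=3),
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./prof"),
    record_shapes=True,
) as prof:
    for _ in range(5):
        train_step()
        prof.step()  # advance the wait/warmup/active schedule

# Console summary of the hottest GPU ops; the full timeline lands in ./prof
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```

The exported trace can be inspected in TensorBoard's profiler plugin, while `key_averages()` gives a quick console view of the hottest kernels.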
Qualifications:
- Master’s or Ph.D. in Computer Science, Electrical/Computer Engineering, or related technical discipline.
- Strong foundation in ML/CV with proven experience in GPU/AI infrastructure and performance optimization.
- Expert‑level coding in C++ and Python; ability to implement, debug, and optimize CUDA kernels.
- Hands‑on experience with GPU profiling and tuning, with a track record of improving throughput, utilization, and memory efficiency.
- Familiarity with ONNX, TensorRT, NCCL, and other performance‑oriented frameworks and libraries.
- Demonstrated success deploying real‑time inference systems on GPUs/edge devices.
- Strong problem-solving, debugging, and performance-analysis skills; thrives on low-level, high-performance systems challenges.