
Senior Benchmark & Performance Engineer - AI & Storage Systems

DataDirect Networks

United States

Remote

USD 120,000 - 150,000

Full time

Today


Job summary

A technology company is seeking a Senior Benchmark & Performance Engineer with expertise in AI workloads and performance engineering. The role involves designing benchmarks, optimizing performance across infrastructures, and collaborating with engineering teams. Ideal candidates have over 7 years of experience in performance engineering and familiarity with AI frameworks. This position offers remote work opportunities.

Qualifications

7+ years of experience in performance engineering or HPC/AI systems.
Deep experience with AI/ML frameworks like PyTorch and TensorFlow.
Strong Linux skills including debugging and tuning.

Responsibilities

Design and execute performance benchmarks across AI and HPC platforms.
Compile and debug applications written in C/C++ and Python.
Write clear performance reports for technical and non-technical stakeholders.

Skills

Performance engineering

Benchmarking

AI workloads

Parallel applications

Storage systems

Python scripting

Linux debugging

Communication skills

Tools

PyTorch

TensorFlow

Excel

CUDA

MPI

Slurm

Prometheus

Grafana

Overview

Senior Benchmark & Performance Engineer - AI & Storage Systems

Job Locations: US-Remote

Job ID: 2025-5400


Country: United States

City: Remote

Worker Type: Regular Full-Time Employee

Job Description

We are seeking an experienced Senior Benchmark & Performance Engineer with deep expertise in AI workloads, parallel applications, and storage systems. You will be responsible for designing, executing, and analyzing complex benchmarks to evaluate and optimize performance across a range of workloads and infrastructure stacks, including AI inference and training, NVIDIA NIMs, RAG pipelines, and MPI-based HPC codes.

This role involves compiling and debugging large-scale distributed applications, creating automated benchmark pipelines, writing detailed technical reports, and working closely with both engineering and field teams to communicate findings and architectural advantages.

Key Responsibilities

Design and execute performance benchmarks across AI, HPC, and storage platforms.
Run and tune AI inference workloads using frameworks and services such as PyTorch, TensorFlow, Triton, NVIDIA NIMs, and vector databases.
Benchmark large-scale RAG pipelines including data ingestion, retrieval, and inference performance.
Profile and optimize MPI and multi-node distributed applications.
Compile and debug C/C++, Python, and CUDA-based codes across heterogeneous systems.
Generate automated test scripts and benchmarking workflows (e.g., with Bash, Python, or Slurm job scripts).
Analyze and visualize results using Excel, Jupyter, or reporting tools; create comparison graphs and KPIs.
Write clear, concise performance reports for both technical and non-technical stakeholders.
Present findings internally and externally, translating results into architectural guidance for field engineers and sales teams.
Collaborate with system engineers, product managers, and partners to tune and improve software/hardware stack performance.
Validate and tune performance on storage systems including parallel file systems (e.g., Lustre, GPFS), object storage, and NVMe over Fabrics.
Contribute to internal tooling to automate test cycles and performance regression tracking.

Required Qualifications

7+ years of experience in performance engineering, benchmarking, or HPC/AI systems.
Deep experience with AI/ML and deep learning frameworks (PyTorch, TensorFlow, ONNX, Triton).
Familiarity with NVIDIA NIMs and containerized model serving stacks.
Proven expertise with MPI, OpenMP, and Slurm or similar schedulers in large-scale compute environments.
Solid understanding of file and storage systems (e.g., POSIX, Lustre, S3, NVMe-oF).
Strong Linux skills (debugging, tuning, networking, storage stack).
Proficiency in scripting (e.g., Bash, Python) for job orchestration and result parsing.
Ability to create clear Excel graphs and presentations from raw benchmark data.
Strong communication skills - able to convey technical results and trade-offs to engineering and customer-facing teams.

Preferred Skills

Experience with RAG pipelines and vector databases (e.g., FAISS, Milvus, Qdrant).
Familiarity with Kubernetes and CSI-based persistent volume systems.
Understanding of GPU profiling tools (Nsight, nvprof, PyTorch Profiler).
Knowledge of telemetry and monitoring frameworks (e.g., Prometheus, Grafana).
Prior work publishing or presenting technical performance results.

Personal Attributes

Self-driven, resourceful, and capable of independent problem-solving.
Able to context-switch between deep technical work and high-level communication.
Comfortable working across distributed teams and time zones.

DDN places strong emphasis on the following four characteristics, which every successful employee is expected to demonstrate:

Self-Starter - Takes independent action to identify and solve problems. Seeks out relevant information needed to make decisions. Gets involved with new initiatives.

Success/Achievement Orientation - Delivers quality results consistently. Targets, achieves (or exceeds) measurable results. Sets challenging goals, focuses on critical priorities, and is accountable.

Problem Solving - Recognizes problems and responds with a systematic assessment that identifies and addresses the cause of the issue. Practical, realistic, and resourceful.

Innovative - Builds and improves key business processes that enhance the effectiveness of DDN. Generates new ideas, challenges the status quo, and solves problems creatively.

DataDirect Networks, Inc. is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.

