
Senior Benchmark & Performance Engineer – AI & Storage Systems

DataDirect Networks

United States

Remote

USD 120,000 - 160,000

Full time

Today

Job summary

A leading data storage company is seeking a Senior Benchmark Engineer to optimize performance across AI workloads and storage systems. The role involves designing benchmarks, collaborating with engineering teams, and providing insights that improve system performance. Ideal candidates have extensive experience with AI frameworks and strong Linux skills. Join the team to help shape the future of AI data management in a dynamic environment.

Benefits

Equal Opportunity employer
Opportunities for innovation
Flexible working hours

Qualifications

  • 7+ years of experience in performance engineering or HPC/AI systems.
  • Deep experience with AI/ML frameworks like PyTorch and TensorFlow.
  • Proficiency in debugging, tuning, and networking in Linux.

Responsibilities

  • Design and execute performance benchmarks across various platforms.
  • Analyze and visualize results using reporting tools.
  • Collaborate with teams to improve software/hardware performance.

Skills

AI/ML frameworks expertise
Parallel applications
Storage systems knowledge
Performance benchmarking
Strong communication skills

Tools

Python
Bash
Excel
CUDA
Slurm

Job description

Overview

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments" – Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.

Our success is driven by an unwavering commitment to innovation, customer success, and market leadership, and by a team of passionate professionals who bring their expertise and dedication to every project. This is an exciting and rewarding role for a driven professional who wants to make a lasting impact at a company that is shaping the future of AI and data management.

Job Description

We are seeking an experienced Senior Benchmark Engineer with deep expertise in AI workloads, parallel applications, and storage systems. You will be responsible for designing, executing, and analyzing complex benchmarks to evaluate and optimize performance across a range of infrastructure stacks, including AI inference and training, NVIDIA NIMs, RAG pipelines, and MPI-based HPC codes.

This role involves compiling and debugging large-scale distributed applications, creating automated benchmark pipelines, writing detailed technical reports, and working closely with both engineering and field teams to communicate findings and architectural advantages.

Key Responsibilities:

  • Design and execute performance benchmarks across AI, HPC, and storage platforms.
  • Run and tune AI inference workloads using frameworks and serving stacks such as PyTorch, TensorFlow, Triton, and NVIDIA NIMs, together with vector databases.
  • Benchmark large-scale RAG pipelines including data ingestion, retrieval, and inference performance.
  • Profile and optimize MPI and multi-node distributed applications.
  • Compile and debug C/C++, Python, and CUDA-based codes across heterogeneous systems.
  • Generate automated test scripts and benchmarking workflows (e.g., with Bash, Python, or Slurm job scripts).
  • Analyze and visualize results using Excel, Jupyter, or reporting tools; create comparison graphs and KPIs.
  • Write clear, concise performance reports for both technical and non-technical stakeholders.
  • Present findings internally and externally, translating results into architectural guidance for field engineers and sales teams.
  • Collaborate with system engineers, product managers, and partners to tune and improve software/hardware stack performance.
  • Validate and tune performance on storage systems including parallel file systems (e.g., Lustre, GPFS), object storage, and NVMe over Fabrics.
  • Contribute to internal tooling to automate test cycles and performance regression tracking.

Required Qualifications:

  • 7+ years of experience in performance engineering, benchmarking, or HPC/AI systems.
  • Deep experience with AI/ML and deep learning frameworks (PyTorch, TensorFlow, ONNX, Triton).
  • Familiarity with NVIDIA NIMs and containerized model serving stacks.
  • Proven expertise with MPI, OpenMP, Slurm or similar schedulers in large-scale compute environments.
  • Solid understanding of file and storage systems (e.g., POSIX, Lustre, S3, NVMe-oF).
  • Strong Linux skills (debugging, tuning, networking, storage stack).
  • Proficiency in scripting (e.g., Bash, Python) for job orchestration and result parsing.
  • Ability to create clear Excel graphs and presentations from raw benchmark data.
  • Strong communication skills, including the ability to convey technical results and trade-offs to engineering and customer-facing teams.

Preferred Skills:

  • Experience with RAG pipelines and vector databases (e.g., FAISS, Milvus, Qdrant).
  • Familiarity with Kubernetes and CSI-based persistent volume systems.
  • Understanding of GPU profiling tools (Nsight, nvprof, PyTorch Profiler).
  • Knowledge of telemetry and monitoring frameworks (e.g., Prometheus, Grafana).
  • Prior work publishing or presenting technical performance results.

Personal Attributes:

  • Self-driven, resourceful, and capable of independent problem-solving.
  • Able to context-switch between deep technical work and high-level communication.
  • Comfortable working across distributed teams and time zones.

DDN

DDN places strong emphasis on the following four characteristics, and any successful employee will demonstrate them:

Self-Starter - Takes independent action to identify and solve problems. Seeks out relevant information needed to make decisions. Gets involved with new initiatives.

Success/Achievement Orientation - Delivers quality results consistently. Targets, achieves (or exceeds) measurable results. Sets challenging goals, focuses on critical priorities, and is accountable.

Problem Solving - Recognizes problems and responds with a systematic assessment that identifies and addresses the cause of the issue. Practical, realistic, and resourceful.

Innovative - Builds and improves key business processes that enhance the effectiveness of DDN. Generates new ideas, challenges the status quo, and solves problems creatively.

DataDirect Networks, Inc. is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.

