
Director of Engineering - Infinia AI Performance

DataDirect Networks

United States

Remote

USD 130,000 - 170,000

Full time

9 days ago

Job summary

An innovative company is seeking a Director of Engineering to lead their AI Engineering organization. This role involves overseeing the design and optimization of large-scale AI/ML training and inference pipelines, guiding a talented team of engineers, and collaborating with cross-functional teams. The ideal candidate will have extensive experience in machine learning engineering and a proven track record of managing high-performing teams. Join a forward-thinking company that is at the forefront of AI and data storage innovation, and make a significant impact in shaping the future of AI infrastructure.

Qualifications

  • 15+ years in machine learning engineering with leadership experience.
  • Proven track record of building and scaling AI/ML pipelines.

Responsibilities

  • Lead a team of senior ML and data engineers, fostering innovation.
  • Oversee the design of optimizations for training and inference.

Skills

Machine Learning Engineering
Leadership
Data Streaming
Performance Optimization
Cloud Infrastructure
Problem Solving

Education

Bachelor's or Master's in Computer Science

Tools

Apache Spark
Apache Airflow
MLflow
Docker
Kubernetes
Terraform

Job description

Job Locations
US-Remote

Job ID
2025-5114

Name Linked
Remote: US

Country
United States

City
Remote

Worker Type
Regular Full-Time Employee

Overview

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." - IDC

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments" - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management.

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.

Job Description

We are seeking an experienced and accomplished Director of Engineering to lead our AI Engineering organization. In this role, you will oversee the design, deployment, and optimization of large-scale AI/ML training and inference pipelines using Infinia as the foundational data platform. You will lead the development of connectors to open-source data streaming frameworks, such as Mosaic Streaming, Ray Data, and tf.data, and of inference optimizations such as KV caching and LoRAX. You will guide a talented organization of engineers focused on an advanced end-to-end data platform for ingestion, transformation, preparation, and streaming for high-performance AI applications. Collaborating closely with software developers, product teams, and partners, you will lead experiments with state-of-the-art models using open-source tools and cloud platforms.

Key Responsibilities:

Leadership & Management:

  • Lead, mentor, and grow a team of senior ML and data engineers, fostering a culture of innovation and excellence.
  • Set strategic direction for the ML engineering team in alignment with company goals.
  • Lead strategic partnerships across all areas of AI, from conception through execution and delivery, and communicate complex technical concepts effectively to non-technical stakeholders.
  • Track, report, and manage the team's performance against project milestones, ensuring on-time delivery of high-quality solutions.
  • Partner with architects, engineers, and cross-functional teams to ensure the delivery of innovative, high-quality technical designs.
  • Implement and refine engineering best practices, driving continuous improvements in quality, performance, and operational efficiency.

Technical Oversight:

  • Lead the integration of data ingestion and streaming pipelines with open-source tools such as Ray Data, Mosaic Streaming, tf.data, and the PyTorch DataLoader.
  • Oversee the design of optimizations for training, such as asynchronous checkpointing, and for inference, such as KV caching and LoRAX.
  • Guide the integration of MLflow with DDN's Infinia product for comprehensive experiment tracking, model versioning, and deployment.
  • Drive the implementation and scaling of Retrieval-Augmented Generation (RAG) pipelines to enhance generative model performance.
  • Stay abreast of the latest developments in MLOps, AI/ML frameworks, and tooling.
  • Identify and implement solutions to optimize pipeline performance, runtime, and resource utilization on Infinia.

Required Qualifications:

  • Bachelor's or Master's degree in Computer Science, Data Science, Machine Learning, or a related field.
  • 15+ years of experience in machine learning engineering, with at least 10 years in a leadership role.
  • Proven track record of building and scaling AI/ML pipelines and managing high-performing engineering teams.
  • Extensive experience with Apache Spark, Apache Airflow, and MLflow, or equivalent tools.
  • Deep understanding of machine learning frameworks and libraries (TensorFlow, PyTorch, NVIDIA NeMo).
  • Experience deploying open-source vector databases at scale.
  • Proficiency with containerization tools (Docker, Kubernetes) and infrastructure as code (Terraform, Ansible).
  • Solid understanding of cloud infrastructure (AWS, GCP, Azure) and distributed computing.
  • Excellent problem-solving and troubleshooting abilities with a keen eye for performance optimization.
  • Strong leadership, communication, and interpersonal skills.
  • Ability to drive strategic initiatives and manage multiple projects simultaneously.
  • This position requires participation in an on-call rotation to provide after-hours support as needed.

Preferred Skills:

  • Knowledge of NLP techniques and tools for model deployment.
  • Implementation-level understanding of ML frameworks, data loaders and data formats.
  • Experience with scaling RAG pipelines and integrating them with generative AI models.
  • Experience in operationalizing AI/ML models in production environments.

This role offers an exceptional opportunity to lead a high-impact engineering organization at the core of DDN's cutting-edge data solutions. If you are passionate about solving complex technical challenges and driving innovation in high-performance systems, we encourage you to apply.

DDN

Join our dynamic and driven team, where engineering excellence is at the heart of everything we do. We seek individuals who love to challenge themselves and are fueled by curiosity. Here, you'll have the opportunity to work across various areas of the company, thanks to our flat organizational structure that encourages hands-on involvement and direct contributions to our mission. Leadership is earned by those who take initiative and consistently deliver outstanding results, both in their work ethic and deliverables, making strong prioritization skills essential. Additionally, we value strong communication skills in all our engineers and researchers, as they are crucial for the success of our teams and the company as a whole.

Interview Process: After submitting your application, one of our recruiters will review your resume. If your application passes this stage, you will be invited to a 30-minute interview during which a member of our team will ask some basic questions. If you clear the interview, you will enter the main process, which can consist of up to four interviews in total:

  • Coding assessment: Often in a language of your choice.
  • Systems design: Translate high-level requirements into a scalable, fault-tolerant service (depending on role).
  • Real-time problem-solving: Demonstrate practical skills in a live problem-solving session.
  • Meet and greet with the wider team.

Our goal is to finish the main process in 2-3 weeks at most.

DataDirect Networks (DDN) is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.





