Director, Software Engineering - DGX Cloud Infrastructure

NVIDIA

United States

Remote

USD 284,000 - 426,000

Full time

16 days ago

Job summary

NVIDIA seeks a Director, Software Engineering, to lead a key organization focused on GPU cloud infrastructure. This role involves scaling engineering teams, driving automation for large-scale operations, and collaborating with cloud partners to ensure production excellence. Candidates must have extensive experience in software engineering and a strong leadership background, with a Bachelor's or Master's degree in Computer Science.

Benefits

Equity opportunities
Comprehensive benefits

Qualifications

  • 10+ years of experience in software engineering, including 5+ years in a management role.
  • Experience with cloud infrastructure and automation.
  • Proven experience leading software engineering teams.

Responsibilities

  • Build and lead teams for large-scale GPU infrastructure automation.
  • Lead design and delivery of automation frameworks.
  • Interface with NVIDIA Cloud Partners to ensure production excellence.

Skills

Leadership
Communication
Infrastructure Automation
Distributed Systems

Education

Bachelor of Science in Computer Science
Master of Science in Computer Science

Tools

Kubernetes
Linux
InfiniBand
NVIDIA BCM
Slurm
BlueField DPUs

Job description

Director, Software Engineering - DGX Cloud Infrastructure

NVIDIA is seeking a strategic and technically grounded Director of Engineering to lead a high-impact organization at the intersection of core compute and cloud infrastructure for AI factories. This organization is a key pillar in NVIDIA’s DGX Cloud ecosystem, building shared automation and reliability tooling that enables a sizable portion of our GPU-accelerated compute fleet.

You will further develop and scale an organization of engineers focused on running production software for large-scale GPU-accelerated infrastructure. This organization partners closely with storage, networking, and several other teams across NVIDIA. You will be the engineering leader responsible for interfacing with some of our NVIDIA Cloud Partners to continuously meet our production excellence goals.

What You’ll Be Doing:

  • Build and grow a team of software engineers and leaders focused on automating day 0, 1, and 2 operations for large-scale GPU clusters running on bare metal and public clouds with service levels of various kinds.

  • Lead the design and continuous delivery of shared automation frameworks aligned with SLOs and error budgets.

  • Liaise with some of our NVIDIA Cloud Partners to ensure aligned priorities and sustained production excellence.

  • Drive clarity and execution through high ambiguity, translating broad and ever-evolving objectives into iterative delivery milestones.

  • Enable internal teams by reducing operational friction and improving automation coverage across the stack.

What We Need To See:

  • Proven experience leading software engineering teams (including SRE and/or DevOps) responsible for infrastructure automation and distributed systems.

  • Demonstrated ability to build software engineering organizations, drive continuous incremental execution across teams, and operate effectively in highly ambiguous environments with ever-evolving objectives.

  • Hands-on experience designing, running, or automating cloud infrastructure atop bare metal platforms and/or VMs.

  • Experience deploying cloud-native services on public clouds.

  • Track record of representing your company or division in external partnerships with public clouds and infrastructure vendors, and with internal partner teams.

  • Strong foundation in incremental delivery and technical program execution.

  • Excellent written and verbal communication skills, with the ability to influence across levels and disciplines.

  • Bachelor of Science or Master of Science degree in Computer Science or a related field (or equivalent experience), with 10+ overall years of experience developing and leading cloud infrastructure teams and 5+ years of management experience.

Ways to stand out from the crowd:

  • Relevant experience developing organizations at public cloud companies.

  • Background leading teams running large-scale GPU clusters.

  • Familiarity with technologies like Linux, NVIDIA BCM, Slurm, InfiniBand, Kubernetes, distributed storage, or BlueField DPUs.

  • Experience developing both internal-facing platform teams and customer-facing infrastructure-as-a-service teams.

  • Track record of collaboration with security or compliance teams, including in regulated environments.

  • Familiarity with AI/ML platform workloads and their reliability or performance characteristics.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, hardworking, and self-motivated, we want to hear from you! The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. NVIDIA leads the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing (HPC), and Visualization. DGX Cloud provides a serverless generative AI infrastructure to the world, enabling NVIDIA’s AI supercomputer technologies to be used by anyone.

The base salary range is 284,000 USD - 425,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About the company

Nvidia Corporation is an American multinational technology company incorporated in Delaware and based in Santa Clara, California.

Notice

Talentify is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status.

Talentify provides reasonable accommodations to qualified applicants with disabilities, including disabled veterans. Request assistance at accessibility@talentify.io or 407-000-0000.

Federal law requires every new hire to complete Form I-9 and present proof of identity and U.S. work eligibility.

An Automated Employment Decision Tool (AEDT) will score your job-related skills and responses. Bias-audit & data-use details: www.talentify.io/bias-audit-report . NYC applicants may request an alternative process or accommodation at aedt@talentify.io or 407-000-0000.
