
Senior AI Infrastructure Engineer - DGX Cloud

NVIDIA Corporation

Santa Clara (CA)

On-site

USD 148,000 - 288,000

Full time

Yesterday

Job summary

A leading tech employer is seeking AI Infrastructure Engineers to ensure reliability of GPU cloud services. Responsibilities include designing internal tooling, developing data pipelines, and automating incident responses. Ideal candidates will have a strong computer science background and experience with cloud environments.

Qualifications

  • 5+ years of relevant experience.
  • Experience with infrastructure automation and distributed systems.

Responsibilities

  • Design, build, deploy, and operate internal tooling on cloud infrastructure.
  • Develop and maintain data pipelines and reporting tools.

Skills

Python
Go
TypeScript
C/C++
Java

Education

BS degree in Computer Science

Tools

Kubernetes
Terraform
Docker
Helm
Temporal
Hive
Spark
PyTorch
Looker
Tableau
Power BI

Job description

AI Infrastructure Engineers at NVIDIA ensure that our internal- and external-facing GPU cloud services run with maximum reliability and uptime, while enabling developers to make changes through careful planning and capacity management.

We seek engineers with a strong background in computer science fundamentals who are interested in building tooling, reporting, automation, and AI to support a dynamic organization.

What you’ll be doing:
  1. Design, build, deploy, and operate internal tooling on cloud infrastructure.
  2. Develop and maintain data pipelines, data lakes, and reporting tools used by leadership for decision-making.
  3. Integrate tooling with workflows and cloud providers to streamline incident, change, and problem management.
  4. Automate incident response and maintenance tasks using software automation and AI/ML solutions.
What we need to see:
  • BS degree in Computer Science or related field, or equivalent experience.
  • 5+ years of relevant experience.
  • Team player with adaptability in a dynamic environment.
  • Proven ability to initiate projects and collaborate effectively.
  • Experience with infrastructure automation and distributed systems in large-scale cloud environments.
  • Skills in Python, Go, TypeScript, C/C++, Java.
  • Knowledge of Linux, Networking, Storage, and Containers.
Ways to stand out:
  • Experience with incident tooling such as FireHydrant, Rootly, incident.io, or Blameless.
  • Building plugins and schemas in Backstage.
  • Knowledge of Kubernetes, Terraform, Docker, Helm, and Temporal.
  • Familiarity with ML/data science tools such as Hive, Spark, and PyTorch.
  • Experience with analytics tools such as Looker, Tableau, and Power BI.

NVIDIA is a leading employer in tech innovation, especially in AI, HPC, and visualization. We value creative, autonomous people who seek out challenges. Our invention, the GPU, is central to modern computing.

The salary range is $148,000 - $287,500, determined by location, experience, and market rates. Benefits and equity are also offered. We accept applications continuously and are committed to diversity and equal opportunity.
