Senior DevOps Engineer, IPP Sanity Engineering

NVIDIA Corporation in Santa Clara (CA)

On-site

USD 168,000 - 333,500

Full time

Yesterday

Job summary

A leading technology company is seeking a Senior DevOps Engineer to join their Infrastructure, Planning and Process team. The role involves leading GPU product bringups, optimizing resource utilization, and automating configurations. Ideal candidates will have extensive experience in DevOps, cloud services, and a strong programming background. The company offers competitive salaries, comprehensive benefits, and a dynamic work environment.

Benefits

Equity

Comprehensive benefits

Dynamic work environment

Qualifications

10+ years of relevant experience in DevOps.
Hands-on coding and debugging experience.
Experience with large-scale enterprise production systems.

Responsibilities

Lead end-to-end infrastructure bringup for new NVIDIA GPU products.
Automate configurations using tools like Chef, Puppet, Ansible.
Collaborate with partner teams to onboard new products into CI/CD pipelines.

Skills

Python

Unix

TCL shell scripting

Debugging

Education

Bachelor's or Master's Degree in Computer Science, Software Engineering, or equivalent experience

Tools

Linux

Windows

Docker

Kubernetes

Terraform

Ansible

Chef

MySQL

Git

Perforce

Senior DevOps Engineer, IPP Sanity Engineering (Finance)

NVIDIA is seeking a Senior DevOps Engineer to join the IPP (Infrastructure, Planning and Process) Sanity Engineering team, focusing on executing NVIDIA product bringups. IPP is a core software infrastructure organization within NVIDIA that collaborates with groups such as Graphics Processors, Mobile Processors, Deep Learning, Artificial Intelligence, and Driverless Cars to support their infrastructure needs. Our cloud services handle nearly half a million automated jobs daily across thousands of distributed datacenters, enhancing productivity for NVIDIA's global software engineers.

Our cloud infrastructure hosts a diverse array of machines and devices with different operating systems (Windows/Linux/Android) and hardware platforms including NVIDIA GPUs and Tegra Processors. If you are passionate about distributed infrastructure, eager to build next-generation cloud services for chip bringups, and interested in solving complex problems, we would love to hear from you.

What you'll be doing

Lead end-to-end infrastructure bringup for new NVIDIA GPU products.
Understand NVIDIA GPU hardware and display driver stack, SBIOS, VBIOS, and enhance automation for farm-wide updates.
Address complex issues on pre-release products, lead GPU product bringups (PCIe & Enterprise), integrate GPU test suites, and scale distributed infrastructure across multiple sites.
Optimize GPU resource utilization by identifying appropriate regression test coverage.
Automate configurations using tools like Chef, Puppet, Ansible, Terraform, etc.
Manage bringup of specialized products for accelerated computing and AI in fast-paced environments.
Oversee service charter development, focusing on telemetry and automation of the bringup infrastructure.
Automate regression test frameworks, develop self-healing and recovery solutions for multi-geo regression farms.
Collaborate with partner teams to onboard new products into CI/CD pipelines.
Implement multiple parallel bringups within NVIDIA's product landscape.

What we need to see

Bachelor's or Master's Degree in Computer Science, Software Engineering, or equivalent experience.
10+ years of relevant experience.
Hands-on coding and debugging experience, including cross-platform source code compilation, issue triage, and resolution.
Experience maintaining and setting up Linux, Windows (x64 and ARM), VM, and container environments.
Programming experience with Python (preferred), Java, or similar languages.
Proficiency in Unix shell and Tcl scripting.
Experience with MySQL/NoSQL databases and writing complex queries.
Familiarity with version control systems like Perforce and Git.
Experience working with large-scale enterprise production systems (7+ years).

Ways to stand out from the crowd

Experience automating bare metal and VM provisioning.
Knowledge of GPU isolation for NVIDIA Confidential Computing.
Experience with public cloud platforms (AWS, GCP, Azure), virtualization technologies (VMware, KVM, Hyper-V), and containers and container orchestration (Docker, Kubernetes).
Background in debugging GPU performance issues, embedded software development, driver development, and CUDA/TensorRT applications.

We are considered one of the most desirable employers in the tech industry, with innovative and dedicated professionals. If you're passionate, creative, and autonomous, we want to hear from you. We offer competitive salaries, comprehensive benefits, and a dynamic work environment. Our teams are growing rapidly due to our success and commitment to excellence.

The base salary range is $168,000 - $333,500, determined by factors such as location, experience, and market rates. You may also be eligible for equity and additional benefits. NVIDIA is an equal opportunity employer committed to diversity and inclusion, welcoming applicants regardless of race, religion, gender, age, or other protected characteristics.
