Infrastructure/GPU Engineer

Cognizant

Denver (CO)

Remote

USD 99,000 - 116,000

Full time

Yesterday

Job summary

A leading technology company is seeking a hands-on Infrastructure Engineer to design and deploy AI-optimized environments leveraging NVIDIA DGX systems. The ideal candidate will possess deep expertise in infrastructure deployment, workload orchestration, and performance optimization. This remote role offers a salary range of $99,000 to $116,000, depending on experience. Applicants are encouraged to apply before 10/21/2025.

Qualifications

Deep understanding of NVIDIA DGX architecture and GPU compute.
Strong Linux system administration skills and shell scripting expertise.
Experience with Slurm, parallel filesystems, and high-speed networking.

Responsibilities

Architect and deploy NVIDIA DGX systems and GPU-based compute clusters.
Configure and manage Slurm Workload Manager for job scheduling.
Implement system health checks and diagnostics across compute, storage, and network layers.

Skills

NVIDIA DGX architecture

Linux system administration

Slurm

High-speed networking (InfiniBand/RDMA/RoCE)

Shell scripting

Containerization (Docker)

Orchestration (Kubernetes)

Automation tools (Ansible, Redfish)

Tools

Terraform

PXE boot

Run.ai

ClearML

Overview

Cognizant is seeking a highly skilled, hands-on Infrastructure Engineer with proven experience in the physical and technical deployment of environments optimized for AI and machine learning workloads. This role focuses on NVIDIA DGX or similar systems, GPU-accelerated compute clusters, high-speed networking, and scalable storage solutions. The ideal candidate will have deep expertise in infrastructure design, deployment, workload orchestration, and performance optimization in enterprise environments.

This is a remote role in the US. The salary range for this role is $99,000 to $116,000, depending on the candidate's skills and qualifications. Applications will be accepted until 10/21/2025.

Key Responsibilities

System Design & Deployment

Help right-size GPU investments.
Architect and deploy NVIDIA DGX systems and GPU-based compute clusters.
Design and implement scalable parallel filesystems (e.g., Lustre, BeeGFS, GPFS).
Integrate high-speed interconnects using InfiniBand, RoCE, and RDMA.
Collaborate on rack planning and airflow optimization.

Cluster & Infrastructure Management

Configure and manage Slurm Workload Manager for job scheduling.
Deploy and maintain cluster orchestration tools.
Automate provisioning using PXE boot, Terraform, Redfish, and Kubernetes.
Perform firmware updates, BIOS/IPMI/BMC configuration, and OS provisioning.
Knowledge of Run.ai, ClearML, or a similar platform.

Networking & Performance Optimization

Design and validate network topologies including IPMI, internal/external networks, and InfiniBand fabrics.
Optimize RDMA and RoCE configurations for low-latency, high-throughput data transfers.
Conduct performance benchmarking using GPU-Burn, NCCL, and NVSM.

Monitoring & Troubleshooting

Implement system health checks and diagnostics across compute, storage, and network layers.
Troubleshoot hardware/software issues and ensure reliable infrastructure operation.

Required Skills & Qualifications

Technical Expertise

Deep understanding of NVIDIA DGX architecture, CUDA, and GPU compute.
Strong Linux system administration and shell scripting skills.
Experience with Slurm, parallel filesystems, and high-speed networking (InfiniBand/RDMA/RoCE).
Familiarity with containerization (Docker), orchestration (Kubernetes), and automation tools (Ansible, Redfish).

Preferred Qualifications

Experience with BBCM and DGX BasePOD/SuperPOD configuration.

Certifications from NVIDIA or an equivalent OEM.
