GenAI Systems Administrator Needed for Long-Term Contract

DWI Consulting Ltd

Remote

GBP 60,000 - 90,000

Full time

3 days ago

Job summary

A technology consulting firm is seeking an experienced Consultant specializing in GenAI system administration and GPU operations to support a large-scale AI and GPU data center environment. The role will initially be onsite in Norway for several months before transitioning to remote work. You'll be responsible for ensuring operational stability and performance while collaborating closely with engineering teams. Strong experience with NVIDIA and AMD hardware, troubleshooting, and monitoring platforms such as Grafana is essential. Fluent English is required.

Qualifications

  • Extensive experience in GPU hardware and data center environments.
  • Hands-on knowledge of diagnosing complex GPU systems.
  • Fluency in English required.

Responsibilities

  • Ensure operational stability and performance of GPU-based AI platforms.
  • Monitor and manage infrastructure, responding to operational requests.
  • Produce regular GPU utilization reports and system health analyses.
  • Collaborate with engineers to resolve hardware and operational issues.

Skills

GenAI system administration
GPU operations expertise
Operational troubleshooting
Monitoring platforms (Grafana, Prometheus)
Dell OpenManage
Red Hat Enterprise Linux
Ubuntu
NVIDIA Bright Cluster
Issue tracking and escalation

Tools

NVIDIA hardware
AMD hardware

Job description

We are seeking an experienced Consultant with strong GenAI system administration and GPU operations expertise to support a large-scale AI and GPU data center environment. This role is designed as a long-term position, embedding you directly into daily operations where you will act as a trusted technical advisor and operational specialist.

The engagement will begin with an onsite phase of approximately four to six months in Norway, after which the role can transition to remote working.

As a GenAI Systems Administrator Resident, you will focus on the operational stability, performance, and observability of GPU-based AI platforms. You will work under the client's direction, aligning closely with evolving operational and business needs, while ensuring systems are healthy, performant, and ready to support production workloads.

You will bring hands-on experience with GPU hardware from NVIDIA and AMD, along with a strong background operating in large data center environments. Confidence working with Dell OpenManage, Red Hat Enterprise Linux, Ubuntu, NVIDIA Bright Cluster, Omnia, Grafana, and Prometheus is essential. Your expertise must be practical and demonstrable, particularly in diagnosing and resolving issues in complex GPU-based systems.

In day-to-day operations, you will monitor, review, and manage infrastructure, respond to user and operational requests, and analyse system and application logs. You will produce regular operational and GPU utilisation reports, helping teams understand system health, performance trends, and potential risks before they impact workloads.
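
As a rough illustration of the kind of GPU utilisation report mentioned above, here is a minimal Python sketch. It is not taken from the client's tooling: the sample data, the `summarise` function, and the 20% `idle_threshold` are all hypothetical, and in practice the samples would likely come from a monitoring platform rather than a hard-coded list.

```python
from statistics import mean

# Hypothetical samples: (gpu_id, utilisation percent). The values are
# made up for illustration; a real report would read them from monitoring.
SAMPLES = [
    ("gpu0", 92.0), ("gpu0", 88.0), ("gpu0", 95.0),
    ("gpu1", 12.0), ("gpu1", 8.0), ("gpu1", 10.0),
]


def summarise(samples, idle_threshold=20.0):
    """Group samples by GPU and return mean and peak utilisation per GPU.

    idle_threshold is an assumed cut-off (in percent) below which a GPU
    is flagged as underused.
    """
    by_gpu = {}
    for gpu_id, util in samples:
        by_gpu.setdefault(gpu_id, []).append(util)

    report = {}
    for gpu_id, values in sorted(by_gpu.items()):
        avg = mean(values)
        report[gpu_id] = {
            "mean": round(avg, 1),
            "peak": max(values),
            "underused": avg < idle_threshold,
        }
    return report


if __name__ == "__main__":
    for gpu_id, row in summarise(SAMPLES).items():
        print(gpu_id, row)
```

A summary like this could feed a weekly report that highlights consistently idle or saturated GPUs; the threshold would need tuning to the actual workload mix.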

A key part of the role involves strong operational troubleshooting. You will be comfortable diagnosing systems that are not behaving optimally, with a deep understanding of GPU failure modes and how to detect early warning signs. You will help surface the right metrics, alerts, and conditions through monitoring platforms such as Grafana and Prometheus, ensuring system health is visible and actionable.

You will support change and problem management activities, evaluate proposed changes, and provide clear recommendations. Post-implementation, you will contribute to planning and continuous improvement while ensuring knowledge is shared effectively across teams. Issue tracking and escalation are also central to the role: you will work closely with engineering teams, raise and track issues, support investigations, and represent the client's operational perspective throughout the resolution process.

Collaboration is fundamental. As the onsite go-to technical resource, you will work closely with Designated Support Engineers and Onsite Field Service Engineers to resolve hardware and operational issues quickly. For major incidents or upgrades, you will coordinate with remote experts to minimise downtime and operational impact.

Fluent English is required.
