Software Engineer (Infrastructure) Cambridge

Darktrace Ltd

Cambridge

On-site

GBP 40,000 - 80,000

Full time

30+ days ago

Job summary

An established industry player is seeking a skilled Infrastructure Engineer to manage NVIDIA GPU servers and cloud environments for innovative AI and HPC projects. This hybrid role requires expertise in Linux systems, server optimization, and collaboration with researchers and engineers. You will be responsible for ensuring high availability, implementing security protocols, and developing tools for HPC environments. This is a fantastic opportunity to work in a collaborative team, where your contributions will directly impact cutting-edge technology and machine learning initiatives. If you thrive in a dynamic environment and have a passion for innovation, this role is perfect for you.

Benefits

23 days holiday + public holidays
Additional day off for birthday
Private medical insurance
Life insurance
Salary sacrifice pension scheme
Enhanced family leave
Confidential Employee Assistance Program
Cycle to work scheme

Qualifications

  • Experience in system administration with a focus on HPC platforms.
  • Familiarity with AI and HPC provisioning and management.

Responsibilities

  • Manage, maintain, and optimize NVIDIA GPU server and cloud environments.
  • Monitor server performance and implement security measures.

Skills

Problem-solving skills
Creative thinking
Excellent communication
Independent thinking

Tools

Linux operating system
NVIDIA HGX server
AWS
Azure
NVIDIA GPU technologies
PyTorch
NAS servers
Data version control systems

Job description

What will I be doing:

Darktrace is seeking an experienced Infrastructure Engineer to manage, maintain, and optimize dedicated NVIDIA GPU server and cloud environments for innovation projects. Responsibilities include setting up, configuring, and maintaining the servers and software stack. A successful candidate will work directly with Darktrace researchers and software engineers, ensuring optimal performance and availability for ongoing AI and HPC (high-performance computing) projects.

This is a hybrid role, with a compulsory attendance of 2 days a week in the Cambridge office.

This role focuses on maintaining and optimising the Linux operating system, file systems, and software stack (CUDA, PyTorch, Python, etc.) for machine learning projects. It also covers setting up and configuring NVIDIA HGX servers (installing and updating software, managing user access, and ensuring optimal performance) and cloud infrastructure for GPU compute projects (managing access and ensuring optimal performance). Additional responsibilities include:

  • Monitoring server and application performance, identifying bottlenecks, and taking corrective actions to maintain high availability,
  • Implementing and maintaining server security, including patch management, vulnerability scanning, and intrusion detection,
  • Collaborating with network administrators, hardware engineers, and researchers to troubleshoot and resolve server and software-related issues,
  • Working closely with the project manager to ensure efficient resource allocation, server utilisation and scaling across multiple teams,
  • Collaborating with data scientists and machine learning engineers to understand their software requirements and provide guidance on best practices,
  • Assisting in training team members on the capabilities and usage of the HGX servers and the software environment,
  • Developing multi-use tooling to work with the HPC environments.

What experience do I need:

We welcome applications from engineers with strong problem-solving and creative thinking skills, excellent communication, and the ability to work in a collaborative team environment. You will be an independent thinker with a startup mindset. Technology-wise, you will have experience in system administration, preferably with a focus on HPC platforms, GPU-based servers, and machine learning software environments, as well as familiarity with AI and HPC provisioning and management, both on-premises and in the cloud. You will have experience with server virtualization technologies and containerization, and be well versed in the Linux operating system. You'll also ideally have:

  • Strong knowledge of NVIDIA HGX server architectures and components,
  • Strong knowledge of AWS or Azure Cloud environments,
  • Experience with NVIDIA GPU technologies, such as NVLink, NVSwitch, and Tensor Core GPUs,
  • Experience with machine learning frameworks and libraries, such as PyTorch and associated system optimisations,
  • Experience with NAS servers,
  • Experience with data version control systems.

Benefits we offer:

  • 23 days’ holiday + all public holidays, rising to 25 days after 2 years of service,
  • Additional day off for your birthday,
  • Private medical insurance which covers you, your cohabiting partner and children,
  • Life insurance of 4 times your base salary,
  • Salary sacrifice pension scheme,
  • Enhanced family leave,
  • Confidential Employee Assistance Program,
  • Cycle to work scheme.