Data Centre Engineer, Field Operations

Sustainable Metal Cloud

Singapore

On-site

SGD 70,000 - 90,000

Full time

Today

Job summary

A leading technology firm in Singapore is seeking a Data Centre Engineer to support the daily operations of AI-accelerated high-performance computing infrastructure. Responsibilities include the configuration and maintenance of servers and troubleshooting hardware issues. Applicants should hold a Bachelor’s degree and have 5+ years of relevant experience. The company promotes a diverse and inclusive workplace dedicated to sustainable practices.

Qualifications

  • 5+ years of experience in technical field service roles.
  • Strong understanding of server hardware technology and Linux environments.
  • Experience with scripting languages such as Bash or Python.

Responsibilities

  • Support the deployment and maintenance of GPU servers and networking equipment.
  • Troubleshoot incidents and escalate critical issues.
  • Document incident details and resolutions.

Skills

Server hardware technology
Linux environments
Troubleshooting hardware problems
Scripting languages (Bash, Python)
Workload managers (Slurm, Kubernetes)
Observability tools (Prometheus, Grafana, ELK)
Excellent problem-solving skills
Strong communication skills

Education

Bachelor’s degree in computer engineering or related field

Job description

Overview

Firmus Technologies is seeking a skilled Data Centre Engineer to join our Operations team, supporting the daily operations and maintenance of our AI-accelerated high-performance computing (HPC) infrastructure. This role will work closely with Field Service Engineers, HPC and Network Engineering teams, and assist the Global Operations Centre (GOC). This is a unique opportunity to contribute directly to the stability and growth of cutting-edge AI infrastructure.

Responsibilities
  • Support the deployment, configuration, and maintenance of high-end GPU servers, storage servers, networking equipment, and software components in highly secure environments.
  • Perform hardware diagnostics, system functionality checks, and firmware updates as required.
  • Collaborate with engineering teams to help deploy tailored customer environments (e.g., bare-metal systems, HPC clusters, Kubernetes, Slurm).
  • Serve as the first line of engineering support for onsite operational issues, including troubleshooting hardware, network, and software problems.
  • Troubleshoot incidents, escalate critical issues and provide feedback to appropriate teams for improvements.
  • Participate in an on-call rotation to ensure 24/7 availability and responsiveness to critical issues.
  • Provide technical support to the GOC Support Specialist team in troubleshooting HPC-related problems.
  • Document incident details, resolutions, and lessons learned to enhance future problem-solving.
  • Maintain clear, accurate, and up-to-date documentation to promote effective knowledge sharing across the team.
  • Communicate effectively with GOC, HPC Engineers, internal teams, stakeholders, and end-users to ensure alignment on issue resolution.
  • Take part in team meetings and knowledge-sharing sessions to foster collaboration and continuous learning.
Skills and Experience
  • Bachelor’s degree in computer engineering, computer science, or a related technical field.
  • 5+ years of experience in technical field service roles.
  • Strong understanding of server hardware technology, Linux environments and troubleshooting hardware problems, with adherence to physical and system-level security standards.
  • Experience with scripting languages (e.g., Bash, Python).
  • Familiarity with workload managers and cluster software (e.g., Slurm, Kubernetes, NVIDIA BCM) and observability tools (e.g., Prometheus, Grafana, ELK).
  • Excellent problem-solving and analytical skills.
  • Ability to work independently and as part of a team.
  • Strong communication skills, both written and verbal.
Employment Basis

Full Time

At Firmus, we are committed to building a diverse and inclusive workplace. We encourage applications from candidates of all backgrounds who are passionate about creating a more sustainable future through innovative engineering solutions.

Join us in our mission to revolutionize the AI industry through sustainable practices and cutting-edge engineering. Apply now to be part of shaping the future of sustainable AI infrastructure.
