Senior DevOps Engineer, AI & Applications

FIRMUS METAL INTERNATIONAL PTE. LTD.

Singapore

On-site

SGD 100,000 - 130,000

Full time

Today

Job summary

A leading technology firm in Singapore is looking for a Senior DevOps Engineer to build and manage CI/CD pipelines, mentor the team, and ensure deployment safety. The ideal candidate will have 5–7 years of experience, expertise in automation and Kubernetes, and a strong focus on compliance. This full-time position offers a chance to impact sustainable AI practices. Join us in shaping the future of technology.

Qualifications

  • 5–7 years of experience in CI/CD engineering or DevOps.
  • Deep expertise in GitHub Actions or Jenkins with multi-stage design.
  • Strong understanding of Kubernetes and its components.

Responsibilities

  • Design and maintain CI/CD pipelines aligned with GPU cluster standards.
  • Implement release best practices ensuring safety and compliance.
  • Mentor the team on deployment strategies and incident response.

Skills

CI/CD engineering
Automation scripting (Python, Go, Bash)
Kubernetes fundamentals
Release engineering best practices
Debugging Kubernetes rollout issues
Artifact management

Tools

GitHub Actions
Jenkins
ArgoCD

Job description

Overview

Role Summary: Every AI feature we ship touches thousands of GPUs. The Senior DevOps Engineer will build the release engineering backbone—CI/CD pipelines, automated testing gates, one-click deployments with instant rollback—that lets Firmus scale fast and responsibly.

You're the bridge between engineering and operations: setting Firmus standards for how code gets to production, mentoring the team on deployment safety, and driving a blameless culture when things go wrong. Ship safely. Ship often. Ship at scale.

Responsibilities
  • Design and maintain team-wide CI/CD pipelines (Jenkins, GitHub Actions, ArgoCD, or equivalent) with automated testing gates, artifact management, and deployments aligned with GPU cluster standards.
  • Implement release engineering best practices: repeatable releases, GitOps workflows, automated rollback, and change management procedures.
  • Build and manage test infrastructure: environment provisioning, data seeding, and long-running job validation (especially for distributed training templates and multi-node job submissions).
  • Establish engineering protocols and standards: repo organization, PR templates, code quality gates, dependency scanning, and static analysis.
  • Partner with infra teams to ensure that deployment practices for AI product features meet compliance and security standards for massive GPU clusters.
  • Mentor the team on testing strategies, deployment safety, and incident response procedures.
Qualifications
  • 5–7 years of CI/CD engineering, release engineering, or DevOps experience.
  • Deep expertise in GitHub Actions, GitLab CI, ArgoCD, or Jenkins with multi-stage pipeline design and testing gate implementation.
  • Strong automation scripting (Python, Go, or Bash) for build orchestration and environment templating.
  • Strong Kubernetes fundamentals (hands-on): deep understanding of Pod lifecycle and failure modes (Pending/Running/CrashLoopBackOff/Evicted), Deployments/ReplicaSets, Jobs/CronJobs, Services/Ingress, and how these primitives behave under load and during rollouts.
  • Config & secret management: practical experience designing and operating ConfigMaps and Secrets (including secret rotation patterns), with strong hygiene around least privilege, auditability, and preventing credential leakage into logs/artifacts.
  • Safe rollout patterns: proven experience implementing and operating safe rollout strategies (rolling updates, canary, blue/green), readiness/liveness/startup probes, PodDisruptionBudgets, and rollback procedures—ensuring zero/low-downtime deployments for customer-facing services.
  • Deployment safety & debugging: ability to debug common Kubernetes rollout issues end-to-end (bad probes, misconfigured resources/limits, image pull failures, secret/config drift, node pressure/evictions) and convert learnings into automated CI/CD gates and runbooks.
  • Familiarity with artifact management, versioning strategies, and rollback procedures.
  • Experience integrating testing frameworks into CI pipelines (unit, integration, end-to-end).
Key Competencies
  • Engineering Velocity & Time-to-Release improves quarter-over-quarter while release standards remain consistent (gates, tests, approvals, auditability).
  • Platform Reliability & Customer Trust remains strong: release-related incidents are rare and recovery is fast; reliability targets are met without "surprise outages."
  • Developer Productivity & Team Scale improves: engineers spend less time fighting CI/CD and more time shipping as the team grows.
  • Cost Efficiency & Resource Optimization improves: CI/CD and test infrastructure costs stay controlled (or decrease per unit of output) as usage scales.
  • Knowledge & Culture Multiplier effect is visible: release/reliability practices become the default across the org and repeat incident classes decline.
Location & Reporting
  • Singapore or Australia (Launceston, TAS or Sydney, NSW)
  • Reporting to Head of AI & Applications
Employment Basis

Full-time

Diversity

At Firmus, we are committed to building a diverse and inclusive workplace. We encourage applications from candidates of all backgrounds who are passionate about creating a more sustainable future through innovative engineering solutions.

Join us in our mission to revolutionize the AI industry through sustainable practices and cutting-edge engineering. Apply now to be part of shaping the future of sustainable AI infrastructure.
