Site Reliability Engineer (Linux Kernel, Kubernetes, Cloud, Automation, Networking)

EXASOFT CONSULTING PTE. LTD.

Singapore

On-site

SGD 120,000 - 150,000

Full time

Today

Job summary

A financial technology firm in Singapore seeks a Senior Systems Engineer with over 10 years of experience in system administration for financial markets infrastructure. The role encompasses developing performance-oriented solutions, managing hybrid cloud setups, and employing automation tools like Kubernetes and Ansible. Candidates must possess advanced Linux skills and a strong knowledge of cloud operations. Competitive salary and benefits are offered.

Qualifications

  • 10+ years of experience in system administration and performance engineering.
  • Advanced proficiency in Linux internals and kernel performance tuning.
  • Hands-on experience with Kubernetes and Docker for automation.

Responsibilities

  • Develop performance-critical infrastructure for financial markets.
  • Build high-availability environments using Kubernetes and Docker.
  • Manage hybrid cloud infrastructure with strict performance SLAs.

Skills

Linux kernel expertise
Kubernetes
Docker
Ansible
Bash
Python
AWS
Azure
GCP
Networking protocols

Tools

ELK Stack
Grafana
Splunk
VMware

Job description
Responsibilities
  • Develop and oversee performance-critical infrastructure for financial markets, ensuring maximum throughput, high resiliency, and minimal operational risk.
  • Leverage deep Linux kernel expertise to fine-tune scheduling policies, interrupt routing, and NUMA resource allocation, ensuring predictable performance at scale.
  • Build and maintain high-availability containerized environments using Kubernetes, Docker, and advanced orchestration tools with a strong focus on scalability and security.
  • Lead automation initiatives with Ansible, Bash, and Python, eliminating manual intervention and improving system efficiency.
  • Manage hybrid cloud infrastructure (AWS, Azure, GCP) with strict performance SLAs, security compliance, and cost-optimized deployments.
  • Oversee infrastructure monitoring and observability using ELK Stack, Grafana, Site24x7, Splunk, and other enterprise-grade tools, ensuring proactive incident detection and resolution.
  • Administer and troubleshoot enterprise storage and networking stacks, including RAID, NFS, SAN/NAS, TCP/IP networking, VMware/vCenter, and BIG-IP load balancers.
  • Collaborate with development, DevOps, and security teams to design fault-tolerant systems and enforce infrastructure governance policies.
  • Perform predictive capacity modeling, OS hardening, and patch compliance, along with benchmark-driven performance optimization for trading and real-time compute platforms.
  • Provide expert-level outage resolution, coordinating cross-functional teams to deliver sustainable remediation and operational resilience.
Requirements
  • 10+ years of progressive experience in system administration, performance engineering, and reliability operations across enterprise and financial domains.
  • Advanced proficiency in Linux internals with specialization in kernel performance tuning, NUMA-aware optimizations, and real-time workload handling.
  • Proven hands-on experience with Kubernetes, Docker, and Ansible for large-scale automation and orchestration.
  • Strong scripting and programming skills in Bash and Python, plus experience with perf/eBPF for system analysis.
  • Demonstrated expertise in cloud operations across AWS, Azure, and GCP.
  • Strong background in networking protocols (TCP/IP, FIX) and high-performance trading environments.
  • Familiarity with storage systems (SAN, NAS, RAID) and database tuning (MySQL optimization).
  • Experience implementing observability and monitoring solutions such as ELK, Grafana, Splunk, and Corvil.