Nadi - Lead Specialist, SRE

TNG Digital

Kuala Lumpur

On-site

MYR 150,000 - 200,000

Full time

Job summary

An innovative fintech company in Kuala Lumpur seeks an experienced Site Reliability Engineer (SRE) to ensure high availability, optimize operations, and enhance deployment processes. The role involves leading a team, ensuring service reliability, and implementing security measures in cloud environments. Candidates must have a Bachelor's degree and 8 years' experience in DevOps/SRE roles, along with strong scripting and cloud platform skills. This position offers flexible working hours and various benefits.

Benefits

Flexi working hours
Monthly eWallet allowance
Additional employer EPF contribution
Unlimited office pantry fruits and snacks
Mobile and broadband subscription reimbursement
Outpatient medical benefits coverage
Additional family leave
Comprehensive medical coverage
Corporate discounts

Qualifications

  • 8 years' experience in a DevOps or SRE role is essential.
  • Strong knowledge of site reliability engineering and of infrastructure and cloud architecture.
  • Experience with cloud platforms and containers.

Responsibilities

  • Ensure 99.99% uptime/availability of services.
  • Drive automation to reduce manual toil and critical failures.
  • Plan and execute disaster recovery strategies.

Skills

Scripting languages (Bash, Python, Go)
CI/CD tools
Cloud platforms (AWS, Azure)
Infrastructure as code tools (Terraform, CloudFormation)
Containerization technologies (Docker)
Container orchestration platforms (Kubernetes)
Networking principles and protocols
Problem-solving skills
Attention to detail

Education

Bachelor's degree in Computer Science, Engineering, Networking, or a related field
Professional cloud certification

Job description

We fuel the ideas and ambitions of our people with an environment built on our DNA of Love, Entrepreneurship, Agility, and Passion (LEAP)!

We have a culture that empowers everyone to innovate and create solutions that leave a positive impact on our communities and our nation. Touch ‘n Go will always be here to inspire our talents to grow as leaders and innovators, giving you the power to make a difference.

What would you do?
  • Service Reliability and Availability
    • Ensure 99.99% uptime/availability targets are consistently met
    • Reduce Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR) during incidents
    • Drive capacity planning and prevent reliability risks
  • Drive Automation and Operational Excellence
    • Deliver consistent and repeatable deployments with zero critical failures by maintaining and updating deployment scripts/templates
    • Reduce manual toil across the team by a measurable percentage
    • Standardize and harden container images, CI/CD pipelines, and cloud infrastructure
  • Release and Disaster Recovery
    • Reduce deployment incidents through adherence to best practices in release management
    • Plan and execute disaster recovery strategies and ensure Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets are met for cloud and multi-cloud environments
  • Incident Response and Troubleshooting
    • Reduce the frequency of recurring issues through problem management and root cause analysis
    • Establish and enforce incident response processes
  • Security and Compliance
    • Ensure 100% compliance with regulatory and audit requirements for infrastructure security
    • Achieve zero critical security incidents by optimizing infrastructure and adhering to industry standards
    • Ensure infrastructure, container, and code security standards are enforced
    • Successfully implement secure architectures for all new deployments in collaboration with development teams
  • Team Leadership & Strategic Alignment
    • Lead, mentor, and grow the SRE team’s technical and operational capabilities
    • Establish on-call rotations, knowledge sharing sessions, and training programs
    • Foster a culture of blameless accountability, learning, and continuous improvement
    • Partner with product and engineering teams to embed reliability into the SDLC
    • Influence architectural decisions with an SRE mindset
Who should join us?
Qualifications:
  • Bachelor's degree in Computer Science, Engineering, Networking, or a related field
  • Professional cloud certification
Experience:
  • 8 years of proven experience in a DevOps or SRE role
Skills:
  • Strong knowledge of scripting and programming languages (e.g., Bash, Python, Go) and experience with configuration management tools (e.g., Ansible, Chef)
  • Sound understanding of, and hands-on experience with, CI/CD tools and release engineering
  • Experience with cloud platforms (e.g. AWS, Azure) and infrastructure as code (IaC) tools (e.g., Terraform, CloudFormation).
  • Advanced cloud certifications and project management experience are a plus.
  • Strong understanding of site reliability engineering principles and mindset, infrastructure engineering, and cloud architecture and services.
  • Experience with containerization technologies like Docker and container orchestration platforms such as Kubernetes.
  • Knowledge of networking principles and protocols, with solid examples of applying them in practice.
  • Strong knowledge of cloud architecture and services
  • Strong problem-solving skills and the ability to handle high-pressure situations calmly and effectively.
  • Strong attention to detail and a commitment to delivering high‑quality results.
Personality:
  • Passionate, agile, and flexible, with a positive attitude.
  • Assertive, driven individual with a strong sense of urgency
  • Self-starter with a continuous-improvement mindset
Benefits:
  • Flexi working hours.
  • Monthly eWallet allowance.
  • Additional 1% employer EPF contribution from your 1st to 3rd year of service, with further increases based on your continued years of service.
  • Unlimited office pantry fruits, snacks and drinks.
  • Mobile and broadband subscription reimbursement.
  • Option to extend outpatient medical benefits coverage to dependants (spouse, children, parents, or parents-in-law).
  • Additional leave including family leave and paid care leave to care for family members.
  • Medical coverage including dental, optometry, mental health care, maternity, registered Traditional Chinese Medicine (“TCM”), and chiropractic care.
  • Corporate membership discount and many more to explore.

We believe that you have what it takes to fit into the Touch ‘n Go family and help revolutionize the Fintech industry by paving the way to a cashless society. If you're ready to take the next step, apply now!

Touch ‘n Go is an organization that strives to provide Equal Opportunity Employment, based on merit, qualifications, capabilities, and calibre. It is Touch ‘n Go’s policy to not discriminate based on age, race, religion, colour or other personal status, identity or characteristics. Fair Opportunity is Our Value and Practice. Please advise us of any accommodations you may need by e-mailing: ********@naditech.io

Note: Only shortlisted candidates will be contacted.

Get your free, confidential resume review.
or drag and drop a PDF, DOC, DOCX, ODT, or PAGES file up to 5MB.