The Manager, DevOps Release Engineering, will lead the design, automation, and execution of release pipelines across hybrid infrastructure environments. This role requires deep expertise in cloud computing platforms (primarily AWS), VMware stacks, Linux and Wintel systems, storage, and data centre servers. The manager will oversee DevOps practices with a strong focus on Infrastructure as Code (IaC), reliability engineering, compliance, and operational excellence in a 24x7 mission‑critical environment.
The manager is also expected to contribute to technical documentation, mentor team members, and participate in project planning and implementation.
KEY RESPONSIBILITIES
- Pipeline & Release Management: Lead the design, implementation, and optimization of CI/CD pipelines for product and engineering deployments.
- Infrastructure as Code (IaC): Drive automation using Terraform, Ansible, Atlantis, and related tooling to manage infrastructure across on‑prem and cloud environments.
- Cloud & Hybrid Operations: Oversee deployments and operations across cloud computing platforms (primarily AWS), including containers, RDS, serverless stacks, IAM, and SSO.
- Reliability Engineering: Champion SRE principles, ensuring high availability (HA), disaster recovery (DR), monitoring, and alerting practices.
- Toolchain Leadership: Manage and integrate tools such as GitLab, Bitbucket, Jira, Confluence, Slack, PagerDuty, Grafana, Datadog, Prometheus, and OpenTelemetry.
- Security & Compliance: Ensure adherence to compliance frameworks, including financial regulations (e.g., Bank Negara Malaysia RMiT), and enforce security standards across deployments.
- Collaboration & Leadership: Partner with product, engineering, and operations teams to drive automation, scripting, scheduling, and continuous improvement.
- Customer & Stakeholder Engagement: Provide technical leadership in customer‑facing scenarios, especially within financial services environments.
- Documentation & Governance: Establish and maintain technical documentation, release governance, and decision‑making frameworks.
- Cloud‑Native & AI‑Driven Innovations: Manage serverless and cloud‑native services including Functions‑as‑a‑Service (e.g., AWS Lambda, Google Cloud Functions), queuing and messaging systems (e.g., SQS, Kafka, Pub/Sub), and integrate AI‑driven capabilities (predictive monitoring, intelligent automation, anomaly detection) to enhance DevOps efficiency and resilience.
WHAT DOES IT TAKE TO BE SUCCESSFUL
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Proven expertise in VMware, Linux, Wintel, storage, and data centre operations.
- Strong knowledge of cloud computing platforms (AWS, Alibaba, GCP, Azure).
- Hands‑on experience with IaC tools (Terraform, Ansible, CloudFormation) and CI/CD pipelines.
- Familiarity with containers (Docker, Kubernetes), serverless architectures, IAM, and SSO.
- Deep understanding of network computing, monitoring, alerting, and email systems.
- Knowledge of compliance frameworks and financial industry standards.
- Proficiency with source code repositories and CI/CD services such as GitLab, Bitbucket, and GitHub, including branching strategies, pipeline orchestration, and release governance.
Work Experience
- 5+ years of hands‑on experience in infrastructure engineering, automation, and DevOps practices across hybrid environments (data centre and cloud).
- Strong background in VMware stacks, Linux/Wintel systems, storage, and server administration within data centres.
- Practical experience with cloud computing platforms, primarily AWS, across IaaS, PaaS, and SaaS.
- Extensive involvement in CI/CD pipeline development, release engineering, and Infrastructure as Code (Terraform, Ansible, CloudFormation).
- Experience with monitoring and observability tools (Grafana, Datadog, Prometheus, OpenTelemetry) in 24x7 operational environments.
- Demonstrated ability to collaborate with product and engineering teams on deployments, automation, and reliability improvements.
Additional Professional Qualities
- Technical Leadership: Ability to lead complex technical decisions and guide engineering teams.
- Automation & Scripting: Proficiency in scripting languages (Python, Bash, PowerShell) and automation frameworks.
- Monitoring & Observability: Expertise with tools such as Grafana, Datadog, Prometheus, and OpenTelemetry.
- Reliability & Resilience: Strong focus on scalability, DR, HA, and operational excellence in 24x7 environments.
- Collaboration & Communication: Skilled in cross‑functional collaboration, documentation, and stakeholder communication.
- Compliance & Security Awareness: Knowledge of industry standards, regulatory requirements, and secure DevOps practices.