At AIA we’ve started an exciting movement to create a healthier, more sustainable future for everyone.
If you believe in developing a better tomorrow, read on.
About the Role
The incumbent will be a key driver of our engineering-driven culture, responsible for designing and implementing strategic initiatives that leverage AI and Machine Learning to create self-healing, automated, and cloud-native operational systems, and for providing enterprise architecture oversight in line with AIA architecture governance. This is a hands‑on leadership role for a technically proficient individual with the vision and negotiation skills to lead a team of solution analysts and DevOps engineers, champion change, eliminate manual processes, and foster a culture of collaboration across our engineering, DevOps, and Site Reliability Engineering (SRE) teams.
Key Responsibilities
- Strategy & Leadership : Design and implement a multi-year strategy to automate and optimize IT operations using AI / ML-driven solutions, predictive analytics, and self-healing systems. Drive the cultural change towards a proactive, autonomous operations model and continuous delivery mindset. Evangelize and implement modern SRE practices across engineering teams.
- DevOps Implementation : Lead the organization's cultural transformation towards engineering-driven practices and DevOps excellence.
- Process Transformation : Spearhead initiatives to eliminate traditional operations bottlenecks, automate manual processes, and establish new standards for operational efficiency and system reliability.
- Team Collaboration : Foster a collaborative and integrated environment across product engineering, DevOps, and Site Reliability Engineering (SRE) teams to ensure shared ownership and accountability for the full application lifecycle.
- Platform Modernization : Guide the evolution of our CI / CD pipelines, container orchestration on Kubernetes, and cloud-native infrastructure to support autonomous and proactive operations.
- Cultural Change : Act as a change agent within the organization, articulating the vision for an engineering-driven culture and using excellent communication and negotiation skills to build consensus and drive adoption of new methodologies.
- Hands‑on Contribution : Remain deeply hands‑on with the technology stack, actively contributing to architectural design, code reviews, and key technical decisions to ensure a seamless bridge between innovation and execution.
Required Technical Expertise
- Enterprise Architecture, DevOps & SRE Experience : A minimum of 10-15 years of extensive, hands‑on experience in enterprise architecture, DevOps, and SRE principles, including CI / CD pipeline automation, infrastructure-as-code, and observability.
- AI / ML for Operations : Proven experience in designing or implementing AI / ML-driven solutions for IT operations, covering both infrastructure and application observability, such as log analysis, anomaly detection, and predictive maintenance.
- Modern Technology Stack : Strong practical experience with the following technologies :
  - Languages : Java, NodeJs, Python
  - Front-end : ReactJs
  - Containerization & Orchestration : Docker and Kubernetes
  - CI / CD : GitHub Actions, Bamboo, or similar tools
  - Infrastructure as Code & Observability Tools : Terraform, CloudFormation, ELK, Dynatrace, Prometheus, Grafana, Datadog, etc.
  - Scripting : Proficiency in Python and PowerShell
- Cloud‑Native Architecture : Expertise in designing and managing cloud‑native systems, microservices architectures, and distributed systems.
- Experience with microservices architecture and APM / API management.
- Knowledge of security best practices and DevSecOps implementations.
- Ensure automated systems comply with security, governance, and regulatory standards.
- Stability of system and services.
- Timely and quality deliverables.
- Good quality solution design by implementing different architecture pillars, such as security, scalability, maintainability, performance, etc.
- Architecture strategy, standards, patterns, governance, and audit reporting.
- Improvements on build automation leveraging CI / CD processes, automated testing, unit testing, code coverage and other software development best practices.
- Degree from a recognized university in Information Technology, Computer Science, or Computer Engineering.
- Certifications in AIOps, DevOps, AWS / Azure / GCP, ITIL, or related fields are a plus. GenAI certifications (e.g., NVIDIA, Google, Databricks) are highly desirable.
Special skills
- Requires in-depth experience, knowledge and skills in own discipline
- Uses best practices and knowledge of internal / external business issues to improve products or services
- Ability to work in a high‑pressure environment, troubleshoot complex issues across on‑prem and cloud quickly, and successfully handle multiple priorities.
- Has a systematic problem‑solving approach, effective communication skills, and a sense of ownership and drive.
- Works independently with minimal guidance
- Ability to manage resources and perform capacity planning
- Applies best practices and knowledge of internal / external business issues to improve products or services in own discipline
- Has expertise in own discipline
- Solves moderately complex problems; takes a new perspective on existing solutions
- Interprets customer needs, assesses requirements and identifies solutions to non‑standard requests
- Explains information and persuades others in straightforward situations
- Makes decisions for own work priorities and allocation of time to meet deadlines
- Is accountable for technical contribution to project team or sub‑team
- Builds awareness of costs related to own work
The incumbent will report to the CTO and manage 10 – 15 team members.