Role: Principal Data Platform Engineering Lead (Cloud, AWS & Snowflake)
Location: Malaysia
We are partnering with a leading telecom client to hire a Principal Data Platform Engineering Lead in Malaysia. This role ensures the continuous reliability, scalability, and security of the client's cloud data platform, built on AWS, Snowflake, SageMaker, and Datapipe. The incumbent drives operational excellence, automation, and engineering maturity across the platform. They also prototype and roll out new platform capabilities that improve agility, innovation, and performance for data and AI workloads throughout the enterprise.
Accountabilities
- Own end-to-end infrastructure and platform operations of the DXP Data Platform across AWS, Snowflake, and SageMaker environments (DEV, SIT, PROD).
- Lead the design, build, and automation of data platform engineering and DevOps practices, ensuring continuous improvement and zero-downtime operations.
- Lead the prototyping, implementation, and rollout of new platform capabilities and services across AWS, Snowflake, and SageMaker.
- Implement and continuously improve governance, security, and compliance standards for cloud infrastructure, data access, and network controls.
- Drive operational excellence through monitoring, alerting, cost optimization, and performance tuning.
- Manage a hybrid team of internal platform engineers and vendor‑augmented resources supporting Day 2 operations and enhancements.
- Partner with Data Engineering, Architecture, Security, Infrastructure & Tooling teams to ensure aligned technical roadmaps, compliance readiness, and audit traceability.
Requirements
- 8-10 years of experience in cloud and platform engineering, including extensive work on AWS‑based data platforms.
- Proven leadership of cross‑functional engineering teams managing production‑grade, multi‑environment platforms.
- Hands‑on expertise in:
  - AWS services: VPC, EC2, S3, RDS, Lambda, KMS, CloudFormation/CDK, Transfer Family, CloudWatch, CloudTrail.
  - Snowflake: administration, RBAC, warehouse optimization, DevOps automation, Cortex AI, and Streamlit integration.
  - EKS / Airflow / Airbyte (Datapipe): container orchestration, CI/CD pipelines, and deployment automation.
  - SageMaker: multi‑domain setup, pipeline management, Studio/Canvas lifecycle, and MLOps enablement.
  - Monitoring & observability: CloudWatch, Splunk, Snowflake Account Usage, cost dashboards, PagerDuty, Slack, ServiceNow.
- Demonstrated success in prototyping, implementing, and scaling new cloud and data platform features into production.
- Experience managing Day 2 operations, incident response, and SRE‑driven performance stabilization.
- Familiarity with machine learning integration and model lifecycle management.
- Experience enforcing ITSec and compliance standards (IAM, KMS, PDPA/GDPR).
- Proven success in transitioning platform operations from vendor‑managed to in‑house ownership.