
A leading tech company in Singapore is seeking an AIOps Engineer to oversee its GCP environment, ensuring reliable operation and management of its cloud infrastructure. The role focuses on designing fault-tolerant applications, implementing CI/CD pipelines, and collaborating with data science teams on machine learning model management. Ideal candidates will have more than 3 years of DevOps experience and strong proficiency in GCP and AIOps.
The AIOps Engineer will join the Channel Sales and Operations team and work on its foundational platform, which powers a wide array of projects and provides scalable, reliable services for internal and external customers. You'll play a key role in ensuring the seamless operation and management of our cloud infrastructure, with a focus on the foundational layer that supports machine learning and AI-powered solutions.
As the AIOps Engineer, you will:
- Oversee the GCP environment, including Google Kubernetes Engine (GKE) and related services, to ensure reliable operations across the platform.
- Design, implement, and maintain fault-tolerant applications in the cloud that support both internal and customer-facing services.
- Establish and maintain monitoring and alerting for the platform to ensure high availability, scalability, and security.
- Proactively identify and troubleshoot issues in the GCP infrastructure to minimize downtime and ensure optimal performance.
- Develop and maintain CI/CD pipelines, manage database configurations, and keep the foundational layer stable for the many applications built on the platform.
- Automate deployments and optimize the processes behind continuous integration, delivery, and deployment workflows.
- Ensure low-latency, high-throughput performance of deployed ML models, with a focus on scalability and efficient training and inference.
- Collaborate with data science teams to maintain and scale models, monitor model performance, and ensure robust model management practices.
NICE TO HAVES