Site Reliability Engineer (AI Operations) - 6426

Buscojobs

Metro Manila

Hybrid

PHP 600,000 - 800,000

Full time

Job summary

A global academic publisher is seeking a Site Reliability Engineer to enhance AI operations in educational technology. This hybrid role in Metro Manila involves implementing observability solutions, establishing ethical governance of AI services, and collaborating with teams to operationalise AI features. Candidates should have strong cloud platform experience, automation skills, and a degree in a relevant field. The organization offers competitive benefits and a culture of continuous growth.

Benefits

HMO Coverage
Flexible schedule
Paid Annual Leaves
Retirement package
Career growth opportunities

Qualifications

  • 3+ years in Site Reliability Engineering or related roles.
  • Strong experience with infrastructure as code.
  • Hands-on experience with CI/CD pipelines.

Responsibilities

  • Implement observability solutions for AI service deployments.
  • Establish governance frameworks for ethical AI use.
  • Collaborate with teams to operationalise AI features.

Skills

Site Reliability Engineering
Cloud platforms (AWS)
Automation (Python, Bash)
Monitoring systems (Datadog, Grafana)

Education

Degree in Computer Science or related field

Tools

Terraform
GitHub Actions

Job description
Overview

Work setup : Hybrid (open to x a week in the office)

Work schedule : AM to PM Manila time

Employment type : Permanent

Location : Makati City, Metro Manila

Pay range : Php , to Php ,

We value transparency and encourage applicants comfortable with this range to apply.

Discover a world of endless possibilities with Cambridge University Press & Assessment, a distinguished global academic publisher and assessment organization proudly affiliated with the prestigious University of Cambridge.

We are recruiting for a Site Reliability Engineer to be part of our Education Technology Team. As a Site Reliability Engineer (AI Operations), you'll be pioneering operational excellence for AI systems that are transforming how millions learn worldwide.

Why Cambridge?

Cambridge University Press & Assessment is a world-renowned not-for-profit academic publisher and assessment organisation, proudly part of the prestigious University of Cambridge. With a legacy rooted in over years of educational excellence, we are dedicated to unlocking the potential of learners and educators across the globe.

Joining Cambridge's second largest global office in the Philippines, operating for over years with ,+ colleagues, means becoming part of an extraordinary institution renowned worldwide. We have been recognised as a Great Place to Work for three consecutive years, reflecting our inclusive culture, strong sense of purpose, and commitment to the professional growth and well-being of our people. At Cambridge, we don't just publish books or deliver tests; we empower progress, inspire curiosity, and champion the pursuit of knowledge.

What can you get from Cambridge?

At Cambridge, you'll become part of a vibrant and forward-thinking community that transcends tradition, fostering a culture of continuous growth and personal development. We provide the right environment for you to thrive, supporting your professional journey and empowering you to reach your highest potential. That is why our pay philosophy is tied to your skills and competencies, ensuring that your compensation reflects the unique value you bring to the role you are applying for.

The organization offers a wide range of benefits and opportunities including :

  • Regular Employment on Day
  • HMO Coverage and Life Insurance on Day
  • Paid Annual Leaves (Vacation, Well-being, Flexible, Holiday, and Volunteering leaves)
  • Vesting / Retirement package
  • Opportunities for career growth and development
  • Access to well-being programs
  • Flexible schedule, hybrid work arrangement and work-life balance
  • Opportunity to collaborate with colleagues from diverse branches that will expand your horizons and enrich your understanding of different cultures

What will you do as a Site Reliability Engineer?

You'll be joining our Education Technology Platform Operations team at a pivotal moment as we embrace AI to enhance learning outcomes globally. Working alongside passionate technologists, you'll help us transform how we deploy and operate AI services - from large language models to intelligent automation platforms - ensuring they're reliable, cost-effective, and ethically sound.

In this role, you'll bridge the gap between cutting-edge AI innovation and production excellence. You'll establish the operational frameworks that allow us to deploy AI responsibly in education, always keeping learner safety and data protection at the forefront.

  • Drive innovation in AI operations by implementing observability solutions for LLM deployments, workflow automation platforms (n8n), and AI services across AWS Bedrock and Azure OpenAI
  • Make a real difference by establishing governance frameworks that ensure our AI services are ethical, compliant, and safe for educational use
  • Transform our approach to cost optimisation for AI workloads through intelligent caching, model selection, and resource allocation strategies
  • Collaborate with teams to operationalise AI features, sharing your expertise to help developers build production-ready, scalable AI solutions
  • Be continuously learning about emerging AI operational tools like Portkey and LiteLLM, bringing new approaches to improve reliability and efficiency
  • Strengthen our impact by implementing sustainable AI practices that consider the environmental footprint of compute-intensive workloads

Please review the attached job description for further details on the role.

What makes you the ideal candidate for this role?
  • Education & Experience : – years in Site Reliability Engineering or related roles, with proven application of operational excellence in emerging technologies. Degree or equivalent experience in Computer Science, Engineering, or related field.
  • Cloud & Infrastructure : Strong experience with cloud platforms, particularly AWS, including Infrastructure as Code (Terraform, CDK, CloudFormation) and cloud-native services.
  • Automation & Delivery : Skilled in delivering change through automation with strong scripting abilities (Python, Bash, etc.) and hands-on experience with CI / CD pipelines (GitHub Actions, Jenkins, Bitbucket Pipelines).
  • Monitoring & Reliability : Practical experience with monitoring and observability systems (Datadog, New Relic, Grafana, ELK / EFK stack) to ensure performance, availability, and incident response in distributed systems.
  • API & Distributed Systems : Knowledge of API management, rate limiting, scalability, and the complexities of distributed architectures, particularly for AI-related workloads.
  • AI & Emerging Tech : Familiarity with Large Language Models, cloud AI services, or workflow automation tools. Willingness to learn and apply new approaches to maximize impact in education technology.
  • Ways of Working : Enthusiastic about exploring possibilities with AI while maintaining operational rigor. Collaborative, curious, and aligned with the vision of using technology to unlock potential in learners worldwide.

This is more than a technical role - it's an opportunity to define how AI operates in educational technology, ensuring it's deployed responsibly and effectively. You'll be at the forefront of establishing best practices that could influence how the entire education sector approaches AI operations.
