Principal SaaS Capacity Engineer

Oracle

Central Region

On-site

MXN 400,000 - 600,000

Full-time

Posted today

Vacancy summary

A global technology company in Jalisco seeks a seasoned professional for capacity engineering, focusing on SaaS optimization and AI/ML model integration. Ideal candidates will have strong cloud infrastructure experience, relevant degrees, and proven skills in programming and automation tools. Join a collaborative team to enhance operational excellence and contribute to innovative capacity planning strategies.

Background

  • 5+ years of experience in cloud infrastructure or related roles.
  • Hands-on experience with large-scale distributed systems.
  • Deep understanding of cloud capacity topology.

Responsibilities

  • Ensure SaaS production capacity availability and optimization.
  • Apply AI/ML models for capacity forecasting and anomaly detection.
  • Collaborate with Product teams on AI-driven strategies.

Skills

Cloud infrastructure experience
Programming and scripting (Python, Go, Shell, SQL)
Expertise in AI/ML models for capacity engineering
Infrastructure-as-code (Terraform, Ansible, Helm, Kubernetes)
AIOps tools knowledge (Datadog, Dynatrace, Splunk)

Education

Bachelor’s or Master’s degree in Computer Science or related field
Job description
Required Qualifications
  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Cloud/Systems Engineering, or a related field.
  • 5+ years of experience in cloud infrastructure, SaaS operations, or capacity engineering roles.
  • Hands-on experience with large-scale distributed systems, OCI (or AWS, Azure, GCP), and SaaS production environments.
  • Strong programming and scripting experience (Python, Go, Shell, SQL) for automation and AI/ML model deployment.
  • Proven experience deploying AI/ML solutions for capacity forecasting, anomaly detection, and intelligent workload tuning.
  • Deep understanding of cloud capacity topology and distributed service dependencies.
  • Proficiency with infrastructure-as-code (Terraform, Ansible, Helm, Kubernetes).
  • Familiarity with AIOps tools and AI-driven observability platforms (Datadog, Dynatrace, Splunk, or similar).
  • Knowledge of resiliency and disaster recovery strategies, including AI-simulated failure modeling.
Preferred Qualifications
  • Advanced degree (Master’s/PhD) with specialization in AI, ML, Data Science, or distributed systems engineering.
  • Experience building and deploying self-healing, AI-driven automation at scale in a SaaS environment.
  • Domain expertise in reinforcement learning applications for automated resource optimization.
  • Direct exposure to Oracle Cloud Infrastructure (OCI) systems and tools.
  • Experience with cloud-native AI/ML services, MLOps, and continuous model monitoring.
Competencies and Skills
  • Expertise in designing, developing, and deploying AI/ML models for cloud infrastructure use cases (demand forecasting, anomaly detection, workload optimization).
  • Advanced proficiency in automation, orchestration, and self-healing system architectures.
  • Skilled in communicating technical concepts, AI-powered analytics, and strategic insights to engineering and executive audiences.
  • Strong analytical and critical thinking skills, with a deep data-driven mindset.
  • Curiosity and initiative to explore APIs, system profiles, and operational anomalies, translating technical findings into impactful business outcomes.
  • Highly collaborative, adaptive, and passionate about operational excellence and continuous learning.
  • Ability to influence cross-team priorities and drive best practices in AI-enhanced capacity engineering.
Career Level

IC4

Key Responsibilities
  • Service Accountability: Ensure SaaS production capacity availability, optimization, scaling automation, reserve management, and quota governance.
  • AI/ML Integration: Apply AI/ML models for predictive capacity forecasting, anomaly detection, and workload auto-tuning to anticipate demand spikes and prevent outages.
  • Observability & AIOps: Leverage AI-powered observability and AIOps platforms for end-to-end system monitoring, intelligent alerting, and automated incident mitigation.
  • Strategic Partnership: Collaborate with Product and Development teams to design, validate, and align AI-driven scaling and capacity planning strategies with new launches and initiatives.
  • Automation & Orchestration: Design, implement, and optimize automation and orchestration pipelines, including self-healing systems, policy-driven provisioning, and disaster recovery simulations, using AI to enhance reliability and operational resilience.
  • Data-Driven Decision Support: Deliver advanced instrumentation, AI-powered analytics, and actionable dashboards to inform executives, engineering teams, and stakeholders.
  • Technical Leadership: Translate complex OCI stack and cloud platform resources (compute, storage, DB, networking) into business-ready, AI-enhanced capacity solutions and performance profiles.
  • Simulation & Resiliency: Use AI/ML models to simulate, validate, and improve resiliency and disaster recovery scenarios for faster, more robust recovery response.
  • Collaboration & Communication: Present AI-driven insights, risks, and recommendations to engineering teams, ICs, and executives to illuminate capacity trends and data-driven priorities.
  • Continuous Innovation: Assess new AI/ML techniques, AIOps platforms, and automation tools for ongoing improvements in infrastructure reliability, scalability, and cost optimization.