Position Overview
A global technology company in Jalisco seeks a seasoned professional for capacity engineering, focusing on SaaS optimization and AI/ML model integration. Ideal candidates will have strong cloud infrastructure experience, relevant degrees, and proven skills in programming and automation tools. Join a collaborative team to enhance operational excellence and contribute to innovative capacity planning strategies.
Background
5+ years of experience in cloud infrastructure or related roles.
Hands-on experience with large-scale distributed systems.
Deep understanding of cloud capacity topology.
Responsibilities
Ensure SaaS production capacity availability and optimization.
Apply AI/ML models for capacity forecasting and anomaly detection.
Collaborate with Product teams on AI-driven strategies.
Skills
Cloud infrastructure experience
Programming and scripting (Python, Go, Shell, SQL)
Expertise in AI/ML models for capacity engineering
Bachelor’s or Master’s degree in Computer Science or related field
Job Description
Required Qualifications
Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Cloud/Systems Engineering, or a related field.
5+ years of experience in cloud infrastructure, SaaS operations, or capacity engineering roles.
Hands-on experience with large-scale distributed systems, OCI (or AWS, Azure, GCP), and SaaS production environments.
Strong programming and scripting experience (Python, Go, Shell, SQL) for automation and AI/ML model deployment.
Proven experience deploying AI/ML solutions for capacity forecasting, anomaly detection, and intelligent workload tuning.
Deep understanding of cloud capacity topology and distributed service dependencies.
Proficiency with infrastructure-as-code and orchestration tools (Terraform, Ansible, Helm, Kubernetes).
Familiarity with AIOps tools and AI-driven observability platforms (Datadog, Dynatrace, Splunk, or similar).
Knowledge of resiliency and disaster recovery strategies, including AI-simulated failure modeling.
Preferred Qualifications
Advanced degree (Master’s/PhD) with specialization in AI, ML, Data Science, or distributed systems engineering.
Experience building and deploying self-healing, AI-driven automation at scale in a SaaS environment.
Domain expertise in reinforcement learning applications for automated resource optimization.
Direct exposure to Oracle Cloud Infrastructure (OCI) systems and tools.
Experience with cloud-native AI/ML services, MLOps, and continuous model monitoring.
Competencies and Skills
Expertise in designing, developing, and deploying AI/ML models for cloud infrastructure use cases (demand forecasting, anomaly detection, workload optimization).
Advanced proficiency in automation, orchestration, and self-healing system architectures.
Skilled in communicating technical concepts, AI-powered analytics, and strategic insights to engineering and executive audiences.
Strong analytical and critical thinking skills, with a strongly data-driven mindset.
Curiosity and initiative to explore APIs, system profiles, and operational anomalies, translating technical findings into impactful business outcomes.
Highly collaborative, adaptive, and passionate about operational excellence and continuous learning.
Ability to influence cross-team priorities and drive best practices in AI-enhanced capacity engineering.
Career Level
IC4
Key Responsibilities
Service Accountability: Ensure SaaS production capacity availability, optimization, scaling automation, reserve management, and quota governance.
AI/ML Integration: Apply AI/ML models for predictive capacity forecasting, anomaly detection, and workload auto-tuning to anticipate demand spikes and prevent outages.
Observability & AIOps: Leverage AI-powered observability and AIOps platforms for end-to-end system monitoring, intelligent alerting, and automated incident mitigation.
Strategic Partnership: Collaborate with Product and Development teams to design, validate, and align AI-driven scaling and capacity planning strategies with new launches and initiatives.
Automation & Orchestration: Design, implement, and optimize automation and orchestration pipelines, including self-healing systems, policy-driven provisioning, and disaster recovery simulations, using AI to enhance reliability and operational resilience.
Data-Driven Decision Support: Deliver advanced instrumentation, AI-powered analytics, and actionable dashboards to inform executives, engineering teams, and stakeholders.
Technical Leadership: Translate complex OCI stack and cloud platform resources (compute, storage, DB, networking) into business-ready, AI-enhanced capacity solutions and performance profiles.
Simulation & Resiliency: Use AI/ML models to simulate, validate, and improve resiliency and disaster recovery scenarios for faster, more robust recovery response.
Collaboration & Communication: Present AI-driven insights, risks, and recommendations to engineering teams, ICs, and executives to illuminate capacity trends and data-driven priorities.
Continuous Innovation: Assess new AI/ML techniques, AIOps platforms, and automation tools for ongoing improvements in infrastructure reliability, scalability, and cost optimization.