Principal SRE – Cloud Automation & AI Platforms (Zapopan)

Oracle

Jalisco

On-site

MXN 1,441,000 - 2,163,000

Full-time

Posted today

Job summary

A leading cloud solutions provider is seeking a skilled SRE with AI/ML expertise and strong application engineering skills. The role covers end-to-end service ownership, incident management, and building solutions with large language models. Candidates should have more than 10 years of software engineering experience in cloud environments, with strong skills in distributed systems, Java/Python, and automation. The position is based in Jalisco, Mexico, and requires fluency in English.

Background

  • 10+ years of software engineering experience in cloud environments.
  • Strong in distributed systems/microservices with Java/Python and SQL.
  • Proven SDLC excellence with a focus on code quality and CI/CD.

Responsibilities

  • Own services end to end, designing for telemetry, security, and performance.
  • Lead incident management and postmortems to improve service reliability.
  • Develop and optimize solutions for AI/ML deployment and operations.

Skills

Software engineering
AI/ML expertise
Java/Python programming
SQL/Data modeling
SRE/DevOps expertise
Excellent communication

Education

BS/MS in Computer Science or related field

Tools

Cloud and containers
Observability tools
DevOps tooling
CI/CD tools

Job description

This role requires an SRE mindset combined with AI/ML expertise and strong application engineering skills across public and private cloud environments.

Qualifications

Career Level - IC4

Key Responsibilities
  • End-to-end service ownership: design for telemetry, security, resiliency, scalability, and performance; lead sizing/architecture; drive service health reviews and process simplification.
  • Incident management and prevention: lead postmortems/RCAs, coordinate fixes, define repair items, and implement data-driven prevention and continuous improvement.
  • AI/ML and GenAI delivery: design and integrate solutions with LLMs, RAG, agentic workflows, and conversational AI; build low-latency model serving and retraining pipelines.
  • Application engineering: develop performant microservices for distributed, containerized, cloud-native systems.
  • Automation: eliminate toil by automating operational workflows, recovery procedures, code delivery, and configuration management; build internal tools and reusable scripts/services to accelerate delivery and reduce errors.
  • Observability: define and implement monitoring, logging, alerting, and tracing strategies; establish SLOs/SLIs/error budgets; improve diagnostics and performance visibility for rapid triage.
  • Cross-functional collaboration: partner with product, operations, and data teams to translate requirements into secure, scalable solutions; communicate effectively with technical and non-technical stakeholders.
Minimum Qualifications
  • BS/MS in Computer Science or related field; 10+ years of software engineering in cloud environments.
  • Strong in distributed systems/microservices using Java/Python; SQL/data modeling; Python for AI/automation.
  • SRE/DevOps expertise: systems and networking fundamentals, application security, observability, performance analysis, and incident response.
  • Proven SDLC excellence: code quality, reviews, version control, CI/CD, testing, and release engineering.
  • Excellent written and verbal communication; English fluency.
Preferred/Technical Skills
  • AI/ML/GenAI: experience with foundational models, RAG, agentic architectures; model deployment, optimization, monitoring, and retraining.
  • Cloud and containers: experience with containerization, orchestration, and resilient, fault-tolerant microservices.
  • Observability: hands-on experience designing dashboards, alerts, traces, logs, and metrics; defining SLOs/SLIs and error budgets; on-call readiness and runbook quality.
  • Operations: performance tuning across Java/Python and SQL for large-scale enterprise applications; strong Linux/Unix expertise; capacity planning and reliability reviews.
  • Automation and scripting: proficiency in scripting to automate operational workflows, build tooling, and run CI/CD tasks (e.g., shell scripting, Python, configuration-as-code, task runners).
  • Familiarity with enterprise ERP applications and standard DevOps tooling and practices.