Site Reliability Developer

Oracle

Central Region

Hybrid

MXN 2,118,000 - 2,825,000

Full-time

Posted today

Vacancy description

A leading technology company is seeking a Distinguished Engineer to architect multi-environment infrastructure and lead deployment automation. The role requires 7+ years of experience with distributed systems and proficiency in tools like Terraform and Kubernetes. Candidates will mentor senior engineers and establish standards for reliability and security. This position is hybrid, allowing a blend of on-site and remote work to foster productivity and collaboration.

Background

  • 7+ years of experience building and operating distributed systems.
  • Proven expertise in Terraform and Kubernetes operations.
  • Strong understanding of secure secret distribution.

Responsibilities

  • Architect and evolve multi-environment infrastructure.
  • Lead deployment automation strategy.
  • Mentor senior engineers and codify expectations.

Skills

Terraform module design
Kubernetes operations
Python
Bash scripting
Distributed systems

Tools

Terraform
Kubernetes
Cloudflare
OpenTelemetry

Job description

HeyDonto builds reliable data pipelines that connect fragmented healthcare platforms to modern APIs.

We synchronize and standardize data from both on-premise and cloud-based EHR systems into clean, interoperable formats.

Our mission is simple: make healthcare data work the way software should — predictably, securely, and without silos.

The Role

As a Distinguished Engineer (L7) in the DevOps Tribe, you’ll define and evolve the infrastructure that powers HeyDonto’s ecosystem—from Kubernetes clusters and Terraform modules to developer tooling and multi-environment automation. You’ll lead through technical depth, setting standards for reproducibility, reliability, and cloud portability across every environment.

What You’ll Do
  • Architect and evolve multi‑environment infrastructure across GKE, CloudSQL, Confluent, Temporal, and Cloudflare, encoded in reusable Terraform modules and remote state.
  • Lead deployment automation strategy (CLI orchestration and Helm releases) to keep clusters converged deterministically across environments.
  • Design and enforce the secrets lifecycle integrating Terraform outputs, SOPS, and 1Password for secure, auditable rotation and distribution.
  • Define and implement automated drift detection, IAM regression suites, and compliance guardrails for infrastructure reliability.
  • Own the CUE‑based configuration system that exports Compose stacks, environment templates, secrets, and Helm values through just export‑cue.
  • Shape environment parity and portability: abstract provider specifics behind clear interfaces (DNS, storage, ingress, identity) to reduce lock‑in and enable repeatable deployments across clouds.
  • Standardize vendor‑neutral telemetry with OpenTelemetry and consistent log/metric conventions to keep observability portable.
  • Establish portable identity patterns (OIDC, workload identity, least‑privilege IAM mappings) that translate across providers.
  • Mentor senior engineers, codify expectations in documentation and tooling, and steward technical decisions across tribes.
  • Lead incident response and RCA, strengthening feedback loops between SRE and development teams.
Tech You’ll Work With
  • Languages: TypeScript, Python, Bash
  • Infrastructure: Terraform (multi‑provider), Helm, Kubernetes (GKE primary; portable to other managed K8s), Temporal Cloud, Confluent Cloud, Cloudflare
  • Cloud‑Agnostic Interfaces: OpenTelemetry, OIDC/OAuth2, CSI/Ingress abstractions, external‑DNS patterns, OCI registries
  • Configuration: CUE, Just, Docker Compose, SOPS, 1Password, env templates
  • CI/CD: GitHub Actions, Conventional Commits, automated drift and policy checks
What We Value
  • Clarity over cleverness — explicit, predictable systems.
  • Idempotency, type safety, and observability in everything we build.
  • Portability by design — clean interfaces, minimal provider coupling, documented escape hatches.
  • Shared ownership of infrastructure and developer experience.
  • Documentation and tooling as part of engineering craft.
  • Reliability as the ultimate measure of quality.
Qualifications
Required
  • 7+ years building and operating distributed systems or production infrastructure.
  • Proven expertise with Terraform module design (multi‑provider), Kubernetes/Helm operations, and environment automation.
  • Experience designing portable architectures—clear separation of concerns, provider‑agnostic interfaces, and migration‑ready patterns.
  • Advanced knowledge of secure secret distribution with SOPS and 1Password.
  • Proficiency in Python, Node.js, and Bash for automation and operational tooling.
  • Strong understanding of Kafka, Temporal, and distributed workflow systems.
  • Track record of leading through influence—setting technical standards, mentoring seniors, and driving architectural coherence.
Preferred
  • Experience designing and implementing solutions across multiple cloud providers (e.g., AWS, GCP, Azure) to ensure resilience and avoid vendor lock‑in.
  • Hands‑on experience with OpenTelemetry rollouts to build a unified observability platform, helping proactively identify and resolve performance bottlenecks.
  • Solid understanding of Kubernetes networking, especially configuring Ingress controllers and managing traffic flow.
  • Familiarity with CUE or similar declarative configuration frameworks.
  • Open‑source contributions or published writing that demonstrates passion for systems thinking and quality craftsmanship.
Why HeyDonto

HeyDonto is a place where senior engineers work at depth. We build systems that last—secure, observable, portable, and self‑documenting. We believe in small expert teams solving hard problems the right way, with full ownership from concept to delivery. If you value clarity, autonomy, and precision—and you want your work to make a measurable difference in real systems—this is the place for you.

  • Work Type: Hybrid
  • If you are interested in applying, please send your résumé in English through LinkedIn or by email, mentioning the name of the role you are applying for in the subject line.
When applying, please include:
  • Salary expectations
  • Availability for interviews