An innovative company is seeking a Kubernetes On-Premise Operations Engineer to manage their infrastructure. This role involves proactive monitoring, troubleshooting, and ensuring high availability of systems. You will collaborate closely with Level 3 Engineers to maintain seamless production operations. Your expertise in Kubernetes, networking, and monitoring tools like Prometheus and Grafana will be crucial. Join a dynamic team where your contributions will enhance system stability and performance, making a significant impact in a forward-thinking environment.
Location: Remote (only Bolivia candidates)
Type: Full-Time
Project Scope: Iridium Panama (end of 2025)
We are seeking a Kubernetes On-Premise Operations Engineer to manage and maintain our on-premise Kubernetes infrastructure. This role is focused on day-to-day operations, proactive monitoring, troubleshooting, and ensuring high availability and system stability. The engineer will collaborate closely with Level 3 Engineers who provide the infrastructure backbone, ensuring seamless and reliable production operations.
Tigo Sports – Available in 6 countries
KannelGateway – Used across 9 countries
Apply patches and updates
Monitor and troubleshoot performance issues
Incident Management & On-Call Support
Participate in on-call rotation
Respond to incidents, perform root cause analysis (RCA), and document resolutions
Networking & Ingress Management
Operate and troubleshoot Cilium, Nginx Ingress Controller, and Traefik
Storage & Databases
Support and maintain NFS, MongoDB, MySQL, and PostgreSQL, ensuring performance and data integrity
Observability & Monitoring
Manage Prometheus, Grafana, and Loki for proactive alerting and system logging
Automation & Configuration Management
Use Helm, Ansible, and CI/CD pipelines to apply and manage infrastructure configurations
Production Deployments
Execute, monitor, and manage production deployments with proper rollback strategies
OS & Security Management
Maintain Ubuntu-based systems, ensuring they are patched, secure, and performant
5+ years of experience in Operations, SRE, or DevOps roles
Strong troubleshooting skills in:
Networking
Proficient in monitoring tools: Prometheus, Grafana, Loki
Familiar with operational processes, incident management, and runbooks
Experience with Helm, Ansible, and optionally Terraform
Prior experience with production on-call support and incident resolution