(M) Staffing – 3x Operations Engineer

Believe Solutions

United States

Remote

USD 80,000 - 120,000

Full time

2 days ago


Job summary

An innovative company is seeking a Kubernetes On-Premise Operations Engineer to manage their infrastructure. This role involves proactive monitoring, troubleshooting, and ensuring high availability of systems. You will collaborate closely with Level 3 Engineers to maintain seamless production operations. Your expertise in Kubernetes, networking, and monitoring tools like Prometheus and Grafana will be crucial. Join a dynamic team where your contributions will enhance system stability and performance, making a significant impact in a forward-thinking environment.

Qualifications

5+ years in Operations, SRE, or DevOps roles.
Strong troubleshooting skills in networking and monitoring tools.

Responsibilities

Apply patches and updates, monitor and troubleshoot performance issues.
Participate in on-call rotation and respond to incidents.

Skills

Kubernetes

Troubleshooting

Networking

Prometheus

Grafana

Loki

Ansible

Helm

CI/CD

Incident Management

Tools

Prometheus

Grafana

Loki

Helm

Ansible

Terraform

Job description

Location: Remote (candidates based in Bolivia only)
Type: Full-Time
Project Scope: Iridium Panama (end of 2025)

We are seeking a Kubernetes On-Premise Operations Engineer to manage and maintain our on-premise Kubernetes infrastructure. This role is focused on day-to-day operations, proactive monitoring, troubleshooting, and ensuring high availability and system stability. The engineer will collaborate closely with Level 3 Engineers who provide the infrastructure backbone, ensuring seamless and reliable production operations.

Scope of Applications Supported

Tigo Sports – Available in 6 countries

KannelGateway – Used across 9 countries

Key Responsibilities

Apply patches and updates

Monitor and troubleshoot performance issues

Incident Management & On-Call Support

Participate in on-call rotation

Respond to incidents, perform root cause analysis (RCA), and document resolutions

Networking & Ingress Management

Operate and troubleshoot Cilium, Nginx Ingress Controller, and Traefik

Storage & Databases

Support and maintain NFS, MongoDB, MySQL, and PostgreSQL, ensuring performance and data integrity

Observability & Monitoring

Manage Prometheus, Grafana, and Loki for proactive alerting and system logging

Automation & Configuration Management

Use Helm, Ansible, and CI/CD pipelines to apply and manage infrastructure configurations

Production Deployments

Execute, monitor, and manage production deployments with proper rollback strategies

OS & Security Management

Maintain Ubuntu-based systems, ensuring they are patched, secure, and performant

Requirements

5+ years in Operations, SRE, or DevOps roles

Strong troubleshooting skills in networking

Proficient in monitoring tools: Prometheus, Grafana, Loki

Familiar with operational processes, incident management, and runbooks

Experience with Helm, Ansible, and optionally Terraform

Prior experience with production on-call support and incident resolution

Similar jobs

Practice Manager

Lexington Medical Center

Columbia

On-site

USD 60,000 - 100,000

5 days ago

Senior Security Specialist - HVA Analyst

Planned Systems International, Inc.

Remote

USD 80,000 - 120,000

12 days ago