
Platform Operations Team Lead (Brazil-based)

Mindhive Global

Manaus

Hybrid

BRL 250,000 - 350,000

Full-time

Posted yesterday

Job Summary

A leading AI company is seeking a Platform Operations Team Lead in Brazil to oversee the operations team and ensure system reliability across multiple time zones. This hands-on leadership position focuses on monitoring, incident response, and improving operational processes while collaborating closely with engineering teams. The ideal candidate has strong technical skills, experience leading distributed teams, and proficiency in both English and Portuguese. The role offers hybrid working flexibility.

Qualifications

Experience leading distributed teams across multiple time zones.
Strong background in DevOps, SRE, or Production Engineering environments.
Experience running incident response, on-call processes, or follow-the-sun operations.

Responsibilities

Lead and grow the Platform Operations team across Brazil and Portugal.
Own the quality and accuracy of Datadog dashboards and operational visibility.
Mentor engineers and create a culture of ownership and continuous improvement.

Skills

Operational Excellence

Leadership & Mentorship

Systems Thinking

Collaboration

Calm Under Pressure

Continuous Improvement

Excellent communication in English and Portuguese

Tools

Datadog

AWS

Containerization (Docker)

Kubernetes / K3S

IaC tools

Solid programming skills in Python

About the Role

Mindhive builds AI-powered vision systems that transform industrial production. As we scale globally, reliability, observability, and rapid issue response are critical. The Platform Operations Team Lead (Brazil) plays a central role in ensuring our systems remain healthy across LATAM and European time zones.

You will lead the Platform Operations function — the customer‑adjacent, reliability‑focused counterpart to our Platform Engineering team in New Zealand. Your team will ensure our deployed systems are monitored, stable, recoverable, and well‑understood by the rest of the business.

What this role focuses on

monitoring, alert quality, and fast incident response
supporting on‑premise and edge deployments
improving operational processes and tooling
follow‑the‑sun coverage across Brazil and Portugal
ensuring operational readiness for engineering‑led upgrades

You will collaborate closely with the NZ‑based Platform Engineering team, who lead major engineering projects (Puppet 8 rollout, Python 3.13 upgrade, CDK migration, CI/CD consistency, platform hardening, test‑rig reliability). Together, you will form Mindhive's global reliability backbone.

This is a hands‑on leadership role with high impact on customer experience, system uptime, and our ability to scale installations worldwide.

About You

You are a strong operational leader with deep hands‑on technical skills. You thrive in live production environments, enjoy solving real‑world system issues, and understand how to build reliable systems across time zones.

Key strengths

structured incident response
observability and monitoring
improving operational processes
leading teams in distributed, multicultural environments
working close to customers to ensure uptime and stability

You care about technical quality, clarity, and people — and you bring a mindset focused on resilience, collaboration, and steady improvement.

Key Competencies

Operational Excellence - builds systems, processes, and behaviours that improve stability and reliability.
Leadership & Mentorship - develops engineers and coordinates distributed teams.
Systems Thinking - sees the interplay between cloud, edge hardware, software, people, and processes.
Collaboration - works closely and constructively with NZ Platform Engineering and cross‑functional teams.
Calm Under Pressure - handles incidents and live issues with clarity and good judgement.
Continuous Improvement - always looking for ways to automate, simplify, and strengthen operations.

Key Responsibilities

Platform Operations Leadership

Lead and grow the Platform Operations team across Brazil and Portugal.

Build a high‑performing follow‑the‑sun operational capability that supports both internal teams and customers.

Establish clear daily operational rhythms, including alert review, ticket management, and incident response.

Team Leadership & Culture

Mentor engineers and technicians across Brazil and Portugal.

Create a culture of ownership and continuous improvement.

Ensure communication is clear, predictable, and aligned with our values.

Build a team that is highly accountable, collaborative, and customer‑focused.

Observability & Monitoring

Own the quality and accuracy of Datadog dashboards, alerts, service catalog, resource catalog, and operational visibility.

Reduce alert noise, improve signal quality, and ensure teams receive actionable information.

Develop and maintain runbooks, playbooks, and operational documentation.

Incident Response & Reliability

Oversee first‑line and second‑line incident response during LATAM and EU hours.

Ensure fast, structured triage for issues across cloud, on‑premise, and edge deployments.

Maintain clear escalation paths and strong communication practices during incidents.

Partner with Implementation and Customer Success teams to resolve client‑facing issues.

Collaboration with Platform Engineering (NZ)

Act as the operational counterpart to NZ Platform Engineering.

Ensure operational readiness for major engineering initiatives, such as:

Puppet 8 migration
Python 3.13 upgrade
CDK migration
CI/CD unification
Platform hardening
Test rig and E2E reliability improvements

Provide field feedback, operational insights, and rollout support for these improvements.

System Health & Operational Excellence

Monitor the health of live systems across sites and proactively identify stability risks.

Help drive improvements in:

edge hardware reliability
network stability
server provisioning consistency
observability for both cloud and on‑prem components

Work with teams to reduce operational toil and automate repetitive tasks.

Required Skills & Experience

Leadership & Communication

Experience leading distributed teams across multiple time zones.

Excellent communication in English and Portuguese.

Ability to collaborate effectively with engineering, implementation, and customer‑facing teams.

Strong organisational skills, with the ability to manage competing priorities.

Technical

Strong background in DevOps, SRE, or Production Engineering environments.

Hands‑on experience operating hybrid cloud and on‑premise/edge systems.

Proficiency with:

Datadog (or similar observability platforms)
AWS (IAM, networking, security, monitoring)
Containerization (Docker)
Kubernetes / K3S
IaC tools (ideally AWS CDK)

Solid programming skills in Python (TypeScript / JavaScript is a plus).

Understanding of security best practices (identity, access, endpoint, and network security).

Operational

Experience running incident response, on‑call processes, or follow‑the‑sun operations.

Proven ability to write and maintain runbooks, playbooks, and operational documentation.

Experience supporting industrial, IoT, or hardware‑integrated systems is a plus.

About Us

Mindhive Ltd is a fast‑moving AI company using machine learning and computer vision to reimagine industrial systems. Our products run across cloud, on‑premise, and edge deployments, bringing AI performance and reliability directly to the factory floor.

We care deeply about people, quality, and impact. We work collaboratively, iterate quickly, and tackle meaningful, complex problems.

Mindhive is a New Zealand Hi‑Tech Awards winner, recognised for innovation and impact in software, AI, and advanced manufacturing.

Work Environment & Flexibility

We support hybrid and remote work, with our people distributed across Brazil, Portugal, Italy, Japan and New Zealand. We trust each other to deliver results in ways that suit our lives while maximising our collective impact. We move quickly, adapt fast, and support each other through the ups and downs that come with building something new and meaningful.

Our Values

Relentless Curiosity - we explore deeply, question assumptions, and seek better ways.
Authentic Humanity - we support and care for people first.
Inclusive Connection - we collaborate openly and build strong relationships with customers and colleagues.
Determination to Deliver - we strive to do the right thing, consistently and with purpose.