Platform Engineer (all) - Kubernetes Expert

Sonia Solutions

France

Remote

Confidential

Full-time

5 days ago

Job summary

A healthcare technology firm is looking for a Platform Engineer to manage its core infrastructure and internal developer platform. Key responsibilities include designing Kubernetes environments, managing CI/CD pipelines, and supporting ML workloads. Ideal candidates have a strong DevOps background and a collaborative mindset. The role offers remote work flexibility, a competitive salary, and the chance to impact healthcare innovation.

Benefits

Competitive salary

Remote-first work option

Ownership of mission-critical platform

Focus on learning and experimentation

Qualifications

3-5+ years in a Platform Engineering, DevOps, or SRE role.
Strong Kubernetes experience in production environments.
Experience managing cloud infrastructure, preferably OVHcloud.

Responsibilities

Design and manage Kubernetes clusters on OVHcloud.
Own and advance CI/CD pipelines for deployments.
Manage GPU workloads in Kubernetes for ML teams.

Skills

Kubernetes management

DevOps experience

CI/CD systems

Infrastructure as Code

Scripting in Python

Tools

Terraform

Grafana

ArgoCD

Let me introduce...

With Sonia, doctors are successful doctors. We create and deploy AI-enhanced solutions that make doctors’ lives easier, patients’ care better, and healthcare systems more efficient. If you’re an intrinsically motivated self-starter who values impactful work, join us in revolutionizing healthcare.

We’re looking for a Platform Engineer (all) (levels: mid to senior) to take ownership of our core infrastructure and internal developer platform. You’ll design, build, and maintain our Kubernetes-based environments on OVHcloud, ensuring they are scalable, reliable, and secure. Partnering closely with engineering and ML teams, you’ll manage our CI/CD pipelines, observability stack, and the critical infrastructure powering our GPU workloads, enabling our teams to ship code faster and more reliably.

This role can be performed remotely from anywhere in Germany or Luxembourg, or in a hybrid setup from our offices in Luxembourg or Berlin.

This is what you’ll own

Design, deployment, and management of scalable and secure Kubernetes clusters on OVHcloud.
Ownership and advancement of our CI/CD pipelines for automated, reliable application and infrastructure deployments.
Implementation and management of our GitOps workflows using tools like ArgoCD or Flux.
Management and scaling of GPU workloads in Kubernetes, ensuring optimal performance and resource utilization for our ML teams.
Development and maintenance of our observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Tracing) to ensure deep visibility into system health.
Management of our cloud infrastructure on OVHcloud, focusing on automation (Infrastructure as Code), cost optimization, and security.
Lifecycle management of core platform services, including message brokers (RabbitMQ), databases (PostgreSQL, Redis), and authentication systems (Okta, OIDC, OAuth2).
Acting as a key responder for infrastructure incidents; debugging and troubleshooting complex production issues across distributed systems.
Supporting and empowering development teams by providing robust self-service tools, clear documentation, and collaborative support.

You’ll thrive in this role if you bring

3-5+ years of professional experience in a Platform Engineering, DevOps, or SRE role.
Deep, hands-on experience with Kubernetes in a production environment (cluster management, networking, security, scheduling).
Proven experience managing infrastructure on a cloud provider (OVHcloud is a strong plus; AWS, GCP, or Azure experience is also valued).
Strong practical knowledge of CI/CD systems (e.g. GitHub Actions) and GitOps principles (ArgoCD, Flux).
Proficiency with Infrastructure as Code (IaC) tools like Terraform or Pulumi.
Solid understanding of observability principles and tools (e.g. VictoriaMetrics, VictoriaLogs, OpenTelemetry/Tracing, Grafana).
Experience managing stateful services in production (e.g. PostgreSQL, Redis, RabbitMQ).
Solid scripting skills in Python.
A collaborative, user-centric mindset focused on enabling and empowering other engineers.
Strong debugging and problem-solving skills in distributed systems.

Nice-to-Haves

Experience managing GPU workloads in Kubernetes (e.g. NVIDIA GPU Operator).
Familiarity with MLOps frameworks and tools (e.g. MLflow, Argo Workflows).
Exposure to CI/CD practices tailored for ML systems.
Experience with real-time inference of LLMs (e.g. vLLM, LMCache, llm-d).
Deep knowledge of authentication and authorization protocols (OIDC/OAuth2 in combination with Okta).

Why you’ll love working with us

Full ownership of a mission-critical platform
A team that values curiosity, learning, and experimentation
Remote-first setup with the option to work in our Berlin office
Competitive salary depending on experience
Work on AI infrastructure that directly impacts healthcare innovation

Ready to apply?

If you're passionate about web development and want to work with cutting-edge technologies, we'd love to hear from you!

I'm Margarita and will be guiding you through the application process.
