Senior Cloud Platform Developer – DevOps Expert

SkySys

Ottawa

On-site

CAD 90,000 - 120,000

Full time

Today

Job summary

A technology company is seeking a Senior Cloud Platform Developer with DevOps expertise to manage Kafka and Spark infrastructure. The ideal candidate will have at least 5 years of experience with Apache Kafka in production, along with strong expertise in Linux administration and Kubernetes. The work environment is 100% English-speaking, with a preference for bilingual candidates. Scripting and automation skills are essential.

Qualifications

  • 5+ years of experience with Apache Kafka in production.
  • Excellent expertise in Linux administration (Red Hat and Debian).
  • Strong skills in Kubernetes and automation tools.

Responsibilities

  • Design, deploy, and manage Apache Kafka clusters.
  • Optimize Kafka performance and reliability.
  • Collaborate with cross-functional teams on Kafka use cases.

Skills

Apache Kafka administration
Linux system administration
Kubernetes
Scripting (Bash, Python)
Kafka security
CI/CD
Apache Spark

Tools

Prometheus
Grafana
Ansible
Terraform

Job description

Overview

Senior Cloud Platform Developer – DevOps Expert

Work model: 4 days on site in Ottawa

100% English-speaking environment – fluency level 5/5; bilingualism is a plus

Candidate must be eligible for the Controlled Goods Program (CGP)

Work week: 37.5 hours

Context: Initially, the candidate will have to set up the environment (production, etc.); afterwards, they must have dual competencies and be able to do development work as well.

Requirements:

  • 5+ years of experience administering and supporting Apache Kafka in production environments.
  • Strong expertise in Linux system administration (Red Hat and Debian).
  • Solid experience with Kubernetes (CNCF distributions, OpenShift, Rancher, or upstream K8s).
  • Proficiency in scripting (Bash, Python) and automation tools (Ansible, Terraform); a brief illustrative sketch follows this list.
  • Experience with Kafka security, monitoring (Prometheus, Grafana, Istio), and schema management.
  • Familiarity with CI/CD pipelines and DevOps practices.
  • Proficient in scripting and automation (Bash, Python, or Ansible).
  • Comfortable with Helm, YAML, Kustomize, and GitOps/GitLab principles.
  • 4+ years of experience in Apache Spark development, including building scalable data pipelines and optimizing distributed processing.
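
As a hedged illustration of the Kafka administration and Python scripting requirements above, here is a minimal sketch that inspects a cluster with the confluent-kafka AdminClient. The broker address and timeout are assumptions for the example, not values from this posting.

```python
# Minimal sketch: list brokers and topics via the confluent-kafka AdminClient.
from confluent_kafka.admin import AdminClient

# The bootstrap address is an assumed placeholder.
admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# list_topics() returns cluster metadata (brokers and topics).
metadata = admin.list_topics(timeout=10)

print(f"Brokers visible: {len(metadata.brokers)}")
for name, topic in sorted(metadata.topics.items()):
    print(f"{name}: {len(topic.partitions)} partition(s)")
```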

Responsibilities:

  • Design, deploy, and manage Apache Kafka clusters in development/testing/production environments.
  • Deploy and manage Apache Spark and Apache Flink in production environments.
  • Optimize Kafka performance, reliability, and scalability for high-throughput data pipelines.
  • Ensure seamless integration of Kafka with other systems and services.
  • Manage and troubleshoot Linux-based systems (Ubuntu) supporting Kafka infrastructure.
  • Manage, fine-tune, deploy, and operate Kafka on Kubernetes clusters using Helm, Operators, or custom manifests.
  • Collaborate with cross-functional teams to identify and implement Kafka use cases.
  • Contribute to automation and Infrastructure as Code (IaC) practices through CI/CD pipelines with GitLab.
  • Monitor system health, implement alerting, and ensure high availability.
  • Participate in incident response and root cause analysis for Kafka and related systems.
  • Evaluate and recommend Kafka ecosystem tools like Kafka Connect, Schema Registry, MirrorMaker, and Kafka Streams.
  • Build automation and observability tools for Kafka using Prometheus, Grafana, Fluent Bit, etc.; see the second sketch after this list.
  • Deep understanding of streaming and batch processing architectures.
  • Familiarity with Spark Structured Streaming and Flink DataStream API.
  • Work with teams to build end-to-end Kafka-based pipelines for various applications (data integration, event-driven microservices, logging, monitoring); see the first sketch after this list.
  • Experience running Spark and Flink on Kubernetes, YARN, or standalone clusters.
  • Proficiency in configuring resource allocation, job scheduling, and cluster scaling.
  • Knowledge of checkpointing, state management, and fault tolerance mechanisms.
  • Ability to tune Spark and Flink jobs for low latency, high throughput, and resource efficiency.
  • Experience with memory management, shuffle tuning, and parallelism settings.
  • Familiarity with Spark UI, Flink Dashboard, and integration with Prometheus/Grafana.
  • Ability to implement metrics collection, log aggregation, and alerting for job health and performance.
  • Understanding of TLS encryption, Kerberos, and RBAC in distributed environments.
  • Experience integrating with OAuth or other identity providers.
  • Familiarity with time-series databases.
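
To ground the Kafka-to-Spark pipeline responsibilities above, the first sketch below is a minimal PySpark Structured Streaming job that consumes an assumed Kafka topic and writes decoded records to the console. The topic name, broker address, and checkpoint path are illustrative assumptions, and the spark-sql-kafka connector package must be on the classpath.

```python
# Minimal sketch: Kafka -> Spark Structured Streaming -> console.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-events-sketch").getOrCreate()

# Read an assumed topic as a streaming DataFrame.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumed address
    .option("subscribe", "events")                        # assumed topic
    .load()
)

# Kafka keys/values arrive as bytes; cast to strings for downstream use.
decoded = events.select(col("key").cast("string"), col("value").cast("string"))

# Checkpointing provides the fault tolerance mentioned above (assumed path).
query = (
    decoded.writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
)
query.awaitTermination()
```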
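
For the monitoring and alerting items, a second sketch shows one way to expose a Kafka consumer-lag gauge with the prometheus_client library. The metric name, labels, port, and lag values are placeholders, not a real integration with this stack.

```python
# Minimal sketch: expose a placeholder consumer-lag gauge to Prometheus.
import random
import time

from prometheus_client import Gauge, start_http_server

lag_gauge = Gauge(
    "kafka_consumer_lag",          # assumed metric name
    "Consumer lag per partition",
    ["topic", "partition"],
)

# Prometheus scrapes http://<host>:8000/metrics (assumed port).
start_http_server(8000)

while True:
    # A real exporter would read committed offsets from the consumer group;
    # this placeholder just emits random values.
    lag_gauge.labels(topic="events", partition="0").set(random.randint(0, 100))
    time.sleep(15)
```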