Description
The DevOps Lead is responsible for shaping the strategy, governance, and implementation of DevOps practices across the organization. This role blends hands-on technical leadership with operational excellence and strategic foresight, ensuring scalability, automation, and reliability throughout the software delivery lifecycle. Working with a small, agile team, the DevOps Lead collaborates closely with Development, Operations, and AI Engineering stakeholders to drive modernization, automation, and innovation across both cloud and on-premise environments.
Key Responsibilities
- Define and implement the DevOps strategy, ensuring alignment with business and technology goals.
- Promote a culture of automation, continuous improvement, and operational excellence.
- Define best practices and governance for CI/CD, cloud infrastructure, observability, and security.
- Coordinate with senior management and project stakeholders to ensure DevOps initiatives contribute to the broader digital transformation strategy.
Automation & Delivery
- Oversee the design and optimization of CI/CD pipelines for continuous integration, testing, and deployment across multiple products and platforms.
- Implement Infrastructure-as-Code for standardized, reproducible environments.
- Drive automation across configuration management, testing, and monitoring.
Cloud & Platform Engineering
- Supervise cloud operations, ensuring reliability, scalability, and cost efficiency.
- Guide the transition toward cloud-native architectures and container orchestration (Kubernetes, Docker).
- Ensure the integration of AI/ML workloads (MLOps) within the company’s infrastructure, supporting model deployment and lifecycle management.
Reliability & Observability
- Define and maintain SLAs/SLOs, ensuring optimal system reliability and uptime.
- Implement monitoring, logging, and observability frameworks (Prometheus, Grafana, ELK) in collaboration with Operations.
- Lead incident management and root-cause analysis, ensuring post-mortem documentation and continuous improvement.
Security & Compliance
- Enforce DevSecOps principles, integrating security checks within CI/CD and IaC pipelines.
- Collaborate with IT Security to ensure compliance with industry standards (ISO 27001, SOCx, GDPR, etc.).
Requirements
Required Qualifications
- Proven experience (5+ years) in DevOps or Site Reliability Engineering roles, including senior responsibilities.
- Strong technical background in the following tools (or similar):
  - Automation tools: Jenkins, GitLab CI/CD, Ansible, n8n, Airflow.
  - Containers & Orchestration: Docker, Kubernetes, Nomad, Consul.
  - Cloud platforms: AWS, Azure, GCP (at least one required).
  - IaC: Terraform, CloudFormation.
  - Monitoring & Logging: Prometheus, Grafana, ELK.
  - Programming/Scripting: Python, Bash, Go, Java, or Node.js.
- Solid understanding of networking, system design, and distributed architectures.
Soft Skills
- Strong team-building and mentoring abilities.
- Excellent communication and collaboration across technical and non-technical teams.
- Analytical mindset and problem-solving skills in complex environments.
- Strategic thinking and decision-making capability.
- Proactive attitude and passion for innovation in cloud, automation, and AI.
Preferred Qualifications
- Experience in multi-cloud strategies or hybrid environments.
- Knowledge of AI/ML platform operations.
- Certifications such as:
  - AWS Solutions Architect / DevOps Engineer
  - Azure DevOps Expert
  - Google Cloud Professional Engineer