Site Reliability Engineer

OneShot AI

City Of London

Hybrid

GBP 70,000 - 90,000

Full time

Today

Job summary

A leading technology company in the UK is seeking a Multi-Cloud Engineer to manage Kafka systems, optimize OpenSearch clusters, and manage AWS deployments. The role requires a Bachelor's degree in Computer Science, deep knowledge of cloud services, CI/CD automation, and strong analytical skills. This is an opportunity to work in a dynamic environment, driving innovation in cloud technologies.

Qualifications

  • Deep knowledge of Kafka, with experience in cluster setup and tuning.
  • Expertise in OpenSearch cluster management and query optimization.
  • Proficiency with AWS services, particularly RDS and EKS.
  • Experience managing multi-cloud environments with security best practices.
  • Strong background in Linux administration and performance tuning.
  • Proficiency in CI/CD tools and secure pipeline management.
  • Strong analytical and problem-solving skills for troubleshooting.

Responsibilities

  • Manage and scale Kafka, optimize Streams and Connect.
  • Configure and manage OpenSearch clusters and optimize queries.
  • Deploy and manage Kubernetes on AWS EKS.
  • Design multi-cloud infrastructure and implement disaster recovery.
  • Automate CI/CD pipelines and ensure compliance.

Skills

Kafka expertise
OpenSearch management
AWS RDS and EKS
Multi-cloud strategies
Linux administration
CI/CD automation
Analytical and problem-solving skills

Education

Bachelor's or higher in Computer Science or related field

Tools

Jenkins
GitLab CI/CD
AWS
Linux

Job description

Responsibilities

  • Kafka Expert: Manage and scale Kafka, optimize Streams and Connect, fine‑tune brokers/producers/consumers for performance.
  • OpenSearch Specialist: Configure and manage OpenSearch clusters, optimize indexing/queries, ensure HA (replication/sharding), and set up monitoring/alerting.
  • AWS Cloud Engineer: Manage AWS RDS (provisioning, config, scaling, backup/recovery). Deploy, manage, and scale Kubernetes on AWS EKS (networking, security, CI/CD integration).
  • Multi‑Cloud Architect: Design and manage multi‑cloud infrastructure, ensure seamless networking/security, implement disaster recovery, and optimize costs.
  • Linux Administrator: Optimize Linux performance, manage resources, automate with shell scripting, harden system security, and troubleshoot issues.
  • CI/CD Automation Lead: Design and manage pipelines (Jenkins, GitLab CI, CircleCI, ArgoCD). Automate deployments (blue‑green, canary, rolling), integrate with VCS, ensure security/compliance.

Requirements

  • Bachelor's, Master's, or Doctorate in Computer Science or a related field: This establishes a foundational understanding of computer systems, algorithms, data structures, and software development principles. It indicates a strong theoretical base for complex problem‑solving in a technical environment.
  • Deep knowledge of Kafka, with hands‑on experience in cluster setup, management, and performance tuning:
    • Kafka: A distributed streaming platform used for building real‑time data pipelines and streaming applications. It's crucial for handling high‑throughput, low‑latency data feeds.
    • Cluster setup, management, and performance tuning: This implies the ability to design, deploy, operate, and optimize Kafka environments to ensure data integrity, high availability, and efficient message processing at scale. It includes understanding topics, partitions, brokers, producers, consumers, and monitoring tools.
  • Expertise in OpenSearch cluster management, indexing, query optimization, and monitoring:
    • OpenSearch: An open‑source, distributed search and analytics suite, often used for log analytics, real‑time application monitoring, and website search.
    • Cluster management, indexing, query optimization, and monitoring: These point to skills in setting up and maintaining OpenSearch clusters, designing efficient index structures for fast retrieval, writing optimized queries that return relevant results quickly, and continuously monitoring the cluster's health and performance.
  • Proficiency with AWS services, particularly RDS and EKS, including experience in database management, performance tuning, and Kubernetes deployment:
    • AWS (Amazon Web Services): Amazon's public cloud platform.
    • RDS (Relational Database Service): A managed service that simplifies relational database setup, operation, and scaling. Proficiency here means managing various database engines (e.g., MySQL, PostgreSQL, Aurora), handling backups and replication, and optimizing database performance.
    • EKS (Elastic Kubernetes Service): A managed Kubernetes service on AWS. Expertise involves deploying, scaling, and managing containerized applications using Kubernetes, including understanding pods, deployments, services, and networking within EKS.
    • Database management, performance tuning, and Kubernetes deployment: These are core operational skills within the AWS ecosystem, ensuring applications and data run efficiently and reliably.
  • Experience in managing multi‑cloud environments, with a strong understanding of cloud networking, security, and cost optimization strategies:
    • Multi‑cloud: Using services from multiple cloud providers (e.g., AWS, Azure, GCP).
    • Cloud networking: Designing and implementing robust network connectivity across different clouds.
    • Security: Implementing consistent security policies, identity and access management, and data protection across diverse cloud platforms.
    • Cost optimization strategies: Managing and reducing expenditure across multiple cloud bills, leveraging reserved instances, savings plans, and rightsizing resources. This demonstrates a strategic and financial awareness beyond pure technical implementation.
  • Strong background in Linux administration, including system performance tuning, shell scripting, and security hardening:
    • Linux administration: Fundamental operating system knowledge for managing servers and applications.
    • System performance tuning: Optimizing OS parameters, resource allocation, and processes for maximum efficiency.
    • Shell scripting: Automating repetitive tasks and orchestrating complex workflows.
    • Security hardening: Implementing measures to secure Linux systems against unauthorized access and vulnerabilities.
  • Proficiency with CI/CD automation tools and best practices, with a focus on secure and compliant pipeline management:
    • CI/CD (Continuous Integration/Continuous Delivery/Deployment): Automating the software development and release process.
    • CI/CD automation tools: Experience with tools like Jenkins, GitLab CI/CD, GitHub Actions, Azure DevOps, CircleCI, etc.
    • Best practices: Understanding principles like automated testing, frequent integration, small changes, and repeatable deployments.
    • Secure and compliant pipeline management: Integrating security checks (e.g., static analysis, vulnerability scanning) early in the pipeline ("shift‑left" security) and ensuring that deployments adhere to regulatory and organizational compliance requirements.
  • Strong analytical and problem‑solving skills, essential for troubleshooting complex technical challenges: This is a meta‑skill that underpins all the technical qualifications. It highlights the ability to diagnose issues, identify root causes, and devise effective solutions in complex distributed systems.