Site Reliability Engineer IOE: Cardano

Devopshunt

United Kingdom

Remote

GBP 100,000 - 125,000

Full time

30+ days ago

Job summary

An innovative technology company is seeking a Site Reliability Engineer to enhance the reliability and performance of its production systems. This role is crucial in combining service operation, systems engineering, and software engineering to ensure high availability and scalability of services. The ideal candidate will design and deliver automation tools using languages like Python and Bash, while also collaborating with development teams to optimize customer experience. Join a forward-thinking firm that values creativity and positive change, and be part of a team that is shaping the future of blockchain technology.

Qualifications

  • Experience in service operation and systems engineering principles.
  • Proficiency in scripting languages and automation tools.

Responsibilities

  • Ensure reliability, availability, and performance of production systems.
  • Design and deliver tools to improve service efficiency and scalability.
  • Participate in on-call rotations to address service interruptions.

Skills

Python
Bash
Terraform
Nix
Systems Engineering
Software Engineering

Tools

Monitoring Tools
Infrastructure as Code

Job description

IOHK is a technology company focused on blockchain research and development. We are renowned for our scientific approach to blockchain development, emphasizing peer-reviewed research and formal methods to ensure security, scalability, and sustainability. Our projects include decentralized finance (DeFi), governance, and identity management, aiming to advance the capabilities and adoption of blockchain technology globally.

We invest in the unknown, applying our curiosity and desire for positive change to everything we do. By fueling creativity, innovation, and progress within our teams, we design products and services that help people be fearless, be changemakers.

What the role involves:

As a Site Reliability Engineer (SRE), you are an integral part of our open-source project, ensuring the reliability, availability, and performance of our production systems. This role combines service operation, systems engineering, and software engineering principles to operate and monitor services as well as create or maintain tools, automations, and infrastructure code that bolster the efficiency and resilience of our platform.

  1. Design, write, and deliver tools and software primarily using Python, Bash, Terraform, or Nix to improve the availability, scalability, and efficiency of our services.
  2. Engage in and refine the whole lifecycle of services, from inception and design, through deployment, operation, and continuous improvement.
  3. Practice sustainable incident response and promote blameless postmortems.
  4. Collaborate with the development teams to ensure that solutions are designed with customer experience, scalability, and performance in mind.
  5. Analyze system performance and reliability, offering recommendations for enhancement.
  6. Develop and uphold service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for our services.
  7. Participate in on-call rotations, responding to and mitigating service interruptions and technical challenges.