Senior Site Reliability Engineering Manager (Remote)

Remotestar

London

Remote

GBP 80,000 - 100,000

Full time

30+ days ago

Job description

RemoteStar is looking to hire a Senior Site Reliability Engineering Manager on behalf of our client based in the UK with a fully remote work policy.

About Client:

The client is building an industry-leading B2B marketplace for diamonds and gemstones that connects jewelry retailers with gemstone suppliers. It has had a presence in London, Hong Kong, Amsterdam, Mumbai, and New York since 2001.

About the role:

As the SRE Manager, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and services, both through direct technical contribution and through team building and management.

  • Take full ownership of the production estate from both a technical and process perspective.
  • Keep live systems running smoothly and consistently, and drive the resolution of all on-call support issues.
  • Design and operate a new incident tracking process to ensure root causes are found and remediated in a timely fashion by the development team.
  • Create and maintain high-end monitoring and automation tooling. Drive automation initiatives to streamline operational workflows and improve efficiency.
  • Develop and maintain tools, scripts, and dashboards to monitor system health, performance, and reliability.
  • Build a first-class SRE team through a combination of leading by example, coaching, and mentoring. Provide leadership and guidance to the SRE team, fostering a culture of collaboration, innovation, and continuous improvement.

REQUIREMENTS:

  • Proven experience in a senior or lead SRE role, with a strong track record of building and maintaining highly reliable infrastructure and services.
  • Expertise in incident management, including incident response, resolution, and post-mortem analysis.
  • Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack, or Datadog.
  • Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation.
  • Strong scripting and automation skills, with proficiency in languages such as Python, Bash, or Go.
  • Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams in a remote environment.
  • Demonstrated leadership capabilities, with a passion for mentoring and developing team members.

WHAT THEY OFFER:

  • Dynamic working environment in an extremely fast-growing company.
  • Work in an international environment.
  • Work in a pleasant environment with very little hierarchy.
  • Intellectually challenging role, playing a massive part in the client’s success and scalability.
  • Flexible working hours.