Job description
RemoteStar is looking to hire a Senior Site Reliability Engineering Manager on behalf of our UK-based client, which has a fully remote work policy.
About Client:
The client is building an industry-leading B2B marketplace for diamonds and gemstones that connects jewelry retailers to gemstone suppliers. They have had a presence in London, Hong Kong, Amsterdam, Mumbai, and New York since 2001.
About the role:
As the SRE Manager, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and services, both through direct technical contribution and through team building and management.
- Take full ownership of the production estate from both a technical and process perspective.
- Ensure consistent, smooth operation of live systems and drive the resolution of all on-call support issues.
- Design and operate a new incident tracking process to ensure root causes are found and remediated in a timely fashion by the development team.
- Create and maintain high-end monitoring and automation tooling. Drive automation initiatives to streamline operational workflows and improve efficiency.
- Develop and maintain tools, scripts, and dashboards to monitor system health, performance, and reliability.
- Build a first-class SRE team through a combination of leading by example, coaching, and mentoring. Provide leadership and guidance to the SRE team, fostering a culture of collaboration, innovation, and continuous improvement.
REQUIREMENTS:
- Proven experience in a senior or lead SRE role, with a strong track record of building and maintaining highly reliable infrastructure and services.
- Expertise in incident management, including incident response, resolution, and post-mortem analysis.
- Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack, or Datadog.
- Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation.
- Strong scripting and automation skills, with proficiency in languages such as Python, Bash, or Go.
- Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams in a remote environment.
- Demonstrated leadership capabilities, with a passion for mentoring and developing team members.
WHAT THEY OFFER:
- Dynamic working environment in an extremely fast-growing company.
- Work in an international environment.
- Work in a pleasant environment with very little hierarchy.
- Intellectually challenging role, playing a massive part in the client’s success and scalability.
- Flexible working hours.