Overview
At Sana Commerce we're committed to an inclusive environment and recognize that our diverse workforce is one of our greatest strengths.
It all started in 2007, with a pizza and a plan. Sana Commerce is an e-commerce platform designed to help manufacturers, distributors and wholesalers succeed by fostering lasting relationships with customers who depend on them.
We’re a fast-growing SaaS company that allows you to take ownership of your career.
What you'll get
- The opportunity to make an impact at a fast-growing SaaS scale-up;
- Up to 5 weeks of “work from anywhere” per year;
- A global and customized onboarding program (rated 9.1/10 by previous hires);
- A hybrid working model: 3 days in the office, 2 days from home;
- Weekly company lunch on us.
What you'll be doing
- Leading the SRE team, setting objectives, and guiding the team towards achieving high reliability while balancing cost and performance SLAs.
- Collaborating with platform & product engineering teams to embed reliability and operational best practices into the software development lifecycle.
- Developing and implementing SRE policies and practices, including service level objectives (SLOs), service level indicators (SLIs), and error budgets.
- Driving automation across operations to reduce toil, improve system performance, and ensure scalability, with a healthy allergy to repetitive manual work.
- Overseeing incident management, post-mortem analyses, and root cause investigations to prevent future outages and enhance system reliability.
- Facilitating capacity planning and scalability exercises to manage growth and ensure the efficient use of resources.
- Facilitating disaster recovery planning and testing to ensure business continuity for our customers’ webstores.
- Encouraging a culture of continuous improvement by mentoring team members and fostering innovation within the team.
- Staying up to date with the latest trends and technologies in SRE and advocating for their adoption where appropriate.
What you'll bring
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
- At least 5 years of experience in Site Reliability Engineering, with 2+ years in a leadership or management role.
- Proven expertise in cloud computing platforms (e.g., AWS, Azure, GCP) and experience with container orchestration (e.g., Kubernetes).
- A deep understanding of network protocols, load balancing, and high availability configurations.
- Experience applying software development practices to SRE problems, and familiarity with programming languages: preferably PowerShell and C#, otherwise Python, Go, Java, or similar.
- Experience with automation and infrastructure-as-code tools (e.g., Terraform, Ansible).
- Proficiency in monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) and in implementing comprehensive monitoring solutions. Dynatrace knowledge is a plus.
- Excellent problem-solving skills, with a proven ability to tackle complex issues under pressure.
- Outstanding leadership qualities, with a track record of mentoring and developing high-performing teams.
- Exceptional communication and collaboration skills, capable of working effectively with cross-functional teams.