A leading software company seeks a motivated Site Reliability Engineer III in Canada. In this role, you'll enhance the reliability of software systems, using cutting-edge technology to support hundreds of clients. Ideal candidates have a strong foundation in computer science and experience with automation and cloud services. The role offers an opportunity to make a significant impact on operational efficiency.
Summary
POSITION OVERVIEW
Job Description
Take a purist SRE approach to shared multi-tenant infrastructure for resilient, containerized, microservice-based SaaS systems, as well as customer-centric application environments
Oversee and automate the team’s growing presence in AWS
Contribute to core infrastructure systems development with features, bug fixes, reliability improvements, and more
Own platform reliability engineering for a complex, SAML/OAuth-based single sign-on (SSO) central authentication platform
Creatively build tooling that supports 24x7x365 follow-the-sun operations of critical production systems
Automate deployment tasks for core product and infrastructure tools and maintain automation infrastructure
Create system documentation and training materials to empower and educate our fellow team members
Build and maintain observability tooling, metrics, and dashboards for global platform product infrastructure
Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks and issues
Enhance platform observability and help create a self-healing approach to platform reliability
Collaborate with engineering teams, providing product feedback and, where necessary, contributing code to the product
Bachelor’s Degree in Computer Science or related field
Software engineering and task automation skills with Bash, Python, and/or Go are a must.
Solid understanding of agile software development methodologies (Scrum, Kanban, etc.)
Deep background with Linux systems and engineering
Highly experienced with engineering and automating on Amazon Web Services (AWS)
Experience supporting web applications running on Java / Apache / Tomcat in a live production environment
Prior experience with IaC tools like Terraform/Terragrunt/Terraspace
Prior experience with DevOps/GitOps tools (Git, Bitbucket, Flux CD, TeamCity) for gated promotions
Background supporting production at scale in a heavily microservice-based environment
Hands-on engineering and ops expertise in containerization (Docker, Helm, Kubernetes/EKS, CNI and Ingress networking)
Strong understanding of single sign-on (SSO), SAML, and OAuth (hands-on experience with Okta is a bonus)
Seasoned expertise in X.509 certificate technology and basic concepts of encryption
Experience working with relational databases such as Aurora Postgres and/or Oracle RDS
Advanced exposure to application development, web UI design and development, JSON, and application architecture
Strong experience using observability tools (logging/APM) such as Datadog and CloudWatch, and alerting tools such as PagerDuty
Familiarity with event-streaming and messaging technologies such as Kafka or AWS SQS
Understanding of Open Application Model systems such as KubeVela or Crossplane
You strongly prefer writing code to clicking through a GUI.
You enjoy teaching, being a mentor to others, and working across boundaries
Outstanding troubleshooting skills; ability to think critically and display an aptitude for problem solving
Strong analytical mind with a penchant for process development and enhancement
A highly positive, can-do attitude and a desire to be a team player
Great communication skills and ability to explain complex technical concepts to a varied audience
Strong follow-through, a strong work ethic, and a record of consistently keeping and meeting commitments
Ability to champion a culture of reliability within the product team, promoting practices such as blameless postmortems, SLO tracking, and continuous learning from incidents
Ability to read, write, and speak English
We provide 24x7 support to our customers, so we expect you to take turns with your teammates being on call for weekend production emergencies or providing rotating weekend operational support
Travel – Expect occasional travel (less than 5%) to other Guidewire offices for training and team meetings
About Guidewire
Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently. We combine digital, core, analytics, and AI to deliver our platform as a cloud service. More than 540 insurers in 40 countries, from new ventures to the largest and most complex in the world, run on Guidewire.
As a partner to our customers, we continually evolve to enable their success. We are proud of our unparalleled implementation track record with 1600+ successful projects, supported by the largest R&D team and partner ecosystem in the industry. Our Marketplace provides hundreds of applications that accelerate integration, localization, and innovation.
For more information, please visit www.guidewire.com and follow us on Twitter: @Guidewire_PandC.
Guidewire Software, Inc. is proud to be an equal opportunity and affirmative action employer. We are committed to an inclusive workplace, and believe that a diversity of perspectives, abilities, and cultures is key to our success. Qualified applicants will receive consideration without regard to race, color, ancestry, religion, sex, national origin, citizenship, marital status, age, sexual orientation, gender identity, gender expression, veteran status, or disability. All offers are contingent upon passing a criminal history check and other background checks where applicable to the position.