Join a forward-thinking company as a Reliability Engineer on the Boundary Product team, where you'll enhance customer experiences through innovative cloud solutions. This role focuses on driving service reliability, developing tools for metric visibility, and collaborating across teams to improve software performance. You'll be empowered to troubleshoot issues, implement reliable design patterns, and participate in a 24/7 on-call rotation. If you have a passion for developer productivity and a desire to make a difference in a dynamic environment, this opportunity is perfect for you.
Introduction
A career in IBM Software means you'll be part of a team that transforms our customers' challenges into industry-leading solutions. We are an infinitely curious team, always seeking new possibilities, and dedicated to creating the world's leading AI-powered, cloud-native software solutions. Our renowned legacy creates endless global opportunities for our network of IBMers. We are a team of deep product experts, ensuring exceptional client experiences, with a focus on delivery, excellence, and obsession over customer outcomes. This position involves contributing to HashiCorp's offerings, now part of IBM, which empower organizations to automate and secure multi-cloud and hybrid environments. You will join a team managing the lifecycle of infrastructure and security, enhancing IBM's cloud solutions to ensure enterprises achieve efficiency, security, and scalability in their cloud journey.
Your role and responsibilities
HashiCorp Boundary aims to give customers seamless, just-in-time remote access to their infrastructure and other web applications without having to worry about passwords, certificates, or other credentials. Boundary is offered as a cloud platform, and this role is part of the Boundary Enterprise Enablement team, whose primary focus is scale and reliability to enable hypergrowth among medium and large enterprises.
What you’ll do (responsibilities)
As an engineer on the Boundary Product Reliability team, you will:
Develop a deep understanding of how customers use Boundary Cloud and enhance their experience through reliability
Drive service reliability by developing tooling that enables metric visibility using SLIs, SLOs, and SLAs
Champion incident management processes that directly impact customer experience
Reduce the operational overhead of the HashiCorp Boundary product and leverage data to understand the largest sources of reliability risk
Deploy, manage, and monitor a large-scale Boundary Cloud deployment
Predict our future failures and work proactively to mitigate them
Have a passion for developer productivity to make other engineers' lives better
Empower engineers to troubleshoot their own issues by developing tools, frameworks, and guardrails for safety
Advocate for and implement reliable design patterns (circuit breakers, graceful degradation, zero-downtime upgrades, etc.)
Partner with the broader HashiCorp organization to learn from incidents through a blameless postmortem process
Collaborate across teams to improve our tools based on experience gained from running our own software in production
Participate in a 24/7 on-call rotation that supports our production services
This job can be performed from anywhere in the US
Required technical and professional expertise
5+ years of experience running production applications at scale: backend applications written in Golang, databases, observability, and AWS primitives
Strive for quality through maintainable code and comprehensive testing from development to deployment
Clear communication skills while remaining empathetic and kind
An eagerness to learn through humility and reflection
Experience debugging performance bottlenecks for live services and database systems
Experience leading or participating in incidents using incident management tools such as incident.io or PagerDuty
Preferred technical and professional experience
Working knowledge of industry best practices related to information security
Working knowledge of AWS Aurora or PostgreSQL, Nomad or other orchestration platforms, and Traefik or other load-balancing technologies
Experience or willingness to conceive, document and advocate for best practices
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.