Company Description
BETSOL is a cloud-first digital transformation and data management company offering products and solutions to both enterprises and consumers. BETSOL’s Data Management product lines include Rebit (Rebitgo.com) and Zmanda (Zmanda.com). BETSOL Global IT Services (BETSOL.com) builds and supports end-to-end enterprise solutions, reducing time-to-market for our customers. Our engineering team, with its several patents, delivers award-winning products and solutions in over 40 countries. Our work locations are set against the vibrant backdrops of Broomfield, Colorado and Bangalore, India. We offer comprehensive health insurance, a competitive salary, volunteer programs, scholarship opportunities, a gym, a cafe, recreational facilities, and a 401(k). Our success is recognized with industry awards and a net promoter score that is 2x the industry average. We take pride in being an employee-centric organization. Learn more at betsol.com.
Job Description
We are looking for a Site Reliability Engineer (SRE) to join our global virtual SRE team. You'll help build a meaningful engineering discipline that combines software and systems to develop creative engineering solutions to operations problems. You'll be a member of a team of curious problem solvers with a diverse set of perspectives who think big and take risks. In this environment, you'll help the team deliver on relevant projects, backed by an organization that provides the support and mentorship you need to learn and grow. As an SRE, you'll collaborate within a team focused on running production applications and systems better.
- Design, code, test, and deliver software to automate manual operational efforts
- Troubleshoot priority incidents, facilitate blameless post-mortems, and ensure permanent closure of incidents
- Engage with the development team throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes
- Identify application patterns and analytics in support of better service level objectives
- Advocate for and lead the design of self-healing and resiliency patterns
- Lead the design and architecture of automated software and product upgrades, change management, and release management solutions
- Plan and execute disaster recovery drills
- Evaluate next-generation cloud solutions and tools to meet solution and business requirements
- Coach, mentor, and support the CloudOps (NOC) team on platform-related incidents and proactive improvements
- Document alert-specific playbooks and/or runbooks
- Participate in 24x7 support coverage as needed
Qualifications
- Bachelor's degree or equivalent experience in a software engineering discipline
- 6+ years of software/IT industry experience and 4+ years of hands-on experience in the following areas:
- Fluency in one or more programming languages: C, Python, Go, Groovy, Java
- Familiarity with algorithms, data structures, and complexity analysis
- Experience with one or more cloud platforms (AWS, GCP, Azure)
- Experience with Terraform or another cloud provisioning tool (TFC preferred)
- Working knowledge of cloud infrastructure components (e.g. routers, load balancers, Kubernetes, compute, storage, and networks)
- Experience working with Unix/Linux systems, at the shell level and beyond
- Excellent debugging and troubleshooting skills
- Experience with the following Avaya products is an added advantage:
  - Avaya Experience Platform
  - Avaya Aura (System Manager, Session Manager, CM)
  - Avaya Session Border Controller for Enterprise (ASBCE)