Site Reliability Engineer

IBM

Jersey City (NJ)

Remote

USD 90,000 - 140,000

Full time

6 days ago

Job summary

An established industry player is seeking a Site Reliability Engineer II to enhance their cloud infrastructure services. In this role, you'll collaborate with skilled engineers to ensure the reliability, scalability, and security of cloud products. You'll implement automation, improve operational efficiency, and gain hands-on experience in a dynamic environment. This position offers an opportunity to deepen your expertise in site reliability engineering while contributing to innovative solutions that empower organizations. Join a forward-thinking team dedicated to transforming challenges into industry-leading solutions and enjoy a flexible work environment that allows you to thrive.

Qualifications

  • Experience in site reliability engineering or cloud infrastructure management.
  • Familiarity with AWS and Terraform for infrastructure as code.

Responsibilities

  • Contribute to core infrastructure services ensuring reliability and security.
  • Implement automation to improve operational efficiency.

Skills

Site Reliability Engineering
Cloud Infrastructure Management
Systems Administration
Automation in Python
Automation in Go
Automation in Bash
Problem-Solving

Education

Bachelor's Degree

Tools

AWS
Terraform
Datadog
Prometheus
Grafana

Job description

Introduction

A career in IBM Software means you'll be part of a team that transforms our customers' challenges into industry-leading solutions. We are an infinitely curious team, always seeking new possibilities, and dedicated to creating the world's leading AI-powered, cloud-native software solutions. Our renowned legacy creates endless global opportunities for our network of IBMers. We are a team of deep product experts, ensuring exceptional client experiences, with a focus on delivery, excellence, and obsession over customer outcomes. This position involves contributing to HashiCorp's offerings, now part of IBM, which empower organizations to automate and secure multi-cloud and hybrid environments. You will join a team managing the lifecycle of infrastructure and security, enhancing IBM's cloud solutions to ensure enterprises achieve efficiency, security, and scalability in their cloud journey.

Your Role and Responsibilities

Our Team

The Infrastructure Services team builds and maintains the backbone of HashiCorp’s cloud products. We focus on creating reliable, scalable, and secure infrastructure services that enable engineering teams to move quickly without breaking things. Instead of just keeping the lights on, we’re constantly improving automation, reducing toil, and making infrastructure more self-service and developer-friendly.

We work with Nomad, Consul, Vault, Terraform, and AWS services to power HashiCorp’s cloud offerings. Our mission is to provide infrastructure that’s easy to use, resilient, and secure by default so product teams can focus on delivering great experiences to customers.

About This Role

As a Site Reliability Engineer II on the Infrastructure Services team, you will help build, maintain, and improve the infrastructure that supports all HashiCorp cloud products. You will work alongside skilled engineers to ensure our systems are reliable, scalable, and secure while gaining hands-on experience in operating and automating cloud infrastructure. This role is ideal for an engineer looking to deepen their expertise in site reliability engineering, learn from senior engineers, and take on increasing responsibility over time.

In This Role, You Can Expect To
  • Contribute to the development and maintenance of core infrastructure services, ensuring reliability, scalability, and security
  • Implement automation to improve operational efficiency and reduce manual toil
  • Assist in monitoring, alerting, and logging improvements to enhance system observability
  • Debug and address medium-complexity infrastructure issues with guidance from senior engineers
  • Participate in on-call rotations after an initial onboarding period, learning incident response best practices
  • Work within established team practices, exercising self-directed judgment on tasks while seeking guidance when necessary
  • Propose and implement improvements to existing infrastructure components and deployment processes
  • Write and maintain documentation for infrastructure configurations, procedures, and troubleshooting guides
  • Collaborate with other teams to understand infrastructure needs and contribute to solutions
  • Shadow interviews for entry-level candidates and participate in discussions on hiring evaluations

This job can be performed from anywhere in the US.

Preferred Education

Bachelor's Degree

Required Technical and Professional Expertise
  • Experience in site reliability engineering, cloud infrastructure management, or systems administration
  • Familiarity with cloud platforms such as AWS and infrastructure as code tools like Terraform
  • Experience with observability tools such as Datadog, Prometheus, or Grafana
  • Enjoyment of problem-solving and operational challenges
  • Comfort with scripting or automation in Python, Go, or Bash

Preferred Technical and Professional Experience
  • Effective communication and teamwork skills
  • Interest in growing into a senior SRE role and learning from experienced engineers
  • Growth mindset and continuous improvement focus
  • Knowledge of HashiCorp and IBM products is a plus

Additional Details
  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: IT Services and IT Consulting