Site Reliability Engineer 4 - RS1013439

Juniper Networks

Westford (MA)

Remote

USD 120,000 - 160,000

Full time

14 days ago

Job summary

A leading company in networking technology is seeking a Site Reliability Engineer to join their innovative team. This full-time role involves maintaining and improving cloud production environments, ensuring high availability and performance. Candidates should have a strong background in cloud services, incident management, and DevOps practices, with opportunities to work on cutting-edge technology solutions.

Qualifications

Minimum 5 years of DevOps/SRE experience.
3 years’ experience with AWS and/or GCP.
Hands-on experience operating large-scale cloud-based distributed applications.

Responsibilities

Manage system availability and service levels of cloud infrastructure.
Participate in on-call rotation for issue resolution in a multi-cloud environment.
Own lifecycle of incidents, including reporting and analysis.

Skills

DevOps

Cloud Management

Incident Management

Monitoring

Scripting

Education

Bachelor’s degree in Computer Science or Computer Engineering

Tools

AWS

GCP

Kubernetes

Jenkins

Prometheus

CloudWatch

Linux

Site Reliability Engineer 4

Location: REMOTE / anywhere in the U.S.

Juniper is seeking a full-time SRE to join our talented team and support high-quality technology solutions that revolutionize wireless and wired networks, powered by Artificial Intelligence in the cloud. Juniper provides services through SaaS applications to several enterprises, including Fortune 100 and Fortune 500 customers. You will be responsible for maintaining and improving the company's production environment for rapid scaling and outstanding performance. You will maintain stellar cloud uptime and reliability. Your primary responsibilities will be incident management and release management for cloud instances in various regions.

Juniper is changing what’s possible in networking. We’re going beyond building the networks customers expect — we’re building the networks customers deserve. And the world is taking note. But to continue to excel, we have work to do. Change in our industry is accelerating. To power connections and empower change, we need radical thinkers, eternal optimists, and energized personalities. We need people like you.

Success requires big thinking and high-reaching goals. Our culture breeds innovation. Here, you will have the opportunity to take chances and let your ideas grow. You will be supported by thoughtful, inclusive, and accessible leaders. You will have every chance to be a part of the conversation and seize our momentum. Your career will be better for it.

At Juniper, we strive to deliver network experiences that transform how people connect, work, and live. We Power Connections and Empower Change, and we do that through our core values: Being Bold, Building Trust, and Delivering Excellence.

Do you want to solve complex problems and build systems that will change the Internet? Do you want to be part of a company that is on the cutting edge of technology? Do you want to work with a world-class team of engineers?

Responsibilities:

Manage system availability, health, and service levels (SLAs, SLOs) of the large-scale cloud infrastructure running in AWS and GCP.
Proactively monitor, diagnose, and analyze failures, and support software engineers in debugging production issues across microservices and distributed platforms.
Participate in on-call rotation and resolution of issues in a 24x7 multi-cloud (AWS/GCP) environment.
Monitor metrics and performance of applications and cloud infrastructure.
Manage code releases, i.e., push code and patches to the cloud.
Own the entire lifecycle of incidents (incident management), including reporting, analyzing, and handling incidents through to closure, and writing RCAs.
Analyze scalability, reliability, high availability, performance, software maintainability, and operational challenges.
Write and maintain runbooks for knowledge-driven automated processes and bots.
Perform capacity planning based on performance, usage, and utilization stats.
Perform after-hours infrastructure updates and maintenance.
Follow SRE best practices and procedures.

Basic Qualifications

Bachelor’s degree in Computer Science or Computer Engineering or equivalent.
Minimum 5 years of DevOps/SRE experience.
3 years’ experience working with AWS and/or GCP.
Technical experience with EC2 (GCE), IAM, S3 (GCS), Kubernetes, Jenkins, Prometheus, CloudWatch (Stackdriver), Linux, and shell scripting.
Basic understanding of Terraform, CloudFormation, or another IaC tool is preferred.
General understanding of distributed systems.
Understanding of data management technologies including relational and non-relational databases.
Hands-on experience operating large-scale cloud-based distributed applications.
The ability to "fix the plane while in flight".
