Site Reliability Engineer - Senior (CPE)

TalentBurst

San Diego (CA)

Remote

USD 130,000 - 170,000

Full time

6 days ago

Job summary

A leading company is seeking a Senior Site Reliability Engineer to work 100% remotely, supporting existing services and helping develop future ones. The role calls for extensive experience in cloud environments, particularly AWS and Akamai DSA. Responsibilities include designing systems, collaborating with security teams, and providing on-call support in a highly collaborative environment.

Qualifications

4+ years working in AWS and hands-on Akamai DSA experience.
3+ years professional Site Reliability experience operating at scale.
Strong hands-on experience building and maintaining infrastructure for microservices.

Responsibilities

Design, build, deploy, and operate a variety of services.
Collaborate with global teams on access and compatibility issues.
Provide rotational on-call support to respond to production incidents.

Skills

Akamai DSA

AWS

GCP

Kubernetes

Terraform Enterprise

Jenkins

Ansible

Network Load Balancer

Windows Active Directory

Datadog

Education

BS in Computer Science, Software Engineering, or equivalent experience

Tools

Terraform

CloudFormation

Splunk

W2 Acceptable
Site Reliability Engineer - Senior (CPE)
Location: San Diego, CA 92101 - 100% Remote
Duration: 12 Months+
14812500

Notes:
Backfill Position – 100% Remote
Must-have: Akamai DSA experience
Cloud: AWS, GCP, Kubernetes environments
IaC Tools: Terraform Enterprise preferred; CloudFormation/SAM acceptable
DevOps Stack: Jenkins, Ansible
Networking: Network Load Balancer, IAM familiarity
Systems: Wiz, Datadog, Splunk, Windows Active Directory
Collaboration: Work closely with Security teams (compliance, IAM, policies)

This Site Reliability Engineer role will focus on delivering all existing services provided by the teams and helping develop future services. It requires working with multiple global teams to resolve access and compatibility issues across various systems.

Skills Required:

Akamai DSA is a must
AWS
GCP and Kubernetes environments
Network Load Balancer
Infrastructure as Code Tools - Terraform Enterprise preferred; otherwise CloudFormation or SAM
DevOps Framework - Jenkins, Ansible
Familiarity with Wiz and Terraform Enterprise, and with observability tools such as Datadog and Splunk
IAM Familiarity
Windows Active Directory
Collaborate and partner with Security teams that specialize in areas such as compliance, identity & access management (IAM), security groups, and policies.

Responsibilities
· Design, build, deploy, and operate a combination of open-source, custom-written, and vendor-provided software to provide services
· Analyze, review, & fulfill Identity & Access Management (IAM) requests
· Understand, support, and improve Security Group flows, execution, and troubleshooting
· Support a Windows AD service in the cloud
· Support an ecosystem of remote systems for remote user access
· Collaborate with multiple software & security engineering teams to integrate solutions and contribute to project deliveries
· Provide rotational on-call support where you'll detect, respond to, triage, and resolve production incidents.
· Collaborate and partner with Security teams that specialize in areas such as compliance, identity & access management (IAM), security groups, and policies.
· Collaborate to deliver projects on time and within budget
· Develop automation pipelines to streamline development, testing, and deployment workflows within an Infrastructure as Code (IaC) framework.
· Collaborate with engineering teams to investigate and troubleshoot complex problems.
· Improve system monitoring and analysis of cloud provider services (AWS, GCP) to speed up error detection and remediation, enhancing performance and reliability.
· Provide Tier 2 support for all engineering escalations from the operational team (Platform Support)
· Design solutions and provide architectural and infrastructural requirements that promote uptime, IaC, speed, and security at all phases of the software lifecycle on a global scale
· Experience operating in regulated environments such as SOX/PCI
· Results driven person with great energy

Key Qualifications
· BS in Computer Science, Software Engineering, or equivalent experience
· 4 years of professional experience operating complex systems, with at least 3 years at large scale
· 3+ years of professional Site Reliability experience operating at scale in a fast-paced environment
· 4+ years working in AWS
· 4+ years of hands-on Akamai DSA experience
· 4+ years of hands-on administration experience with AWS, Kubernetes, GCP, and Infrastructure as Code
· Experience with the following AWS Concepts: Compute Services, Serverless, Identity
· Experience with GCP and Kubernetes environments
· Experience with the following AWS systems: AMIs, KMS, IAM, Workspaces, S3, EBS, Security Groups, CloudWatch, CloudTrail, and EC2
· Experience with the following systems: Windows AD and Squid Proxy
· Infrastructure as Code Tools: Terraform Enterprise, CloudFormation, SAM
· Familiarity with Wiz and Terraform Enterprise, and with observability tools such as Datadog and Splunk
· Strong software development experience with Python and GitHub
· Build, deploy, and operate services at a fluent level (Linux/Unix)
· Hands-on experience working with distributed systems and the "-ilities" (availability, reliability, scalability, etc.) of their services
· Extensive use of automation for Infrastructure as Code, preferably via Terraform Enterprise
· Experience with continuous integration and continuous delivery/deployment tools such as Jenkins and ArgoCD
· Strong development experience in at least one of these languages: Python (preferred), Go, or JavaScript
· Strong hands-on experience building and maintaining infrastructure for microservices
· Design and provide operational and infrastructural requirements that promote uptime, speed and security at all phases of SDLC on a global scale

Required Foundational Skills
· Fluency in running performant distributed services at scale
· Proven experience following software engineering best-practices
· In-depth understanding of Unix/Linux system internals and networking
· Experience with automation and configuration management tools
· Experience in AWS public cloud services and deployment
· Experience deploying and supporting CI/CD delivery pipelines in a large enterprise environment
· Knowledge of the software development lifecycle with experience integrating Open-Source tools
· Strong ability to tackle sophisticated issues ranging from system resources to application stack traces
· Strong hands-on experience building and maintaining infrastructure for microservices
· Experience developing tools for system configuration, deployment, and monitoring
· Strong belief in driving operational excellence, with efficiency and automation at the core of operations
· Passionate desire to automate and improve everything, including processes and the standardization of tools and technologies
· Methodical and systematic problem-solving approach
· Complete ownership of end-to-end solutions and handling their life cycle
· Execution oriented and results driven
· Customer and peer relationship focused with strong interpersonal and communication skills
· Ability to thrive in a fast-paced, collaborative, team environment
· Ability to learn new skills/technologies quickly and independently

Optional
Istio Service Mesh
