Site Reliability Engineer - Senior (CPE)

Ursus

San Diego (CA)

Remote

USD 150,000 - 200,000

Full time

8 days ago

Job summary

A global leader in interactive and digital entertainment is seeking a Senior Site Reliability Engineer to focus on service delivery and collaborative systems integration. This role requires strong expertise in AWS, Kubernetes, and Infrastructure as Code, offering a dynamic opportunity in a cloud-based environment with dedicated support for operational excellence.

Qualifications

4+ years working in AWS and Kubernetes.
Experience supporting CI/CD delivery pipelines.
Strong ability to resolve complexities in distributed systems.

Responsibilities

Design, build, and maintain services for availability and reliability.
Support Windows AD service in the cloud and IAM requests.
Collaborate with engineering teams to troubleshoot and resolve incidents.

Skills

Fluent in Python

Experience with AWS

Knowledge of Linux/Unix systems

Skills in Infrastructure as Code

Education

BS in Computer Science or Software Engineering

Tools

Terraform Enterprise

CloudFormation

Jenkins

JOB TITLE: Site Reliability Engineer - Senior (CPE)
LOCATION: Remote (San Diego area preferred)
DURATION: 1 year
PAY RANGE: $61.00 - $71.72/hr

COMPANY:
Our client is a global leader in interactive and digital entertainment.

This Site Reliability Engineer role focuses on delivering all existing services provided by the teams and helping develop future services. It requires working with multiple global teams to resolve access and compatibility issues across various systems.

Responsibilities

Design, build, deploy, and operate a combination of open-source, custom-written, and vendor-provided software to provide services
Analyze, review, & fulfill Identity & Access Management (IAM) requests
Improve, support, and understand Security Group flows, execution, and troubleshooting
Support a Windows AD service in the cloud
Support an ecosystem of remote systems for remote user access
Collaborate with multiple software & security engineering teams to integrate solutions and contribute to project deliveries
Provide rotational on-call support, in which you will detect, respond to, triage, and resolve production incidents
Collaborate and partner with Security teams that specialize in areas such as compliance, identity & access management (IAM), security groups, and policies.
Collaborate to deliver projects on time and within budget
Develop automation pipelines to streamline development, testing, and deployment workflows within an Infrastructure as Code (IaC) framework
Collaborate with engineering teams to investigate and troubleshoot complex problems
Improve system monitoring and analysis of various cloud provider services (AWS, GCP) to speed up error detection and remediation, enhancing performance and reliability
Provide Tier 2 support for all engineering escalations from the operational team (Platform Support)
Ability to design solutions and provide architectural and infrastructural requirements that promote uptime, IaC, speed and security at all phases of the software lifecycle on a global scale
Experience operating in regulated environments such as SOX/PCI
Results-driven person with great energy

Key Qualifications

BS in Computer Science, Software Engineering, or equivalent experience
4 years of professional experience operating complex systems, with at least 3 years at large scale
3+ years of professional Site Reliability experience operating at scale in a fast-paced environment
4+ years working in AWS
4+ years of hands-on Akamai DSA experience
4+ years of hands-on administration experience with AWS, Kubernetes, GCP, and Infrastructure as Code
Experience with the following AWS Concepts: Compute Services, Serverless, Identity
Experience with GCP and Kubernetes environments
Experience with the following AWS systems: AMIs, KMS, IAM, Workspaces, S3, EBS, Security Groups, CloudWatch, CloudTrail, and EC2.
Experience with the following systems: Windows AD and Squid Proxy
Infrastructure as Code Tools: Terraform Enterprise, CloudFormation, SAM
Familiarity with the following systems: Wiz, Terraform Enterprise, and observability tools such as Datadog and Splunk
Strong software development experience with Python and GitHub
Fluency in building, deploying, and operating services on Linux/Unix
Hands-on experience in working with distributed systems and ‘ilities’ (availability, reliability, scalability, etc.) of the services
Extensive use of automation for Infrastructure as Code, preferably via Terraform Enterprise
Experience with continuous integration and continuous delivery/deployment tools such as Jenkins and ArgoCD
Strong development experience in one of these languages: Python (preferred), Go, or JavaScript
Strong hands-on experience building and maintaining infrastructure for microservices
Design and provide operational and infrastructural requirements that promote uptime, speed and security at all phases of SDLC on a global scale

Required Foundational Skills

Fluency in running performant distributed services at scale
Proven experience following software engineering best-practices
In-depth understanding of Unix/Linux systems internals and networking
Experience with automation and configuration management tools
Experience in AWS public cloud services and deployment
Experience deploying and supporting CI/CD delivery pipelines in a large enterprise environment
Knowledge of the software development lifecycle with experience integrating Open-Source tools
Strong ability to tackle sophisticated issues ranging from system resources to application stack traces
Experience developing tools for system configuration, deployment, and monitoring
Strong belief in driving operational excellence, with ownership of efficiency and automation at the core of operations
Passionate desire to automate and improve everything, including processes and the standardization of tools and technologies
Methodical and systematic problem-solving approach
Complete ownership of end-to-end solutions and handling their life cycle
Execution-oriented and results-driven
Focused on customer and peer relationships, with strong interpersonal and communication skills
Ability to thrive in a fast-paced, collaborative, team environment
Ability to learn new skills/technologies quickly and independently
