Senior Site Reliability Engineer

Stability AI

United States

Remote

USD 120,000 - 160,000

Full time

Today

Job summary

Stability AI is looking for a remote, US-based Senior Site Reliability Engineer (SRE) to improve its cloud infrastructure. The role involves developing SRE best practices, managing scalable, highly available systems, and driving incident management. Ideal candidates will have experience with AWS, Kubernetes, and automation scripting. The team emphasizes innovation and mentorship.

Qualifications

Experience in architecting scalable systems in AWS.
Knowledge of cloud security measures.
Strong background in software development.

Responsibilities

Develop and enforce SRE best practices across the organization.
Manage scalable systems focusing on high availability.
Implement infrastructure as code using Terraform.
Refine monitoring, logging, and alerting systems.
Drive incident management and root cause analysis.

Skills

SRE best practices

AWS cloud environments

Infrastructure as code

Monitoring and logging systems

Incident management

CI/CD pipelines

Kubernetes

Automation scripting

Tools

Terraform

Grafana

ELK stack

Remote - United States

Job Description: Stability AI’s Engineering Operations team is looking for a Senior Site Reliability Engineer (SRE) to join our growing team and play a pivotal role in improving and shaping our cloud infrastructure. The person will work closely with engineering, IT, security, and product teams to drive innovation and reliability in an evolving environment. Candidates should have the initiative to build out and improve a maturing cloud landscape.

Responsibilities

Developing and enforcing SRE best practices and standards across the organization.
Architecting and managing scalable systems in AWS and other cloud environments, focusing on high availability and resilience.
Implementing and maintaining infrastructure as code using Terraform.
Setting up and refining monitoring, logging, and alerting systems.
Driving incident management and root cause analysis to improve system reliability.
Championing SRE principles and mentoring junior team members.

Qualifications

Experience collaborating with development teams to enhance CI/CD pipelines.
Experience scaling resource-intensive systems, whether storage, networking, or compute.
Knowledge of and experience with Kubernetes or other container scaling solutions.
Background in software development or automation scripting.
Knowledge of and experience with Grafana, the ELK stack, or similar tools.
Cloud security experience.

Equal Employment Opportunity:

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability, or other legally protected status.
