Join to apply for the Site Reliability Engineer role at TieTalent
About
Site Reliability Engineer
Remote
Full-time Opportunity
Job Description
Must Have Technical/Functional Skills
- Experience in Cloud platforms (AWS, Azure, Google Cloud) and hybrid environments.
- Proficiency in container technologies (Docker, containerd, Podman).
- Strong knowledge of Linux administration and networking concepts.
- Experience with Infrastructure as Code (IaC) tools like Terraform, Ansible, Helm, or Pulumi.
- Monitoring and logging expertise using Prometheus, Grafana, ELK, Datadog, or Splunk.
- Hands-on experience with CI/CD pipelines and DevOps tools (Jenkins, GitHub Actions, GitLab CI, ArgoCD).
- Proficiency in scripting/programming (Python, Bash, Go) for automation.
- Strong troubleshooting and incident management skills.
Roles & Responsibilities
We are seeking a highly skilled Site Reliability Engineer (SRE) to manage and optimize infrastructure and ensure its reliability. The ideal candidate will have deep expertise in ELK, Dynatrace, PagerDuty, PowerShell, container orchestration, cloud infrastructure, and automation, along with a strong focus on reliability, scalability, and performance. Knowledge of LogicMonitor and Python is good to have.
- Reliability & Performance: Implement best practices to ensure high availability, scalability, and performance of containerized applications.
- Monitoring & Incident Response: Set up monitoring (Prometheus, Grafana, ELK, Dynatrace, PagerDuty, PowerShell, etc.), troubleshoot issues, and lead incident resolution.
- Automation & Infrastructure as Code (IaC): Develop and maintain Terraform, Helm charts, and Kubernetes manifests for automation.
- CI/CD & DevOps Integration: Work with DevOps teams to optimize CI/CD pipelines for Kubernetes deployments (Jenkins, ArgoCD, FluxCD, etc.).
- Security & Compliance: Implement security best practices for containerized workloads, RBAC, network policies, and vulnerability scanning.
- Capacity Planning & Optimization: Analyze resource usage and optimize infrastructure costs and performance.
- Disaster Recovery & Backup: Implement backup and disaster recovery strategies for Kubernetes workloads.
Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.
Nice-to-have skills
- AWS
- Azure
- Docker
- Linux
- Terraform
- Ansible
- Prometheus
- Grafana
- Splunk
- Jenkins
- GitLab CI
- Python
- Bash
- Go
- Dynatrace
- PowerShell
- Kubernetes
Location
- Los Angeles, California
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Engineering and Information Technology
Industries
Technology, Information and Internet