Site Reliability Engineer

ZipRecruiter

Plano (TX)

Hybrid

USD 90,000 - 130,000

Full time

10 days ago

Job summary

A leading technology company is seeking an IT Operations Engineer (SRE) for a hybrid role in Texas. The ideal candidate will have a robust background in Site Reliability Engineering, cloud platforms, and automation. You will be responsible for ensuring system reliability, developing automated solutions, and collaborating with various teams to enhance performance and cost efficiency. This position is ideal for a proactive individual with a passion for continuous improvement in IT systems.

Qualifications

  • 3+ years in a Site Reliability Engineering, Systems Engineering, or similar role.
  • Strong experience with cloud platforms.
  • Familiarity with automation and continuous improvement.

Responsibilities

  • Ensure reliability and uptime of production systems through monitoring and incident response.
  • Develop and maintain automated solutions for configuration and deployment.
  • Collaborate with teams to design resilient and scalable systems.

Skills

Cloud platforms
Scripting
Site Reliability Engineering
Linux systems
Networking
Performance tuning
Monitoring and observability
Security best practices
Containerization

Education

Bachelor's degree in computer science, engineering, or a related field

Tools

Azure
AWS
GCP
Terraform
Ansible
Docker
Kubernetes
Power Automate
PowerApps

Job description

Title: IT Operations Engineer (SRE)
Job Type: Contract
Location: Hybrid – Daytona Beach, Florida OR Plano, TX

Job Summary
The ideal candidate has experience leading root cause analysis in an enterprise environment and knows IT systems broadly, including networking, infrastructure (on-prem, hybrid, cloud), endpoints, data, and modern workplace platforms. They should have managed endpoints at the enterprise level, including policy management, patching, vulnerability management, observability, and related strategies. Familiarity with Site Reliability Engineering best practices, automation, and continuous improvement is essential.

Qualifications

  • Bachelor's degree in computer science, engineering, or a related field (or equivalent experience).
  • 3+ years in a Site Reliability Engineering, Systems Engineering, or similar role.
  • Strong experience with cloud platforms such as Azure, AWS, or GCP.
  • Proficient in scripting or programming languages such as Python, Go, Bash, or PowerShell.
  • Experience with Power Automate and PowerApps.
  • Experience with infrastructure as code tools such as Terraform or Ansible.
  • Strong understanding of Linux systems, networking, and performance tuning.
  • Experience with monitoring and observability tools such as Azure Monitor, Zabbix, Grafana, Datadog, Dynatrace, LogicMonitor, or ControlUp.
  • Familiarity with ITIL/ITSM processes and incident/change management systems.
  • Knowledge of security best practices such as least privilege access, secure configurations, and patching.
  • Experience supporting large-scale or distributed systems in production.
  • Knowledge of FinOps or cloud cost optimization.
  • Hands-on experience with containerization and orchestration tools such as Docker or Kubernetes.
  • Systems administration experience, including applying best practices, optimization, and vendor management.

Description and Responsibilities

  • Ensure reliability and uptime of production systems through monitoring, incident response, and capacity planning.
  • Develop and maintain automated solutions for configuration, deployment, monitoring, and alerting/self-healing.
  • Collaborate with application and infrastructure teams to design resilient and scalable systems.
  • Participate in on-call rotations, respond to incidents, and perform root cause analysis.
  • Define and track SLIs, SLOs, and SLAs, using data to inform operational decisions.
  • Continuously improve system performance, cost efficiency, and observability.
  • Work with developers to integrate reliability and security best practices into the software development lifecycle.
  • Document processes, runbooks, and architectural decisions.

Eligibility: All applicants authorized to live and work in the United States on a permanent basis are welcome to apply. Residency in the US is required. Sponsorship is not available for this position.

Wright Technical Services and our client are Equal Opportunity Employers. We are committed to creating an inclusive environment for all employees. All qualified applicants will receive consideration without regard to race, color, religion, sex, national origin, age, disability, or veteran status.
