Senior Site Reliability Engineer (SRE)

Team Velocity Marketing

Virginia (MN)

Remote

USD 100,000 - 130,000

Full time

2 days ago

Job summary

A leading marketing agency is seeking a Senior Site Reliability Engineer to enhance the reliability and efficiency of their applications. This full-time, remote position involves overseeing service-level objectives, deploying cloud resources, and fostering collaboration across teams, particularly within the automotive industry. Ideal candidates will have robust technical expertise, a strong background in automation, and a commitment to operational excellence.

Benefits

Medical, dental, and vision benefits
Unlimited paid leave
401(k) matching
Wellness programs

Qualifications

  • Minimum of 5 years' experience in an SRE, DevOps, or similar role.
  • Hands-on expertise with Kubernetes and cloud platforms.
  • Strong understanding of security best practices and compliance standards.

Responsibilities

  • Ensure system reliability and performance through monitoring and automation.
  • Collaborate with development teams for service readiness and performance.
  • Manage incident resolution and participate in on-call rotations.

Skills

Microsoft SQL Clusters
Elasticsearch
Kubernetes
Networking
Automation
Monitoring
Debugging

Education

Bachelor’s degree in Computer Science
Relevant work experience

Tools

Git
TFS
Bitbucket
Bamboo

Job description

As a Senior Site Reliability Engineer (SRE), you will collaborate closely with our Development and IT teams to ensure the reliability, scalability, and performance of our applications. You will take ownership of setting and maintaining service-level objectives (SLOs), building robust monitoring and alerting, and continually improving our infrastructure and processes to maximize uptime and deliver an exceptional customer experience. This role operates at the intersection of development and operations, reinforcing best practices, automating solutions, and reducing toil across systems and platforms.

This is a full-time, salaried, remote position. Candidates must reside within the continental U.S.; Eastern or Central time zones are highly preferred.

Responsibilities

  • Ensure Reliability & Performance: Own the observability of our systems, ensuring they meet established service-level objectives (SLOs) and maintain high availability.
  • Cloud & Container Orchestration: Deploy, configure, and manage resources on Google Cloud Platform (GCP) and Google Kubernetes Engine (GKE), focusing on secure and scalable infrastructures.
  • Infrastructure Automation & Tooling: Set up and maintain automated build and deployment pipelines; drive continuous improvements to reduce manual work and risks.
  • Monitoring & Alerting: Develop and refine comprehensive monitoring solutions (performance, uptime, error rates, etc.) to detect issues early and minimize downtime.
  • Incident Management & Troubleshooting: Participate in on-call rotations; manage incidents through resolution, investigate root causes, and create blameless postmortems to prevent recurrences.
  • Collaboration with Development: Partner with development teams to design and release services that are production-ready from day one, emphasizing reliability, scalability, and performance.
  • Security & Compliance: Integrate security best practices into system design and operations; maintain compliance with SOC 2 and other relevant standards.
  • Performance & Capacity Planning: Continuously assess system performance and capacity; propose and implement improvements to meet current and future demands.
  • Technical Evangelism: Contribute to cultivating a culture of reliability through training, documentation, and mentorship across the organization.

Requirements

  • Education & Experience:
    • Bachelor’s degree in Computer Science or Business Administration, or relevant work experience
    • A minimum of 5 years in an SRE, DevOps, or similar role in an IT environment (required)
  • System Administration Expertise:
    • Hands-on experience with Microsoft SQL Clusters, Elasticsearch, and Kubernetes (required)
    • Deep familiarity with Windows or Linux environments and .NET or PHP stack applications, including IIS/Apache, SQL Server/MySQL, etc.
    • Strong understanding of networking, firewalls, intrusion detection, and security best practices
  • CI/CD & Automation:
    • Proven administrative experience with tools like Git, TFS, Bitbucket, and Bamboo for continuous integration, delivery, and deployment
    • Knowledge of automation testing tools such as SonarQube, Selenium, or comparable technologies
  • Observability & Monitoring:
    • Experience with performance profiling, logging, metrics collection, and alerting tools
    • Competence in debugging solutions across diverse environments
  • Cloud & Container Orchestration:
    • Hands-on experience with GCP, AWS, or Azure, container orchestration (Kubernetes), and microservices-based architectures
  • Security Acumen:
    • Understanding of authentication, authorization, OAuth, SAML, encryption (public/private key, symmetric, asymmetric), token validation, and SSO
    • Familiarity with security strategies to optimize performance while maintaining compliance (e.g., SOC 2)
  • On-Call Support:
    • Willingness to participate in an on-call rotation and respond to system emergencies 24/7 when necessary
    • Monthly weekend rotation for Production Patching
  • Certifications (nice to have):
    • A+, MCP, Dell certifications
    • Microsoft Office expertise

Compensation
This is a full-time, salaried, remote position; the company is headquartered in Herndon, VA. Compensation is commensurate with experience. Benefits include medical, dental, vision, unlimited paid leave, 401(k) matching, wellness programs, and more.

Next Steps
If you meet these requirements and are interested in applying for this role, please complete the online application and include a current resume with contact information. Eastern and Central time zones are highly preferred. No phone calls, please.

About Team Velocity
Team Velocity is a full-service marketing agency serving the automotive industry, providing integrated marketing solutions to OEMs and dealerships nationwide. We leverage our proprietary Apollo technology platform to predict consumer behavior, personalize marketing campaigns, and help dealerships drive more sales and service revenue. Our team members are driven, creative, and collaborative, enjoying a unique culture where innovation and client success are paramount.

Join us in revolutionizing automotive marketing and technology through powerful, data-driven insights, continuous improvement, and an unwavering commitment to reliability.
