Senior AI SRE

Madison-Davis, LLC

United States

Remote

USD 64,000 - 720,000

Full time

5 days ago

Job summary

A leading company is seeking a Site Reliability Engineer to oversee the deployment and management of AI-driven tools. The role involves ensuring reliability, architecting scalable infrastructure, and collaborating with various teams. Candidates should have strong experience in site reliability, cloud services, and coding in Python, Java, or Go. This position offers a competitive salary and the opportunity to work in a dynamic environment focused on AI solutions.

Qualifications

  • Strong experience in site reliability or infrastructure engineering.
  • Direct experience deploying or supporting AI tools.
  • Deep expertise with cloud-native services in AWS and/or GCP.

Responsibilities

  • Oversee deployment and management of AI-driven productivity tools.
  • Architect scalable infrastructure for AI usage.
  • Drive deployment efforts across major public cloud platforms.

Skills

Python
Java
Go
Terraform
Ansible
Bash
Prometheus
Grafana
Datadog

Job description

  • Oversee deployment, configuration, and lifecycle management of internal AI-driven productivity tools and proprietary AI applications.
  • Ensure the reliability, uptime, and high performance of AI workloads and services. Drive observability practices with robust monitoring and alerting in place.
  • Architect and maintain scalable, resilient infrastructure to support AI usage across thousands of users. Plan and manage resource capacity to meet growth demands.
  • Build and maintain automation (IaC and CI/CD pipelines) to accelerate environment setup, monitoring, and support. Participate in sandbox testing environments for new use cases.
  • Partner closely with engineering, ML, infosec, and business operations teams to deploy and support AI solutions that drive internal productivity.
  • Apply best practices in data protection, access controls, and audit-readiness, especially in environments subject to regulatory oversight.
  • Be part of the on-call rotation and handle troubleshooting, root cause analysis, and response for AI-related outages or degradation.
  • Drive deployment efforts across major public cloud platforms (AWS/GCP), leveraging native services for compute, orchestration, and security.
  • Write, debug, and optimize code (Python, Java, or Go preferred) supporting integrations and back-end services for AI-based tooling.
  • Present technical insights, incident reports, and roadmap plans to both technical peers and non-technical leadership.
  • Strong experience in a site reliability or infrastructure engineering role supporting enterprise platforms.
  • Direct experience deploying or supporting AI tools or intelligent automation platforms.
  • Deep expertise with cloud-native services in AWS and/or GCP.
  • Comfortable coding in Python, Java, or Go, especially in back-end systems or automation pipelines.
  • Proficient with tools like Terraform, Ansible, Bash, and observability stacks (e.g., Prometheus, Grafana, Datadog).
  • Working knowledge of security and privacy frameworks, ideally within regulated industries (finance, healthcare, etc.).
  • Hands-on experience in incident response, playbook creation, and postmortem analysis.
  • Confident communicating across business, technical, and leadership stakeholders.
Seniority level
  • Mid-Senior level
Employment type
  • Contract
Job function
  • Information Technology
Industries
  • Staffing and Recruiting
