Principal AI/ML Infra and Ops Engineering - Remote

San Francisco (CA)

Remote

USD 130,000 - 180,000

Full time

Today

Job summary

A leading company is seeking a Principal AI/ML Infra and Ops Engineer for a remote role. The position involves implementing automation, monitoring systems, and supporting infrastructure management. Ideal candidates will ensure high availability and compliance with security standards.

Responsibilities

  • Implement automation across the infrastructure lifecycle.
  • Develop and implement monitoring frameworks for infrastructure.
  • Provide SRE support to geographically distributed users.

Job description

Responsibilities:

  1. Automation & DevOps: Implement automation across the infrastructure lifecycle, leveraging Infrastructure as Code (IaC) and DevOps principles to streamline deployment and management processes.
  2. Systems Monitoring & Performance Tuning: Develop and implement monitoring frameworks for infrastructure, identify areas for performance improvement, optimize systems, and ensure high availability.
  3. Continuous Support: Provide SRE support to geographically distributed users on the UAIS platform, respond to tickets, triage support requests, and liaise with customers.
  4. Disaster Recovery & Business Continuity: Design, test, and implement disaster recovery and business continuity plans to ensure minimal downtime and data integrity.
  5. Security & Compliance: Collaborate with cybersecurity teams to ensure all systems and operations comply with industry standards and are secure against evolving threats.
  6. Capacity Planning & Cost Optimization: Forecast and manage capacity requirements for the AI/ML infrastructure.