A leading provider of HPC and advanced technology solutions is seeking a Site Reliability Engineer. This role involves managing high-performance computing environments, ensuring system reliability, and collaborating with teams on innovations to improve user experience. Candidates should hold relevant degrees and have extensive experience in networking and HPC architectures.
This range is provided by asobbi. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.
$120,000.00/yr - $160,000.00/yr
Company Overview: Our client is a leading provider of HPC and advanced technology solutions, specialising in AI infrastructure. They offer customisable cloud solutions designed to support AI teams at every stage of their projects.
Job Purpose
As an HPC SRE, you will manage, optimise, and ensure the reliability of the client's high-performance computing environments. You will be the technical expert for the HPC infrastructure, covering system architecture, optimisation, integrations, and networking. Working with cross-functional teams, you will drive innovations that align with business goals and improve the user experience. The role includes 24/7 on-call support to maintain high availability and performance of the HPC systems.
Key Responsibilities
Infrastructure Management
Automation and Efficiency
Monitoring and Observability
Collaboration and Communication
Key Objectives and Goals
Reliability: Achieve and maintain high availability and uptime for HPC systems.
Performance: Continuously optimise the performance of NVIDIA-based and other HPC systems.
Scalability: Develop scalable HPC solutions to support ongoing business growth.
Automation: Increase the level of automation to enhance efficiency and reduce manual tasks.
Continuous Availability: Ensure 24/7 support through effective coverage and on-call practices.
Collaboration: Foster a collaborative environment within the SRE teams and with other departments.
Continuous Improvement: Promote a culture of ongoing learning and improvement.
Required Qualifications
Desired Skills