Site Reliability Engineer, AI/ML Platforms

Adobe

California

On-site

USD 133,900 - 242,000

Full time

30+ days ago

Job summary

Join a dynamic team as a Site Reliability Engineer, where you will work on Adobe's cutting-edge AI Training and Inference Platforms. This role involves collaborating with engineering teams to enhance the reliability and scalability of AI capabilities. You will have the opportunity to define service level objectives and implement methodologies that ensure operational excellence. This innovative firm values creativity and curiosity, providing an exceptional work environment that fosters growth and collaboration. If you are passionate about technology and eager to make a significant impact, this is the perfect opportunity for you.

Qualifications

5+ years of relevant experience in Site Reliability Engineering or related fields.
Experience with building and scaling distributed systems and containerization.

Responsibilities

Implement solutions to enhance reliability, scalability, and efficiency.
Ensure high uptime and quality of service for Adobe's customers.

Skills

Site Reliability Engineering

Distributed Systems

Containerization

Kubernetes

Python

Ansible

Terraform

InfluxDB

Prometheus

Education

Bachelor's degree in Computer Science

Master's degree in Computer Science

Tools

Elastic Stack

AWS

CI/CD

Git

PyTorch

SageMaker

HuggingFace

NVIDIA TensorRT

OpenAI Triton

JOB LEVEL

P40

ADDITIONAL JOB LEVELS

P50

P55

EMPLOYEE ROLE

Individual Contributor

The Opportunity

We're looking for an outstanding Site Reliability Engineer for Adobe’s AI Training and Inference Platforms within Adobe Firefly. You will be part of a team of Site Reliability Engineers working closely with the Engineering teams on building, scaling, and securing the AI Platform. This platform enables the Firefly product teams to easily manage and deploy Machine Learning capabilities used by Adobe client applications.

The Applied Research groups from Adobe Research and other App Teams in Adobe will deploy thousands of models onto this platform in a variety of lifecycle stages (early research, development, productization, optimization, etc.). This platform will offer ML model training and serving at scale, with high cost efficiency, and on a wide variety of hardware platforms across multiple clouds.

What You'll Do

Identify and implement methodologies and solutions to increase reliability, scalability, security, and efficiency.
Ensure the highest uptime and Quality of Service (QoS) for Adobe’s customers through operational excellence.
Define service level objectives (SLOs) and indicators (SLIs) to represent and measure service quality.
Support and maintain globally distributed, multi-cloud (public and/or private) environments.
Automate common, repeatable tasks at a large scale to streamline operational procedures.
Identify areas to improve service resiliency through techniques such as chaos engineering and performance/load testing.
Coordinate with other Adobe platform teams and service providers (primarily AWS) to innovate on Generative AI as a Service.

What You’ll Need to Succeed

A Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field, and 5+ years of relevant industry experience.
You excel in undefined environments and get excited about finding pragmatic solutions to complex technical or organizational challenges.
You keep up with industry trends and grow your knowledge and skills to solve technical problems.
Experience in building and scaling distributed systems, as well as experience with containerization and orchestration technologies like Kubernetes.
Production-level expertise with container orchestration engines (e.g. Kubernetes) and proven understanding of modern, continuous development techniques and pipelines (IaC, CI/CD, ArgoCD, Git).
Fundamental programming skills, ideally practical experience in one (and preferably more) of the following languages: Python, Go.
Good knowledge of infrastructure configuration management tools like Ansible and Terraform.
Experience in using observability and tracing-related tools like InfluxDB, Prometheus, and Elastic Stack.
An understanding of AI/ML, including ML frameworks, public cloud, and commercial AI/ML solutions; familiarity with PyTorch, SageMaker, HuggingFace, NVIDIA TensorRT, or OpenAI Triton is a plus.

Application Window Notice

There is no deadline to apply to this job posting because Adobe accepts applications for this role on an ongoing basis. The posting will remain open based on hiring needs and position availability.

Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets. The U.S. pay range for this position is $133,900 to $242,000 annually. Pay within this range varies by work location and may also depend on job-related knowledge, skills, and experience. Your recruiter can share more about the specific salary range for the job location during the hiring process.

Adobe will consider qualified applicants with arrest or conviction records for employment in accordance with state and local laws and 'fair chance' ordinances.

Internal Opportunities

Creativity, curiosity, and constant learning are celebrated aspects of your career growth journey. We’re glad that you’re pursuing a new opportunity at Adobe!

Put your best foot forward:

Update your Resume/CV and Workday profile – don’t forget to include your uniquely ‘Adobe’ experiences and volunteer work.
Visit the Internal Mobility page on Inside Adobe to learn more about the process and set up a job alert for roles you’re interested in.
Check out these tips to help you prep for interviews.
If you are applying for a role outside of your current country, ensure you review the International Resources for Relocating Employees on Inside Adobe, including the impacts to your Benefits, AIP, Equity & Payroll.

Once you apply for a role via Workday, the Talent Team will reach out to you within 2 weeks. If you move into the official interview process with the hiring team, make sure you inform your manager so they can champion your career growth.

At Adobe, you will be immersed in an exceptional work environment that is recognized around the world. You will also be surrounded by colleagues who are committed to helping each other grow through our unique Check-In approach where ongoing feedback flows freely. If you’re looking to make an impact, Adobe's the place for you.

Adobe is an equal opportunity and affirmative action employer. We welcome and encourage diversity in the workplace regardless of gender, race or color, ethnicity or national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, or any other characteristics protected by law.

If you have a disability or special need that requires accommodation to navigate our internal careers site or to complete the application process, please contact accommodations@adobe.com.
