Staff Software Engineer / Tech Lead (Model Training Infrastructure)

Anyscale

San Francisco (CA)

On-site

USD 237,000 - 285,000

Full time

14 days ago

Job summary

Join a forward-thinking company on a mission to democratize distributed computing with innovative solutions. As a Staff Software Engineer and Tech Lead, you will spearhead the Model Training Infrastructure team, enhancing scalable machine learning capabilities. Collaborate with global ML infrastructure teams, mentor fellow engineers, and drive impactful projects that shape the future of AI applications. This role offers a dynamic environment where your expertise in distributed systems and machine learning frameworks will be pivotal in transforming how developers scale their applications. If you're ready to make a significant impact, this opportunity is for you.

Benefits

Stock Options
Healthcare plans (99% premiums covered)
401k Retirement Plan
Education & Wellbeing Stipend
Paid Parental Leave
Fertility Benefits
Flexible Time Off
Commute reimbursement
100% of in-office meals covered

Qualifications

  • Experience in building and maintaining complex software systems.
  • Proven leadership in mentoring engineering teams.

Responsibilities

  • Lead the development of Ray’s distributed training libraries.
  • Drive technical direction and mentor engineers.

Skills

Machine Learning
Distributed Systems
Software Development
Communication Skills
Team Leadership

Education

Bachelor's Degree in Computer Science
Master's Degree in a related field

Tools

PyTorch
TensorFlow
XGBoost
AWS
GCP
Kubernetes

Job description

About Anyscale

At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, Cruise, and many more, have Ray in their tech stacks to accelerate the progress of AI applications out into the real world.

With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.

Proud to be backed by Andreessen Horowitz, NEA, and Addition with $250+ million raised to date.

About The Role

Anyscale is looking for a staff software engineer to lead the Model Training Infrastructure team.

The Model Training Infrastructure team leads the development and optimization of Ray’s distributed training libraries, focusing on enabling large-scale ML workloads. The team owns and maintains widely adopted open source libraries like Ray Train for distributed model training and Ray Tune for distributed hyperparameter tuning.

As the technical leader for this team, you will be responsible for:

  • Thinking deeply about delightful, programmatic interfaces that let machine learning engineers scale model training
  • Building and rethinking distributed training architectures that scale seamlessly from laptop to cloud
  • Implementing and innovating on distributed training algorithms, such as elastic training, to improve model training performance
  • Working with and leading a robust open source community around the Ray project
  • Engaging directly with ML infrastructure teams around the world to iterate on and build the best training infrastructure
  • Advocating for and sharing your work broadly with the ML community through talks, tutorials, and blog posts

On a day-to-day basis, you will drive the technical direction of the team, mentor engineers, and deliver high-impact projects. You’ll shape the vision for what training infrastructure looks like for enterprises around the world while remaining hands-on with the code and product development.

We’d love to hear from you if you have:

  • Multiple years of experience building, scaling, and maintaining complex software systems in production
  • Proven experience leading or mentoring engineering teams in a technical capacity
  • Expertise in machine learning frameworks (e.g., PyTorch, TensorFlow, XGBoost)
  • Hands-on experience with distributed systems and designing fault-tolerant infrastructure
  • Excellent communication and collaboration skills

Bonus Points If You Have

  • Experience with Ray
  • Experience with cloud technologies (e.g., AWS, GCP, Kubernetes)
  • Experience building and operating ML training platforms in production
  • Contributions to or maintenance of open-source libraries
  • Experience leading open-source or cross-functional teams

Compensation

At Anyscale, we take a market-based approach to compensation. We are data-driven, transparent, and consistent. The target salary for this role is $237,000 to $284,614. As the market data changes over time, the target salary for this role may be adjusted.

This role is also eligible to participate in Anyscale's Equity and Benefits offerings, including the following:
  • Stock Options
  • Healthcare plans, with premiums covered by Anyscale at 99%
  • 401k Retirement Plan
  • Education & Wellbeing Stipend
  • Paid Parental Leave
  • Fertility Benefits
  • Flexible Time Off
  • Commute reimbursement
  • 100% of in-office meals covered

Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law.

Anyscale Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.

Seniority level
  • Mid-Senior level
Employment type
  • Full-time
Job function
  • Engineering and Information Technology
Industries
  • Software Development
