Senior Machine Learning Infrastructure Engineer

Plus

Santa Clara (CA)

On-site

USD 160,000 - 200,000

Full time

30+ days ago

Job summary

An innovative firm at the forefront of autonomous driving solutions is searching for talented engineers to join their dynamic team. This role focuses on developing high-performance systems for machine learning, enhancing platform usability, and mentoring junior engineers. With a commitment to safety and sustainability, this company provides an exciting opportunity to work on groundbreaking technology that is shaping the future of driving. If you are passionate about machine learning and distributed systems, this is the perfect chance to make a significant impact in a fast-paced environment.

Qualifications

  • 3+ years of software engineering experience focused on ML infrastructure.
  • Deep understanding of containerization and distributed ML workloads.
  • Proficiency in at least one deep learning framework.

Responsibilities

  • Design and develop scalable systems for training and deploying ML models.
  • Build and maintain efficient data pipelines and model versioning systems.
  • Mentor junior engineers and contribute to technical excellence.

Skills

Python
C++
SQL
Communication Skills
Distributed Systems
Machine Learning Infrastructure
Adaptability

Education

MS in Computer Science
MS in Electrical Engineering

Tools

Docker
Kubernetes
AWS
GCP
Kubeflow
MLflow
Apache Airflow
Ray

Job description

Plus is a global provider of highly automated driving and fully autonomous driving solutions with headquarters in Silicon Valley, California. Named by Forbes as one of America’s Best Startup Employers and Fast Company as one of the World’s Most Innovative Companies, Plus’s open autonomy technology platform is already powering vehicles in commercial use today. Working with one of the largest companies in the U.S., vehicle manufacturers, and others globally, Plus is helping to make driving safer, more comfortable, and more sustainable. Plus has received a number of industry awards and distinctions for its transformative technology and business momentum from Fast Company, Forbes, Insider, Consumer Electronics Show, AUVSI, and others. If you’re ready to make a huge impact and drive the future of autonomy, Plus is looking for talented individuals to join its fast-growing teams.


Responsibilities:
  • Design and develop scalable, high-performance systems for ML model training, inference, deployment, and monitoring.
  • Build and maintain efficient data pipelines, model versioning systems, and experiment tracking frameworks.
  • Collaborate with cross-functional teams, including ML researchers and engineers, to identify bottlenecks and improve platform usability.
  • Implement distributed systems and storage solutions optimized for machine learning workloads.
  • Drive improvements in CI/CD workflows for ML models and infrastructure.
  • Ensure high availability and reliability of the ML platform by implementing robust monitoring, logging, and alerting systems.
  • Stay current with industry trends and integrate relevant tools and frameworks to enhance the platform.
  • Mentor junior engineers and contribute to a culture of technical excellence.
Required Skills:
  • MS in Computer Science, Electrical Engineering, or related field
  • Good oral and written communication skills
  • 3+ years of software engineering experience with a focus on ML infrastructure or distributed systems.
  • Proficiency in Python, C++, and SQL
  • Deep understanding of containerization, orchestration technologies, distributed ML workloads, and experiment tracking tools (e.g., Docker, Kubernetes, multiprocessing, Kubeflow, and MLflow)
  • Experience deploying and managing resources across multiple environments (AWS, GCP, or on-prem)
  • Proficiency in at least one deep learning framework, such as PyTorch, and in data pipeline tools (e.g., Apache Airflow, Prefect).
  • Strong knowledge of distributed systems, databases, and storage solutions.
  • Extensive software design and development skills.
  • Ability to learn and adapt to new technologies and contribute in a productive environment.
Preferred Skills:
  • Familiarity with fundamental deep learning architectures, such as Convolutional Neural Networks (CNNs) and Transformer models
  • Experience in building large-scale ML datasets, MLOps pipelines, and distributed computing frameworks like Ray
  • Experience working with autonomous vehicles or robotics
Salary Range:
  • $160,000 - $200,000 a year

Our compensation (cash and equity) is determined based on the position, your location, qualifications, and experience.
