Principal/Lead/Senior Software Engineer - ML Infrastructure

Salesforce, Inc.

Bellevue (WA)

On-site

USD 120,000 - 180,000

Full time

12 days ago

Job summary

An innovative firm is seeking a seasoned ML Engineer to design and deliver scalable generative AI services. This role involves collaborating with product managers and data scientists to develop cutting-edge AI solutions that enhance customer experiences. You will drive efficiencies through automation and tackle complex challenges in a fast-paced environment. If you have a passion for AI and a strong background in machine learning engineering, this opportunity allows you to make a significant impact in a dynamic team focused on transforming the way AI is integrated into applications.

Qualifications

8+ years of experience in ML engineering and building AI systems.
Strong competencies in algorithms, data structures, and software design.

Responsibilities

Design scalable generative AI services for production integration.
Drive system efficiencies through automation and performance tuning.

Skills

Machine Learning Engineering

Distributed Services

JVM-based Languages (Java, Scala)

Python

Microservice Architecture

Cloud Platforms (AWS, GCP)

Containerization (Kubernetes, Spinnaker)

Data Structures and Algorithms

Education

Bachelor's in Computer Science

Master's in Software Engineering

Tools

Kubernetes

Docker

Kafka

Spark

Hadoop

SageMaker

TensorFlow

PyTorch

The Einstein products and platform democratize AI and transform the way our Salesforce Ohana builds trusted machine learning and AI products, in days instead of months. The platform augments the Salesforce Platform with the ability to easily create, deploy, and manage Generative AI and Predictive AI applications across all clouds. We achieve this vision by providing unified, configuration-driven, and fully orchestrated machine learning APIs, customer-facing declarative interfaces, and various microservices for the entire machine learning lifecycle, including data, training, predictions/scoring, orchestration, model management, model storage, experimentation, and more.

We already produce over a billion predictions per day, training thousands of models per day along with tens of different large language models, and serve thousands of customers. We enable customers to use leading large language models (LLMs), both internally and externally developed, so they can leverage them in their Salesforce use cases. Along with the power of Data Cloud, this platform gives customers an unparalleled advantage for quickly integrating AI into their applications and processes.

What you’ll do:

Design and deliver scalable generative AI services that can be integrated with many applications, thousands of tenants, and run at scale in production.
Drive system efficiencies through automation, including capacity planning, configuration management, performance tuning, monitoring and root cause analysis.
Participate in periodic on-call rotations and be available for critical issues.
Partner with Product Managers, Application Architects, Data Scientists, and Deep Learning Researchers to understand customer requirements, design prototypes, and bring innovative technologies to production.
Participate in meal conversations with your team members about really important topics, such as: Should the cuteness of panda bears be a factor in their survivability? Is love a decision tree or a regression model? How far ahead would society be today if we had 12 fingers instead of 10?

Required Skills:

8+ years of industry experience in ML engineering, building AI systems and/or distributed services.
Bachelor's or Master's degree in Computer Science, Software Engineering, or a related STEM field, with strong competencies in algorithms, data structures, and software design.
Experience building distributed microservice architectures on AWS, GCP, or other public cloud substrates.
Experience with a modern containerized deployment stack using Kubernetes, Spinnaker, and other technologies.
Proven ability to implement, operate, and deliver results via innovation at large scale.
Strong programming expertise in JVM-based languages (Java, Scala) and Python.
Experience with distributed, scalable systems and modern data storage, messaging, and processing frameworks, including Kafka, Spark, Docker, Hadoop, etc.
Grit, drive and a strong feeling of ownership coupled with collaboration and leadership.

Preferred Skills:

Understanding of MLOps/ML Infra workflows, processes and ML components
Strong experience building and applying machine learning models for business applications
Working or academic knowledge of SageMaker, TensorFlow, PyTorch, Triton, Spark, or equivalent large-scale distributed machine learning technologies
Fantastic problem solver; ability to solve problems that the world has not solved before
Excellent written and spoken communication skills
Demonstrated track record of cultivating strong working relationships and driving collaboration across multiple technical and business teams
