A leading company seeks a Senior Machine Learning DevOps Engineer to design and implement advanced ML and CI/CD pipelines. The role requires deep expertise in ML technologies and experience mentoring junior engineers. This telecommuting position is open to candidates from across the United States and offers an opportunity to significantly improve the operational health and performance of ML models.
Senior Machine Learning DevOps Engineer, Cohere Health, Inc., Boston, Massachusetts (Remote)
Architect and design end-to-end ML pipelines, CI/CD pipelines, and operational health and performance monitoring, including model promotion, demotion, and cost monitoring.
Duties include:
● Implement automated ML pipelines, including for LLMs, covering data preprocessing, model training, and deployment.
● Implement robust CI/CD pipelines for ML models to facilitate continuous integration and delivery.
● Implement monitoring systems to track model performance, data drift, and system health.
● Identify and resolve issues to maintain high availability and reliability of ML services.
● Implement new tools and technologies to enhance the ML infrastructure.
● Implement data governance policies and practices to protect sensitive information.
● Mentor and guide junior ML engineers in MLOps best practices and engage with cross-functional teams to drive cohesive and integrated solutions.
Requirements:
Position requires either (i) a Master’s degree (or an equivalent foreign degree) in Computer Science, Business Analytics, Machine Learning or a closely related field and 3 years of experience as a Software Engineer working with Machine Learning technologies or (ii) a Bachelor’s degree (or an equivalent foreign degree) in Computer Science, Business Analytics, Machine Learning or a closely related field and 5 years of experience as a Software Engineer working with Machine Learning technologies.
Must also have 3 years of experience (which can have been gained concurrently with either primary experience requirement above) working with the following:
● ML model deployment, maintenance and the methods to evaluate models in product use, including working with Arize, Arize Phoenix, MLflow, Docker, Kubernetes, AWS Lambda/ECS, log aggregation, CI/CD, Linux, and Terraform;
● Using Python, TensorFlow, PyTorch, scikit-learn, and Elasticsearch;
● Using AWS tools including AWS SageMaker, Athena, Fargate, Bedrock, CloudWatch, S3, ECS, EMR, EC2 and Lambda; and
● Access configuration, and setting up monitoring and alerts to ensure appropriate use of computational resources for model performance monitoring.
This is a telecommuting position working from home. May reside anywhere in the United States.