Expression of Interest: MLOps Engineer – AWS-Focused ML Infrastructure

Keysight Technologies Malaysia Sdn. Bhd.

Penang

On-site

MYR 75,000 - 95,000

Full time

Job summary

A leading tech company in Penang is looking for an MLOps Engineer focusing on AWS infrastructure. This role involves supporting the operationalization of machine learning solutions and ensuring efficient ML workflows with tools like SageMaker and Terraform. Candidates should have 3-5 years of experience in MLOps and a strong background in AWS services. This position provides an exciting opportunity to work in a collaborative environment, leveraging cutting-edge technologies to drive innovation and improve customer outcomes.

Benefits

Opportunity for growth
Innovative work environment
Competitive salary

Qualifications

  • 3–5 years of experience in MLOps, DevOps, or cloud engineering roles.
  • Deep expertise in AWS services for ML and data workflows.
  • Strong scripting skills in Python with libraries like Boto3.

Responsibilities

  • Design, implement, and maintain end-to-end MLOps pipelines on AWS.
  • Collaborate closely with ML engineers to automate pipelines.
  • Manage infrastructure as code (IaC) using Terraform or CloudFormation.

Skills

AWS services expertise
MLOps knowledge
Python programming
CI/CD pipeline experience
IaC tools proficiency
Agile methodologies

Education

Bachelor’s or Master’s degree in a relevant field

Tools

AWS SageMaker
Terraform
CloudFormation
Docker
GitHub Actions

Job description

Expression of Interest: MLOps Engineer – AWS‑Focused ML Infrastructure

Keysight Technologies Malaysia Sdn. Bhd.

Overview

Keysight is at the forefront of technology innovation, delivering breakthroughs and trusted insights in electronic design, simulation, prototyping, test, manufacturing, and optimization. Our ~15,000 employees create world‑class solutions in communications, 5G, automotive, energy, quantum, aerospace, defense, and semiconductor markets for customers in over 100 countries.

Our culture embraces a bold vision of where technology can take us and a passion for tackling challenging problems with industry‑first solutions. We believe that when people feel a sense of belonging, they can be more creative, innovative, and thrive at all points in their careers.

Responsibilities

We are expanding our engineering team with a dedicated MLOps Engineer specializing in AWS to support the deployment, scaling, and operationalization of machine learning solutions across our manufacturing and semiconductor analytics platforms. This role will serve as a critical bridge between our Machine Learning Engineers—focused on Generative AI and classical ML—and production environments, ensuring seamless, reliable, and efficient ML workflows.

You will collaborate closely with the Senior Machine Learning Engineer (GenAI Platform) and the Machine Learning Engineer (Classical ML and Predictive Analytics) to automate pipelines, monitor model performance, and manage infrastructure for high‑stakes applications like test plan generation, anomaly detection, predictive maintenance, and market intelligence. In our AWS‑centric ecosystem, you will leverage best‑in‑class tools to enable rapid iteration while maintaining compliance, security, and cost efficiency in regulated industrial settings.

  • Design, implement, and maintain end‑to‑end MLOps pipelines on AWS, including CI/CD automation for model training, validation, deployment, and retraining, using services like SageMaker, CodePipeline, CodeBuild, and Step Functions.
  • Support the Generative AI platform by operationalizing AWS Bedrock workflows, including RAG pipelines, vector databases (e.g., via OpenSearch or Pinecone integrations), Lambda functions, and agentic systems – ensuring scalability for large‑scale data processing like historical test plans and news article summarization.
  • Enable classical ML initiatives by deploying and monitoring models built with XGBoost, Scikit‑learn, and NLP architectures (e.g., RNNs/LSTMs) on AWS infrastructure, incorporating drift detection for anomaly tracking in sensor data and competitor pricing monitoring.
  • Manage infrastructure as code (IaC) using Terraform or CloudFormation to provision and optimize AWS resources, such as EC2 instances, S3 buckets, EMR for Apache Spark‑based processing (supporting our PMA product), and ECS/EKS for containerized deployments.
  • Implement comprehensive monitoring, logging, and alerting systems with CloudWatch, X‑Ray, and third‑party tools (e.g., Prometheus/Grafana integrations) to track model performance, detect anomalies, handle concept drift, and ensure high availability for customer‑facing tools like Q&A chatbots and predictive maintenance advisors.
  • Collaborate in an Agile environment with ML engineers, data scientists, and SRE teams to conduct A/B testing, version models, automate rollbacks, and optimize costs through auto‑scaling and spot instances.
  • Enforce security and compliance best practices, including IAM roles, VPC configurations, data encryption, and audit logging, to safeguard sensitive manufacturing data and meet industry standards.
  • Troubleshoot production issues, perform root‑cause analysis, and drive continuous improvements in ML operations, staying ahead of AWS innovations to enhance platform reliability and efficiency.
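
For candidates gauging the expected depth, here is a minimal, hypothetical sketch of one such pipeline step: registering a model and standing up a real‑time SageMaker endpoint. All names (image URI, artifact path, role ARN) are placeholders, and the Boto3 SageMaker client is injected so the step can be unit‑tested with a stub; this is illustrative, not Keysight's actual pipeline code.

```python
def deploy_model(sm_client, model_name, image_uri, model_data_url, role_arn,
                 instance_type="ml.m5.large", instance_count=2):
    """Register a model and create a real-time SageMaker endpoint.

    `sm_client` is a boto3 SageMaker client (e.g. boto3.client("sagemaker")),
    passed in so the control flow can be exercised offline with a stub.
    Keeping instance_count >= 2 matches the multi-AZ availability guidance
    mentioned in this posting.
    """
    # 1. Register the model: container image plus trained artifact in S3.
    sm_client.create_model(
        ModelName=model_name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    # 2. Describe the serving fleet for the endpoint.
    config_name = f"{model_name}-config"
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "primary",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    )
    # 3. Create the HTTPS endpoint backed by that configuration.
    endpoint_name = f"{model_name}-endpoint"
    sm_client.create_endpoint(EndpointName=endpoint_name,
                              EndpointConfigName=config_name)
    return endpoint_name
```

In a real CodePipeline or Step Functions stage, this call sequence would typically be preceded by validation gates and followed by endpoint status polling.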

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, Information Systems, or a related technical field.
  • 3–5 years of experience in MLOps, DevOps, or cloud engineering roles, with a proven track record of deploying and managing ML models in production environments.
  • Deep expertise in AWS services for ML and data workflows, including SageMaker, Bedrock, EMR, Lambda, S3, ECR, and orchestration tools like Step Functions or Airflow.
  • Proven experience with Amazon Elastic Container Registry (ECR): building, scanning for vulnerabilities, tagging, versioning, and pushing custom Docker images for inference containers; managing ECR lifecycle policies, replication across regions, and secure access via IAM roles.
  • Strong proficiency in EC2‑based ML deployments and infrastructure: selecting optimal instance types (e.g., ml.g family for GPU‑heavy GenAI inference, g5/g6 for newer accelerators), configuring Auto Scaling Groups, managing spot instances for cost optimization, and handling EC2 fleets for custom hosting when SageMaker/Bedrock abstractions are insufficient.
  • Expertise in load balancing & scaling for ML inference: configuring and troubleshooting Application Load Balancers (ALB) or Network Load Balancers (NLB) integrated with SageMaker endpoints or ECS/EKS tasks; implementing SageMaker’s built‑in routing strategies; setting up auto‑scaling policies and cross‑region inference profiles in Bedrock; ensuring high availability through multi‑AZ deployments with minimum instance counts ≥2.
  • Demonstrated ability to resolve common deployment issues in production ML environments, including cold‑start latency, container pull failures, IAM permission misconfigurations, model artifact corruption, endpoint update failures, drift/throttling, unhealthy instance recovery, and debugging via CloudWatch Logs, X‑Ray traces, and SageMaker Model Monitor alerts.
  • Proficiency in IaC tools such as Terraform or CloudFormation to provision and optimize AWS resources in a repeatable, auditable manner.
  • Strong scripting and programming skills in Python with libraries like Boto3, and experience in CI/CD pipelines using Jenkins, GitHub Actions, or AWS CodePipeline.
  • Familiarity with monitoring and observability stacks (e.g., CloudWatch, ELK Stack) and ML‑specific tools for versioning (e.g., MLflow) and experiment tracking.
  • Experience in Agile methodologies, with hands‑on participation in sprints, code reviews, and cross‑functional problem‑solving.
  • Solid understanding of ML concepts, including model drift, bias detection, and serving patterns.
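
As an illustration of the drift-detection understanding expected here, the Population Stability Index (PSI) is one common measure of distribution shift between a training baseline and live traffic. The pure‑Python sketch below is illustrative only; in this role such checks would more likely run through SageMaker Model Monitor or CloudWatch‑backed tooling.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a live sample of one feature.

    Common rule of thumb (an industry convention, not a standard):
    PSI < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a zero-range sample

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)  # clamp top edge
            counts[idx] += 1
        n = len(sample)
        # Floor at a small epsilon so empty bins don't produce log(0).
        return [max(c / n, 1e-6) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitoring job would compute this per feature on a schedule and raise a CloudWatch alarm (and potentially trigger retraining) when the index crosses the chosen threshold.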

Strongly Preferred

  • Fluency in English.
  • Prior exposure to manufacturing, semiconductor, or industrial IoT domains.
  • Certifications such as AWS Certified Machine Learning – Specialty, AWS Certified DevOps Engineer, or equivalent.
  • Experience with hybrid ML setups, integrating on‑premises data with cloud services, or handling large‑scale NLP/numerical data pipelines.
  • Knowledge of security frameworks like SOC 2 or ISO 27001, and tools for automated testing of ML infrastructure.
  • Prior experience troubleshooting and optimizing SageMaker multi‑instance/multi‑variant endpoints and Bedrock inference profiles.
  • Hands‑on work with EC2 Auto Scaling in ML contexts, including handling GPU instance availability constraints, spot interruption recovery, and cost‑effective scaling for bursty inference workloads.
  • Familiarity with advanced deployment patterns such as blue/green deployments, canary rollouts, and rollback automation.
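
To make the canary/rollback pattern above concrete, here is a hypothetical control‑flow sketch of a staged rollout with automatic rollback. The traffic‑shift and health‑check callables are injected (in AWS they might wrap Boto3's `update_endpoint_weights_and_capacities` and CloudWatch metric queries); the step percentages are assumptions, not anything specified in this posting.

```python
def run_canary_rollout(set_weights, healthy, steps=(5, 25, 50, 100)):
    """Shift traffic to a canary variant in stages, rolling back on failure.

    set_weights(pct) routes `pct` percent of traffic to the canary
    (e.g. by updating SageMaker endpoint variant weights).
    healthy() evaluates canary metrics for the current stage
    (e.g. error rate and latency from CloudWatch).
    Both are injected so the rollout logic can be tested offline.
    """
    for pct in steps:
        set_weights(pct)
        if not healthy():
            set_weights(0)  # send all traffic back to the stable variant
            return "rolled_back"
    return "promoted"
```

Real deployments would add bake time between stages and emit audit logs for each transition, but the promote/rollback decision structure is the same.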

If you are a pragmatic, AWS‑savvy engineer excited about operationalizing cutting‑edge ML in mission‑critical industries, this role offers the opportunity to build resilient systems that directly impact our company’s innovation and customer outcomes. Join a dynamic team committed to excellence, with ample room for growth and technical leadership.

Keysight is an Equal Opportunity Employer.
