LLM Ops Engineer

Quantios

Selangor

On-site

MYR 100,000 - 130,000

Full time

Yesterday

Job summary

A dynamic tech company in Malaysia is seeking an LLMOps Engineer to develop and optimize their LLM-powered products. You will design and maintain efficient LLM pipelines while collaborating with multidisciplinary teams to implement robust AI solutions. The ideal candidate should have a Bachelor's degree and 4+ years of experience in software engineering, familiarity with AI frameworks, and strong Python skills. This position offers an exciting opportunity to shape best practices in a rapidly evolving AI landscape.

Qualifications

4+ years of experience in software engineering, data engineering, machine learning engineering or DevOps.
Hands-on experience with modern AI frameworks such as LangChain.
Strong understanding of CI/CD, version control, and environment management.

Responsibilities

Design and maintain LLM pipelines and RAG architectures.
Build and operate infrastructure for AI components using Azure.
Implement observability for LLM-based systems.

Skills

Python

AI frameworks

Observability tooling

Container orchestration

Problem-solving

Collaboration

Education

Bachelor’s degree in Computer Science or related field

Tools

Azure AI Foundry

Kubernetes

Azure DevOps

LangChain

Overview

As an LLMOps Engineer at Quantios, you will play a foundational role in building and operating the company’s first generation of Large Language Model–powered agentic products. You will work closely with AI developers, architects, DevOps engineers, and Product Owners to design, deploy, monitor, and optimise LLM pipelines, RAG architectures, and agent-based systems. This is a hands-on role suited to someone who enjoys solving complex technical problems, building scalable AI infrastructure, and shaping early-stage best practices.

Job Responsibilities

Model, Data, and RAG Pipelines – Design, implement, and maintain ingestion pipelines for LLM training and retrieval-augmented generation (RAG) datasets.
Model, Data, and RAG Pipelines – Develop and optimise chunking, embedding, enrichment, and indexing processes using LangChain or equivalent frameworks.
Model, Data, and RAG Pipelines – Manage the lifecycle of prompt templates, embedding models, LLM chains, evaluators, and model configurations.
Model, Data, and RAG Pipelines – Support experimentation, evaluation, and benchmarking of foundation models, prompts, and retrieval strategies.
LLM Infrastructure & Operations – Build and operate infrastructure for AI components using Azure AI Foundry, Azure OpenAI, Azure App Services, and related cloud services.
LLM Infrastructure & Operations – Implement secure hosting for RAG applications, vector databases, and agent runtimes.
LLM Infrastructure & Operations – Define and maintain CI/CD pipelines for LLM artefacts (datasets, prompts, model configs, evaluation suites) using Azure DevOps.
LLM Infrastructure & Operations – Collaborate with DevOps engineers to support environment provisioning, scalability, reliability, and performance.
Observability, Quality & Monitoring – Establish foundational observability for LLM-based systems, including telemetry, latency tracking, cost visibility, and model diagnostics.
Observability, Quality & Monitoring – Monitor and surface signals such as hallucination rates, evaluation scores, retrieval quality, and content safety triggers.
Observability, Quality & Monitoring – Implement automated evaluation pipelines for prompts, responses, and RAG relevance metrics.
Observability, Quality & Monitoring – Ensure LLM quality gates are integrated into CI/CD workflows.
Security, Governance & Compliance – Apply responsible AI principles in line with Quantios’ AI and ISMS policies.
Security, Governance & Compliance – Ensure privacy, access control, and logging for all model interactions and vector index operations.
Security, Governance & Compliance – Support red-team style penetration testing for prompt injection, leakage, and model-based social engineering risks.
Security, Governance & Compliance – Contribute to documenting LLM pipelines, governance patterns, and internal standards.
Collaboration & Delivery – Work with AI developers to integrate LLM and RAG components into product features.
Collaboration & Delivery – Partner with Portfolio Architects to evaluate new AI technologies, patterns, and architectural approaches.
Collaboration & Delivery – Collaborate with Product Owners to shape technical feasibility, performance considerations, and release planning for AI-enabled features.
Collaboration & Delivery – Participate in Agile ceremonies, contribute to estimation, and help the team deliver high-quality AI capabilities.
Continuous Improvement – Stay up to date with emerging tools in LLMOps, RAG optimisation, evaluation methodologies, and vector search technologies.
Continuous Improvement – Propose improvements to scalability, model performance, prompt engineering practices, and developer workflows.
Continuous Improvement – Contribute to establishing early LLMOps best practices that will scale as the organisation’s AI capability grows.

Job Requirements

Bachelor’s degree in Computer Science, Software Engineering, Data Engineering, or a related field; or equivalent industry experience.
4+ years of experience in software engineering, data engineering, machine learning engineering, or DevOps, preferably within cloud environments.
Hands-on experience with Python and modern AI frameworks (e.g., LangChain, Semantic Kernel, MCP-based tools, or equivalent).
Familiarity with vector databases, embeddings, and retrieval pipelines (Azure AI Search, Pinecone, Chroma, Redis Vector, or similar).
Strong understanding of CI/CD, version control, and environment management (Azure DevOps preferred).
Experience with container orchestration using Kubernetes (AKS or equivalent) and containerised deployments.
Experience with observability tooling and practices (Azure Monitor, logging, tracing, metrics).
Knowledge of modern front-end or service development technologies (React, TypeScript, C#, or equivalent) is beneficial.
Strong problem-solving, analytical, and debugging skills with a passion for building reliable AI-driven systems.
Excellent communication skills and ability to collaborate across multidisciplinary teams.
