We are seeking a highly skilled Senior GenAI Platform Engineer who can independently manage the full lifecycle of Generative AI (GenAI) solution development, from model selection and fine-tuning to backend integration and production deployment. The role requires a hybrid skill set spanning GenAI solution engineering, backend development, and Site Reliability Engineering (SRE) to deliver robust, scalable, and reliable GenAI solutions fully integrated with core banking and enterprise systems.
Mandatory Skill Set
- 3–5 years of hands‑on experience in developing, fine‑tuning, and deploying GenAI/LLM applications (OpenAI, Anthropic, Gemini, Llama, etc.)
- Experience with RAG pipelines and frameworks such as LangChain, LlamaIndex, or Hugging Face Transformers
- Hands‑on experience with Docker, Kubernetes, and Terraform for containerization, orchestration, and infrastructure automation
- Experience with LLMOps and SRE principles: reliability, scalability, monitoring, and performance optimization
- Proven experience implementing CI/CD pipelines (GitLab CI, Jenkins, or similar tools)
- Strong backend development skills using Python, Golang, or Node.js for APIs and microservices
Desired Skill Set
- Experience in cloud environments such as AWS, GCP, or Azure
- Familiarity with vector databases (e.g., Pinecone, Weaviate, FAISS, Milvus) for context retrieval
- Working knowledge of banking system integrations, especially core banking or CRM platforms
- Exposure to API Gateway management and secure data exchange protocols
Responsibilities
- Lead the end‑to‑end design, development, and deployment of GenAI solutions integrated with enterprise systems
- Build and optimize RAG pipelines and backend services powering GenAI capabilities
- Develop microservices and APIs to integrate AI functionalities within core banking or enterprise applications
- Design and implement LLM observability and performance monitoring frameworks
- Automate and manage infrastructure through Terraform, Docker, and Kubernetes
- Set up and maintain Model Context Protocol (MCP) servers that connect LLM applications to enterprise tools and data sources
- Implement LLMOps best practices for continuous monitoring, versioning, and retraining of models
- Define and manage CI/CD pipelines for infrastructure, backend code, and GenAI models
- Ensure system reliability, scalability, and cost optimization across environments
- Collaborate cross‑functionally with business, data, and IT teams to ensure smooth solution delivery
Location: Kuala Lumpur City Centre, Kuala Lumpur, MY