Software Engineer, AI Large Model Platform

TikTok Pte. Ltd.

Singapore

On-site

SGD 80,000 - 110,000

Full time

3 days ago

Job summary

A leading social media platform is seeking a technical expert in Singapore to design and build an end-to-end model lifecycle management platform. The role involves ensuring the high availability and scalability of systems, with a focus on automated workflows for risk control and operational efficiency. Candidates should be proficient in Go, Python, or Java, with strong skills in system design and in optimizing high-concurrency environments. The ideal candidate collaborates closely with product and algorithm teams and has experience with machine learning frameworks.

Qualifications

  • Proficient in at least one of Go/Python/Java, with a solid foundation in computer science.
  • Experience in designing and optimizing high-concurrency and distributed systems.
  • Excellent cross-team communication and project advancement skills.

Responsibilities

  • Design and build an end-to-end model lifecycle management platform.
  • Lead the development of a model service gateway and resource scheduling system.
  • Implement an automated model iteration workflow system.

Skills

Proficient in Go/Python/Java
Designing and optimizing high-concurrency systems
Cross-team communication
Passion for technology

Tools

PyTorch
TensorFlow

Job description

Responsibilities

Team Introduction

We are building an automated moderation and risk control system for international monetization, aiming to create a highly available and scalable Agentic System that horizontally supports risk control, user experience, and operational efficiency, while vertically consolidating a self-iterating closed-loop capability centered on policies. The platform needs to handle hundreds of millions of daily tasks and collaborate closely with product, operations, development, and algorithm teams to form an end-to-end solution from strategy generation and machine moderation execution to attribution optimization.

  • Design and build an end-to-end model lifecycle management platform, covering the complete workflow from model training, evaluation, and version control to online deployment and monitoring, thereby enhancing model development and operational efficiency.
  • Design and build an automated moderation workflow and agentic system.
  • Lead the development of a model service gateway and resource scheduling system that provides automated scheduling of heterogeneous computing resources, a unified inference interface, and high-concurrency load balancing, and establish comprehensive service health monitoring and self-healing mechanisms.
  • Design and implement an automated model iteration workflow system, integrating processes for auto-triggered evaluation, data feedback, strategy comparison, and intelligent deployment, promoting a data-driven continuous optimization loop for models within risk control strategies.
  • Ensure high availability and scalability of the platform: design and implement disaster recovery and degradation strategies to support the stable operation of hundreds of millions of daily tasks, while continuously optimizing system performance and resource utilization.

Qualifications

Minimum Qualification(s)

  • Technical Skills: Proficient in at least one of Go/Python/Java, with a solid foundation in computer science and system design capabilities. Experience in designing and optimizing high-concurrency and distributed systems.
  • Business & Architectural Mindset: Strong capability in business abstraction and complex system design, able to independently architect key modules or subsystems, with clear consideration for the long-term impact of technical decisions.
  • Collaboration & Drive: Excellent cross-team communication and project advancement skills, capable of efficiently collaborating with product, algorithm, and operations teams. Demonstrates strong passion for technology, outstanding self-motivation, responsibility, and learning agility.

Preferred Qualification(s)

  • Experience in Machine Learning/Large Models is a plus: Understanding of the fundamental principles of machine learning/deep learning, and familiarity with model development or deployment workflows using frameworks like PyTorch/TensorFlow.
  • Prior project experience in any related area such as large model training/fine-tuning, inference optimization, or Agent application development is preferred.
