LLM Global Data - Training Operation (Coding) Specialist

BYTEDANCE PTE. LTD.

Singapore

On-site

SGD 45,000 - 65,000

Full time

Today

Job summary

ByteDance is seeking an LLM Training Operations Specialist/Analyst to manage coding-focused projects within our Global Data Team. This role involves overseeing the operational workflows for LLM training, enhancing data quality, and mentoring team members. Candidates should hold a Bachelor's in Computer Science and have programming experience, particularly in Python or Java.

Qualifications

  • 1-2 years of experience on software engineering teams.
  • Experience with programming languages such as Python, Java, Go, or C.
  • Strong project management skills and ability to manage complex workflows.

Responsibilities

  • Lead and manage multiple coding-focused LLM training projects.
  • Design and optimize workflows for coding-focused LLM training projects.
  • Conduct quality and productivity improvement experiments for code-related training data.

Skills

Project Management
Communication
Problem Solving
Python
Java

Education

Bachelor's degree in Computer Science

Job description

About ByteDance

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, CapCut and Pico as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

Why Join Us

Inspiring creativity is at the core of ByteDance's mission. Our innovative products are built to help people authentically express themselves, discover, and connect, and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity, and enrich life, a mission we work towards every day.

As ByteDancers, we strive to do great things with great people. We lead with curiosity, humility, and a desire to make an impact in a rapidly growing tech company. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our company, and our users. When we create and grow together, the possibilities are limitless. Join us.

Diversity & Inclusion

ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

About the team

As a core member of our LLM Global Data Team, you'll be at the heart of our operations. Gain first-hand experience in understanding the intricacies of training Large Language Models (LLMs) with diverse data sets. Through our carefully designed rotation program, you'll witness how high-quality data is meticulously crafted and used.

As a key member of our LLM Global Data Team, the LLM Training Operations Specialist/Analyst will play a pivotal role in managing the intricate processes involved in training large language models (LLMs) with diverse coding datasets. This role focuses on overseeing and improving operational workflows, primarily for code-related projects, ensuring they are delivered with high quality and efficiency.

Your Role Will Involve:

1. Lead and manage multiple coding-focused LLM training projects, ensuring timelines, quality standards, and objectives are met. Track project progress, identify risks, and implement corrective actions as necessary to keep projects on course. Build and maintain strong relationships with product managers, researchers, data annotators, and other cross-functional team members. Communicate project updates, address concerns, and align expectations to ensure successful project outcomes. Coordinate meetings and discussions with global teams to ensure seamless project execution, and work with external vendors and trainers as each project requires.

2. Design, manage, and optimize workflows for coding-focused LLM training projects, including training design, QA processes, and performance tracking to meet project needs. Collaborate closely with product managers, project leaders and cross-functional teams to ensure alignment on quality metrics and project expectations.

3. Conduct quality and productivity improvement experiments to enhance operational processes for code-related training data. Lead and support general annotation operation improvement initiatives across various data domains. Develop and maintain technical guidelines and casebooks to support consistent, high-quality data production.

4. Design and implement robust data analysis strategies to evaluate training and evaluation datasets for LLM projects in the coding domain. Analyze annotation quality, model performance, and dataset coverage using statistical, visual, and programmatic methods. Identify data gaps, edge cases, and failure patterns by conducting slice-based evaluations, prompt sensitivity tests, and cluster-based error analysis. Utilize tools such as Python (Pandas, NumPy, Matplotlib) and SQL to generate actionable insights, monitor data pipeline health, and support model training operations. Collaborate with model trainers and researchers to inform training strategies and guide data-centric iterative improvements.

5. Provide mentorship and guidance to team members, helping to develop their skills and ensuring the delivery of high-quality outputs. Foster a collaborative environment where team members can share knowledge and best practices to improve overall performance.

Qualifications

Minimum Qualifications

1. Bachelor's degree in Computer Science, a related technical field, or equivalent practical experience.

2. 1-2 years of experience in project or operations management roles on software engineering teams.

3. 1-2 years of experience with programming languages such as Python, Java, Go, or C, acquired through coding projects or technical roles.

4. Strong communication and problem-solving skills with the ability to understand and convey code-related concepts effectively.

5. Strong project management skills, with the ability to design, manage, and optimize complex workflows.

6. Ability to balance independent judgment with collaborative teamwork in a fast-paced, project-based environment.

7. Deep interest in LLMs and computational thinking, and the ability to adapt to a high-intensity work environment.

Preferred Qualifications

1. Experience in competitive coding, such as Codeforces or CPC, at a regional or international level.

2. Proficiency in Mandarin Chinese (reading and speaking) to effectively communicate with Chinese-speaking global teams.

3. Experience in RLHF annotation and working with leading AI/LLM companies on technical projects.

4. Experience with codebases and understanding of software development processes, coding best practices, and version control systems (e.g., Git). Familiarity with full-stack concepts, including front-end interfaces, back-end logic, and database integration.

5. Proven ability to lead and mentor junior team members in data-related or AI/LLM projects.

6. Enthusiasm for learning, engaging with diverse technical case studies, and working with global teams, along with comfort using technology tools that enhance project performance.

Note: This role requires a paper test prior to interviews.
