
Research Scientist Graduate (Multimodal Interaction & World Model) - 2026 Start (PhD)

BYTEDANCE PTE. LTD.

Singapore

On-site

SGD 80,000 - 100,000

Full time



Job summary

A leading technology firm in Singapore seeks talented researchers for their Multimodal Interaction & World Model team. This role targets Ph.D. graduates in related fields to explore cutting-edge AI and multimodal technologies. Candidates should have strong analytical, problem-solving, and collaboration skills. Responsibilities include researching advanced multimodal models and contributing to the development of new AI-driven technologies and products, fostering innovation in an inspiring research environment.

Qualifications

  • Final-year Ph.D. candidates or recent Ph.D. graduates in Computer Science or related fields.
  • In-depth research in multimodal learning, AIGC, computer vision, or machine learning.
  • Excellent analytical and problem-solving skills.

Responsibilities

  • Research multimodal understanding, generative models, and machine learning technologies.
  • Explore advanced multimodal models and their applications.
  • Develop new technologies and products centered on AI.

Skills

Analytical skills
Problem-solving skills
Communication skills
Collaboration skills

Education

Ph.D. in Computer Science or related fields

Tools

C/C++ programming
Python programming

Job description

About the team

Welcome to the Multimodal Interaction & World Model team. Our mission is to solve the challenges of multimodal intelligence and virtual/real-world interaction in AI. We conduct cutting‑edge research in areas such as the foundations and applications of multimodal understanding models, multimodal agents and inference, unified models for generation and understanding, and world models. Our team comprises experienced research scientists and engineers dedicated to developing models with human‑level multimodal understanding and interaction capabilities. The team also aspires to advance the exploration and development of multimodal assistant products. We foster a feedback‑driven environment to continuously enhance our foundation technologies. Come join us in shaping the future of AI and transforming the product experience for users worldwide.

We are looking for talented individuals to join our team in 2026. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at ByteDance.

Successful candidates must be able to commit to an onboarding date by end of year 2026. Please state your availability and graduation date clearly in your resume. Applications will be reviewed on a rolling basis – we encourage you to apply early.

Responsibilities
  • Explore and research multimodal understanding, generative models, machine learning, reinforcement learning, AIGC, computer vision, and other cutting‑edge AI technologies.
  • Explore foundation models for large‑scale/ultra‑large‑scale interleaved multimodal understanding and generation, and carry out extreme system optimization; work on data construction, instruction fine‑tuning, preference alignment, and model optimization; improve data synthesis, scalable oversight, and model reasoning and planning; build a comprehensive, objective, and accurate evaluation system; and explore ways to further improve the capabilities of large models.
  • Explore and push the advanced capabilities of multimodal models and world models, including but not limited to multimodal RAG, visual chain‑of‑thought (CoT), and agents, and build universal multimodal agents for GUIs, games, and other virtual worlds.
  • Use pre‑training, simulation, and other techniques to model diverse environments in the virtual and real worlds, provide foundational capabilities for multimodal interactive exploration, drive real‑world deployment of applications, and develop new technologies and products with AI at their core.
Qualifications
Minimum Qualifications
  • Final-year Ph.D. candidates or recent Ph.D. graduates in Computer Science, Software Engineering, Electronics, Mathematics, or other related fields.
  • In‑depth research in one or more fields such as computer vision, multimodal learning, AIGC, machine learning, or rendering and generation.
  • Excellent analytical and problem‑solving skills, the ability to solve problems in large‑model training and application, and the ability to explore solutions independently.
  • Good communication and collaboration skills; proactive, and able to work harmoniously with the team to explore new technologies and drive technical progress.
Preferred Qualifications
  • Strong grasp of fundamental algorithms and a solid foundation in machine learning; familiarity with CV, AIGC, NLP, RL, and related fields. Publications at top conferences/journals such as CVPR, ECCV, ICCV, NeurIPS, ICLR, SIGGRAPH, or SIGGRAPH Asia are a plus.
  • Excellent coding ability and proficiency in C/C++ or Python; awards in competitions such as ACM/ICPC, NOI/IOI, TopCoder, or Kaggle are a plus.
  • Experience leading high‑impact projects in multimodal learning, large models, foundation models, world models, RL, or rendering and generation is a plus.

By submitting an application for this role, you accept and agree to our global applicant privacy policy, which may be accessed here: https://jobs.bytedance.com/en/legal/privacy

If you have any questions, please reach out to us at apac-earlycareers@bytedance.com.
