Model Data Researcher

CLOUDSWAY PTE. LTD.

Singapore

On-site

SGD 50,000 - 70,000

Full time

Today
Job summary

A data technology company located in Singapore is seeking a Model Data Researcher to improve data management for multimodal models. The candidate will collect, clean, and optimize data, research multimodal models, and ensure data quality. Proficiency in Python and familiarity with tools such as Pandas and NumPy are essential. Experience with large-scale model training is a plus, and the ability to design effective prompts is required. The role sits in the fast-evolving field of generative AI.

Qualifications

  • Strong understanding of multimodal data processing and optimization techniques.
  • Experience with large language models and their practical applications.
  • Proficiency in designing and optimizing prompts for data generation.
  • Familiarity with data processing workflows and tools.

Responsibilities

  • Collect, clean, and format multimodal data for model training.
  • Research and analyze multimodal models to understand capabilities.
  • Provide data support for training and optimizing model performance.
  • Design prompts for generating target data and improving quality.

Skills

Multimodal data cleaning and labeling
Understanding of large language models
Prompt design and optimization
Data processing tools (Pandas, NumPy)
Python programming
Problem-solving

Tools

Pandas
NumPy
Linux
Spark
Flink
Job description
  • Responsible for the collection, cleaning, labeling, and formatting of multimodal data (text, speech, images, video, etc.). Build efficient data processing workflows to support model training and inference;
  • Use and research cutting‑edge multimodal models and large language models (such as ChatGPT and SD), and understand their capabilities and limits;
  • Provide high‑quality data support for multimodal model training and optimize model performance;
  • Design prompts to generate target data and optimize result quality.
Job Requirements
  • Familiar with multimodal data cleaning, labeling, and loading, with an understanding of data optimization techniques (such as TFRecord and sharding);
  • Experience using large language models and multimodal models, including an understanding of their capabilities, limits, and applicable scenarios;
  • Ability to design and optimize prompts to improve the quality and efficiency of generated data;
  • Familiar with data processing tools (such as Pandas, NumPy) and able to complete a full data processing workflow;
  • High standards for data quality, meticulous and responsible, able to identify and resolve data problems;
  • Familiar with Python programming, the Linux environment, and common development tools.
Bonus points:
  • Familiar with large‑scale model training, with a deep understanding of the role of data in training;
  • Practical experience in developing multimodal or large language models;
  • Understanding of distributed data processing technologies (such as Spark, Flink);
  • Familiar with generative AI technologies and data annotation tools.