Pre-Training Data ML Engineer — Build Scalable Pipelines (Remote)

Cohere

Remote

CAD 80,000 - 120,000

Full time

30+ days ago

Job summary

A leading AI research organization is seeking a Machine Learning Engineer specializing in pretraining data to design and manage robust data pipelines. The role focuses on ingesting, cleaning, and optimizing diverse datasets to improve AI model performance. Ideal candidates have strong software engineering skills, are proficient in Python, and have experience with large-scale datasets. The position offers a collaborative work environment on a mission-driven team working to advance AI capabilities.

Benefits

  • Open and inclusive culture
  • Weekly lunch stipend
  • Full health and dental benefits
  • Parental leave top-up
  • Personal enrichment benefits
  • 6 weeks of vacation

Qualifications

  • Experience working with large-scale datasets, including web data and multilingual corpora.
  • Passion for bridging research and engineering to solve complex data-related challenges.

Responsibilities

  • Design and build scalable data pipelines to ingest, clean, filter, and optimize diverse datasets.
  • Conduct data ablations to assess data quality and experiment with data mixtures.
  • Collaborate with cross-functional teams to ensure data pipelines meet model demands.

Skills

  • Strong software engineering skills
  • Proficiency in Python
  • Experience building data pipelines
  • Familiarity with data processing frameworks
  • Knowledge of data quality assessment techniques

Tools

  • Apache Spark
  • Apache Beam
  • Pandas