Lead Data Platform Engineer (AI/ML, Start-up)

TEEMA Solutions Group

Toronto

Hybrid (4 days on-site)

CAD 100,000 - 130,000

Full-time

Job summary

A technology startup is seeking a Data Platform Software Lead Engineer to drive the architecture underlying its AI/ML pipelines. In this role, you'll design and build reliable systems for data ingestion and processing, focusing on large-scale code and text datasets. The ideal candidate has over 8 years of experience in data-intensive engineering and skills in Python, Spark, and AWS. Join a diverse team and influence the technical path of an innovative company.

Job description

Data Platform Software Lead Engineer – AI/ML Systems

Location: Hybrid (4 days onsite – Downtown Toronto)
Type: Full-Time | Start-up

About the Role

Join our team as a Data Platform Software Lead Engineer and drive the architecture that fuels our AI and ML pipelines, focusing on large-scale, code-based text datasets. You'll design and build reliable systems for data ingestion, transformation, and delivery—enabling teams to train, iterate, and deploy models with confidence.

What You’ll Do

Architect and implement scalable data platforms for code/text dataset ingestion, processing, and delivery.

Build web-scale crawling and metadata extraction tools from open-source code repositories.

Develop reliable, distributed pipelines with frameworks like Spark, Kafka, and Airflow/Prefect (a minimal pipeline sketch follows this list).

Enable data visualization, sampling, and analytics for research teams to improve model performance.

Collaborate with researchers, infrastructure, and compliance teams to meet technical and governance requirements.
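
For illustration only, here is a minimal PySpark sketch of the kind of ingestion step the pipeline bullet above describes: read raw source files, attach basic metadata, filter out oversized records, and write a columnar dataset for downstream jobs. The bucket paths, size threshold, and dedup rule are placeholder assumptions for the example, not details from this posting.

    # Minimal PySpark ingestion sketch for a code/text dataset.
    # Paths and thresholds below are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("code-dataset-ingest").getOrCreate()

    # wholeTextFiles yields (path, content) pairs, one per source file.
    raw = spark.sparkContext.wholeTextFiles("s3a://example-bucket/raw-code/*.py")
    df = raw.toDF(["path", "content"])

    processed = (
        df.withColumn("num_chars", F.length("content"))
          .withColumn("num_lines", F.size(F.split("content", "\n")))
          .filter(F.col("num_chars") < 1_000_000)   # drop pathological files
          .dropDuplicates(["content"])              # cheap exact dedup
    )

    # Columnar output for downstream training/analytics jobs.
    processed.write.mode("overwrite").parquet("s3a://example-bucket/clean-code/")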

Must-Have Skills

8+ years in data-intensive software engineering.

Proficiency in Python, Go, or Scala; Spark or Ray; Airflow or Prefect; Kafka; Redis; Postgres or ClickHouse; GitHub APIs (see the metadata-extraction sketch after this list).

Understanding of how datasets power AI/ML workflows.

Proven experience in scalable data infrastructure and pipeline development.

Skills in web crawling, scraping, and large-scale ingestion.

Cloud-native experience (e.g., AWS, containerized compute, security).
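
As a small sketch of the GitHub APIs skill above: pulling repository metadata via the GitHub REST API (GET /repos/{owner}/{repo}). The token environment variable and the particular fields kept are assumptions for the example.

    # Illustrative repository-metadata extraction via the GitHub REST API.
    import os
    import requests

    GITHUB_API = "https://api.github.com"
    # Assumes a personal access token in GITHUB_TOKEN (example choice).
    HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

    def repo_metadata(owner: str, repo: str) -> dict:
        resp = requests.get(f"{GITHUB_API}/repos/{owner}/{repo}",
                            headers=HEADERS, timeout=10)
        resp.raise_for_status()
        data = resp.json()
        # Keep just the fields a dataset pipeline typically cares about.
        return {
            "full_name": data["full_name"],
            "language": data.get("language"),
            "stars": data["stargazers_count"],
            "license": (data.get("license") or {}).get("spdx_id"),
            "default_branch": data["default_branch"],
        }

    print(repo_metadata("apache", "spark"))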

Bonus Points

Experience curating or preparing code-based datasets for LLMs or AI tooling.

Familiarity with code parsing, tokenization, or embedding (a small tokenization sketch follows this list).

Experience in a startup or other fast-paced, high-ownership environment.
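
To illustrate the parsing/tokenization bonus skill, a tiny sketch using only the Python standard library: lex a source snippet into tokens, the kind of preprocessing that feeds code datasets into LLM tooling. The snippet itself is made up for the example.

    # Tokenize a Python snippet with the standard-library tokenize module.
    import io
    import tokenize

    source = "def add(a, b):\n    return a + b\n"

    # generate_tokens takes a readline callable and yields TokenInfo tuples.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))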

Why Work With Us

Influence the technical path of an AI-first startup.

Work with cutting-edge cloud and ML/AI technologies.

Receive competitive pay with equity.

Collaborate with a diverse, high-caliber team.

Thrive in a dynamic, innovation-driven workplace culture.

Screening Questions

Describe a scalable data pipeline you've built for processing large code or text datasets.

What tools or frameworks did you use for distributed data processing and why?

Share an example of a web crawling or scraping system you’ve developed at scale.

What components of your past data pipelines have you deployed in AWS (e.g., storage, compute, security)?

Have you worked with code-based datasets or applied parsing/tokenization in model workflows?

Apply Now

Interested? Submit your resume and a brief response to the screening questions to: sasha@talenttohire.com
