Senior Gen AI Engineer

ITCO Solutions

Remote

CAD 120,000 - 160,000

Full time

Job summary

A leading technology firm is seeking an experienced AI/ML Engineer to lead the pre-training of massive LLMs (100B+ parameters) in a fully remote setting. You will be responsible for architecting training pipelines, optimizing distributed training, and managing extensive datasets. The ideal candidate holds an advanced degree and has a proven track record in large-scale model training and distributed frameworks. Join us to work with cutting-edge technology in a high-impact environment that shapes the future of AI.

Skills

Large-scale deep learning model training
Distributed training frameworks
Python
CUDA
Performance optimization

Education

PhD or Master’s in Computer Science or related field

Tools

AWS
Azure
GCP
Kubernetes
Ray

Job description

100% Remote

Job Title

AI/ML Engineer – Large Language Model Pretraining (100B+ Parameters)

Log-line

A Gen AI Engineer who builds and ships LLMs from "Test to Production".

Location

West Coast (100% Remote)

Role Overview

As a Gen AI Engineer, you will lead the pre-training of massive LLMs (100B+ parameters), work that demands deep expertise in distributed training, large‑scale optimization, and model architecture. This is a rare opportunity to work with petabyte-scale datasets and cutting‑edge compute clusters in a high‑impact environment.

Key Responsibilities
  • Architect and implement large‑scale training pipelines for LLMs with 100B+ parameters.
  • Optimize distributed training performance across thousands of GPUs/TPUs.
  • Collaborate with research scientists to translate experimental results into production‑grade training runs.
  • Manage and preprocess petabyte-scale datasets for pretraining.
  • Implement state‑of‑the‑art techniques in scaling laws, model parallelism, and memory optimization.
  • Conduct rigorous benchmarking, profiling, and performance tuning (see the profiling sketch after this list).
  • Contribute to client research in LLM architecture, training stability, and efficiency.
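
As context for the profiling item above, here is a minimal sketch of a single-step profiling pass with `torch.profiler`; the tiny `nn.Linear` model and random batch are stand-ins introduced here for illustration, not part of this posting:

```python
# Minimal sketch: profile one training step to see where time goes.
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 1024).to(device)      # stand-in for a real model
batch = torch.randn(32, 1024, device=device)  # stand-in for a real batch

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             record_shapes=True) as prof:
    loss = model(batch).sum()
    loss.backward()

# Rank ops by GPU time; in a distributed run, long all-reduce/all-gather
# rows point at communication overhead rather than compute.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```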

Required Qualifications
  • Advanced degree (PhD or Master’s) in Computer Science, Machine Learning, or a related field from a top‑20 global CS program.
  • 3+ years of hands‑on experience with large‑scale deep learning model training.
  • Proven experience in pretraining models exceeding 10B parameters, preferably 100B+.
  • Deep expertise in distributed training frameworks (DeepSpeed, Megatron‑LM, PyTorch FSDP, Mesh TensorFlow, JAX/TPU).
  • Proficiency with parallelism strategies (data, tensor, pipeline) and mixed precision training; a minimal FSDP sketch follows this list.
  • Experience with large‑scale cloud or HPC environments (AWS, Azure, GCP, Slurm, Kubernetes, Ray).
  • Strong skills in Python, CUDA, and performance optimization.
  • Strong publication record in top‑tier ML/AI venues (NeurIPS, ICML, ICLR, ACL, etc.) preferred.
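
For the parallelism and mixed-precision items above, a minimal sketch of sharded data parallelism with bf16 using PyTorch FSDP, one of the frameworks named in this posting. The small `nn.Sequential` model is a stand-in, and a launcher such as `torchrun` is assumed to set the process-group environment variables:

```python
# Minimal sketch: FSDP-sharded training setup with bf16 mixed precision.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

dist.init_process_group("nccl")  # env vars come from a launcher such as torchrun
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Stand-in for a 100B-class transformer; the wrapping pattern is the point.
model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096))

bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,   # forward/backward compute dtype
    reduce_dtype=torch.bfloat16,  # gradient reduction dtype
    buffer_dtype=torch.bfloat16,
)
# FSDP shards parameters, gradients, and optimizer state across ranks,
# trading extra communication for per-GPU memory headroom.
model = FSDP(model.cuda(), mixed_precision=bf16)
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
```

Tensor and pipeline parallelism (as in Megatron‑LM or DeepSpeed) layer on top of this pattern when a single layer, or the full stack, no longer fits on one device.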

Preferred Skills
  • Experience with LLM fine‑tuning (RLHF, LoRA, PEFT); see the LoRA sketch after this list.
  • Familiarity with tokenizer development and multilingual pretraining.
  • Knowledge of scaling laws and model evaluation frameworks for massive LLMs.
  • Hands‑on work with petabyte‑scale distributed storage systems.
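
For the fine-tuning item above, a minimal LoRA sketch using the Hugging Face `peft` library; the model id and the `target_modules` names (typical for LLaMA-style attention blocks) are illustrative assumptions, not specifics from this posting:

```python
# Minimal sketch: attach LoRA adapters so only low-rank matrices are trained.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("org/base-model")  # placeholder id
config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed names)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # adapters only; base weights stay frozen
```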

United States employment opportunities only.
