Remote Software Engineer - Distributed ML Training - Gensyn

Gensyn

London

Remote

GBP 70,000 - 100,000

Full time

6 days ago

Job summary

A leading company in the machine learning space is seeking a skilled professional to design and implement orchestration systems for ML execution. The position offers an opportunity to work with cutting-edge technology in a fully remote setting, emphasizing autonomy and innovation. Ideal candidates will have a strong background in distributed systems and a willingness to learn Rust, contributing to significant advancements in AI compute resources that benefit humanity.

Benefits

Competitive salary
Equity and tokens
Fully remote work
Relocation assistance
Annual company retreats
Paid sick leave
Private health, vision, and dental insurance

Qualifications

  • Hands-on experience with distributed systems and ML training.
  • Willingness to learn Rust; experience in large codebases is valued.
  • Understanding of fundamental networking protocols (IP, TCP, UDP, HTTP).

Responsibilities

  • Design and implement systems for ML orchestration.
  • Continually optimize performance of training algorithms.
  • Collaborate on engineering-wide ML issues, including reproducibility.

Skills

Distributed foundation model training
Networking expertise
Open source contribution
Communication skills
Self-motivation

Education

Computer science background

Job description

The world will be unrecognisable in 5 years.

Machine learning models are driving our cars, testing our eyesight, detecting our cancer, giving sight to the blind, giving speech to the mute, and dictating what we consume, enjoy, and think. These AI systems are already an integral part of our lives and will shape our future as a species.

Soon, we'll conjure unlimited content: from never-ending TV series (where we’re the main character) to personalised tutors that are infinitely patient and leave no student behind. We’ll augment our memories with foundation models—individually tailored to us through RLHF and connected directly to our thoughts via Brain-Machine Interfaces—blurring the lines between organic and machine intelligence and ushering in the next generation of human development.

This future demands immense, globally accessible, uncensorable, computational power. Gensyn is the machine learning compute protocol that translates machine learning compute into an always-on commodity resource—outside of centralised control and as ubiquitous as electricity—accelerating AI progress and ensuring that this revolutionary technology is accessible to all of humanity through a free market.

Our Principles:

AUTONOMY

  • Don’t ask for permission - we have a constraint culture, not a permission culture.
  • Claim ownership of any work stream and set its goals/deadlines, rather than waiting to be assigned work or relying on job specs.
  • Push & pull context on your work rather than waiting for information from others and assuming people know what you’re doing.
  • No middle managers - we don’t (and will likely never) have middle managers.

FOCUS

  • Small team - misalignment and politics scale super-linearly with team size. Small protocol teams rival much larger traditional teams.
  • Thin protocol - build and design thinly.
  • Reject waste - guard the company’s time, rather than wasting it in meetings without clear purpose/focus, or bikeshedding.

REJECT MEDIOCRITY

  • Give direct feedback to everyone immediately rather than avoiding unpopularity, expecting things to improve naturally, or trading short-term pain for long-term pain.
  • Embrace an extreme learning rate rather than assuming limits to your ability/knowledge.
  • Drive ownership to final outcome, despite barriers.

Responsibilities

Design and implement systems for orchestration of ML execution—enabling training across our decentralised and heterogeneous infrastructure.

Performance optimisation—continually profile and optimise our training algorithms.

Implement novel research—develop mechanisms and algorithms to tackle new problems.

Engineering support—collaborate on wider ML issues (e.g., reproducible training).

Write & engage—contribute to technical reports and papers, and discuss with the community.

Minimum requirements

Hands-on experience with distributed foundation model training, including designing or working with large-cluster training systems.

Networking expertise: understanding and troubleshooting protocols such as IP, TCP, UDP, and HTTP; experience with NCCL, Gloo, MPI.

Open source experience—contributing to large codebases as maintainer or trusted contributor.

Willingness to learn Rust—since we are a Rust-based company, familiarity or readiness to learn is essential.

Computer science background—knowledge of algorithms, data structures, and computational complexity.

Self-motivated with excellent communication skills.

Comfortable working in an autonomous research environment with unpredictable timelines.

Nice to haves

Experience in systems programming in Rust (knowledge of lifetimes, Pin, etc.).

Research background in distributed systems or ML domains.

Understanding of blockchain fundamentals.

Compensation / Benefits:

Competitive salary, equity, and tokens

Fully remote work—we hire between West Coast (PT) and Central Europe (CET) time zones.

Relocation assistance—available for those moving after hiring.

Annual company retreats—4 fully paid trips worldwide.

Equipment, paid sick leave, private health, vision, dental insurance—including dependents.
