Research Technology Developer

Campbell North Ltd.

London

On-site

GBP 60,000 - 100,000

Full time

30+ days ago

Job summary

An established industry player is seeking a talented software engineer to join their dynamic Research Technology team. This role focuses on designing and building cutting-edge software for one of the largest HPC clusters globally, facilitating high-volume trades. You will work on developing an exascale filesystem, enhancing a dynamic job scheduler, and creating innovative solutions for managing complex systems. Ideal candidates will have a strong background in computer science, proficiency in Python, and experience with distributed systems. If you're passionate about technology and eager to make a significant impact in a fast-paced environment, this opportunity is perfect for you.

Qualifications

  • 5-10 years of experience in large-scale distributed systems.
  • Proficient in Python, with knowledge of Golang and Rust a plus.

Responsibilities

  • Design and build software for the HPC cluster, focusing on performance.
  • Mentor junior team members and engage with researchers for solutions.

Skills

Problem-solving skills
Analytical skills
Experience with distributed systems
Proficiency in Python
Knowledge of algorithms and data structures
Ability to multitask
Self-motivation

Education

Strong academic grounding in computer science

Tools

Linux
Golang
Rust
GPU tooling

Job description

Company Overview:

Our client is a research-driven organisation led by passionate mathematicians and computer scientists. The Research Technology team lies at the heart of the company, managing one of the largest HPC clusters in the world. This team is critical to the firm's success, facilitating trades with daily volumes exceeding $250 billion globally.

Team Overview:

The Research Technology team is a full-stack team that collaborates closely with researchers to develop a highly performant, reliable, and transparent system. The team builds custom software to support an exascale filesystem, a job scheduler, and zero-touch platforms for seamless integration with data centre operations. They are also responsible for developing custom file formats, compression algorithms, GPU tooling, and network management software to optimise performance.

Key Responsibilities:

  • Design and build software for the HPC cluster, focusing on performance, reliability, and scalability.
  • Mentor junior team members and push the boundaries of the team’s capabilities.
  • Engage constructively with researchers to find novel and scalable solutions.
  • Promote and implement radical changes and alternative ways of thinking while maintaining a pragmatic approach to minimise operational risks.
  • Manage and maintain a complex live system 24/7, delivering changes at short notice or to tight deadlines.

What You Will Be Working On:

  • Developing an exascale filesystem handling billions of directories, a trillion files, and a million clients with complete resiliency against hardware failure.
  • Enhancing a dynamic job scheduler managing over 10 million entries and 100,000 concurrent tasks.
  • Building zero-touch platforms for monitoring, operating, and upgrading tens of thousands of machines.
  • Creating custom file formats, compression algorithms, and GPU tooling to optimise the performance of 20,000 high-end GPUs.
  • Expanding the HPC cluster to provide access to more teams and multiple data centres.
  • Improving measurement and optimisation of resource usage across the entire cluster.

Essential Attributes:

  • Strong academic grounding in computer science fundamentals, including algorithms and data structures.
  • Proficiency in at least one statically typed language; experience with Golang and Rust is beneficial but not required. Scripting is primarily in Python.
  • Approximately 5-10 years of experience in designing and building large-scale distributed systems with highly scalable solutions.
  • Excellent problem-solving and analytical skills.
  • Familiarity with the Linux operating system, particularly in diagnosing performance and scalability issues.
  • Ability to multitask, manage multiple projects simultaneously, and prioritise effectively.
  • High self-motivation and the ability to work independently without supervision.
  • An understanding of machine learning frameworks and compute offload devices, such as GPUs, is an advantage.

This role offers the opportunity to work in a fast-paced, research-driven environment where you can significantly impact the firm’s HPC infrastructure and overall success. We encourage you to apply if you are a self-starter passionate about developing cutting-edge technology.
