Platform Specialist - HPC

Squarepoint Capital

London

On-site

GBP 44,000 - 90,000

Full time

30+ days ago

Job summary

A leading company in financial technology seeks an HPC Architect to enhance its critical infrastructure. The ideal candidate will have extensive experience with Linux and system performance optimization, as well as strong problem-solving skills. This role offers a competitive salary, discretionary bonuses, and comprehensive benefits.

Benefits

Health insurance
Dental insurance
401(k) contributions

Qualifications

  • 10+ years of experience with Linux in enterprise environments.
  • Experience identifying performance bottlenecks and tuning systems.
  • Strong communication and problem-solving skills.

Responsibilities

  • Design and document HPC Platform services including servers and storage.
  • Leverage modern computer architectures and enhance system performance.
  • Provide L3 support and manage project delivery with internal partners.

Skills

Linux
System tuning
Problem-solving
Communication

Education

Degree in Engineering
Degree in Computer Science

Tools

Ansible
Terraform
Chef
Kubernetes

Job description

The HPC Architect will be part of a talented global team focused on enterprise or low-latency solutions. The candidate must demonstrate superb technical competency in delivering mission-critical infrastructure and ensuring the highest levels of availability, performance, and security.

The candidate will be responsible for research, design, L3 support, and documentation for Squarepoint’s HPC Platform. This will involve collaborating with our business partners, application owners, clients, vendors, and internal teams (Platform, Network, Application Support, Application Development, Quants, etc.) to deliver end-to-end solutions that meet today's needs and scale to meet tomorrow's. The candidate will be flexible and will thrive in a fast-paced, challenging environment, with the capacity to grow and learn from peers.

  • Design, document, and enhance platform-related services including servers, storage, and cloud.
  • Leverage modern computer architectures including but not limited to GPU, new CPU architectures, and modern HPC storage platforms.
  • Identify inefficient use of compute and storage resources and provide solutions to eliminate these inefficiencies.
  • Provide concise and professional documentation.
  • Efficiently measure HPC system performance using quantitative metrics to show usage and improvements over time.
  • Deliver and manage projects in collaboration with internal and external partners.
  • Provide L3 escalation support to remediate performance and availability issues.
  • Identify areas for improvement before they become problems.
  • Customize solutions as requirements evolve.

Required Qualifications:

  • 10+ years working with Linux (RHEL/Rocky/CentOS/OEL preferred) in an enterprise environment with the following areas of focus: operations, systems engineering and systems performance.
  • System tuning (memory/CPU/network) for high bandwidth compute infrastructure.
  • Experience identifying low-level performance bottlenecks, whether induced by the OS, the software architecture, HPC storage, or the network layer.
  • Thorough understanding of network protocols such as TCP, UDP, and RDMA, and of how to properly tune servers and the network for each.
  • Understanding of physical server architecture, including the differences between CPU platforms (Intel/AMD/ARM) and when each is the right choice.
  • Experience working with applications written in Python and/or C++.
  • Well-organized, proactive, and resourceful; able to handle a fast-paced environment and question the status quo; accountable, with an ownership mindset.
  • Critical-thinking and problem-solving skills for troubleshooting unknown issues, glitches, and obscure problems.
  • Strong verbal and written communication.
  • Degree in Engineering or Computer Science, or related Information Technology experience.

Nice to have:

  • Experience with configuration management tools such as Ansible, Chef, and Terraform.
  • Familiarity with different network switch vendors and switch architectures.
  • Experience with KDB (Q).
  • Ability to debug and enhance applications using at least one of the following frameworks: XGBoost, LightGBM, PyTorch, TensorFlow.
  • Experience with Kubernetes and integrating HPC workflows into it.

The minimum base salary for this role is $60,000 if located in New York. This expectation is based on information available at the time of posting. This role may be eligible for discretionary bonuses, which could constitute a significant portion of total compensation. This role may also be eligible for benefits, such as health, dental, and other wellness plans, as well as 401(k) contributions. Successful candidates’ compensation and benefits will be determined in consideration of various factors.
