Job Responsibilities
- Support, maintain, and enhance the firm's Linux trading infrastructure.
- Support, maintain, and improve the firm's HPC infrastructure for research purposes.
- Provide support for Linux and HPC environments, including emergency response and the execution of planned changes, updates, and deployment projects across the Linux server infrastructure.
- Manage HPC systems that support trading operations, including the Condor job scheduler.
- Perform advanced profiling and troubleshooting of performance issues in the Linux server environment.
- Contribute to the development and refinement of tools to automate provisioning, configuration, and monitoring of Linux servers.
- Manage core services such as DHCP, LDAP, DNS, and NFS across on-premises data centers and public clouds.
- Participate in on-call rotations and occasional weekend shifts.
- Engage in daily communication with trading teams and core engineering.
- Stay updated with the latest technologies and best practices in HPC, storage, and GPU computing.
Qualifications
- Experience in maintenance, operation, and administration of advanced Linux environments.
- Proficiency in developing automation and monitoring tools.
- Deep understanding of Linux OS internals and concepts.
- Knowledge of Intel-based hardware and server components.
- Strong skills in Python and Bash scripting for automation tasks.
- Understanding of Linux server networking and protocols.
- Participation in open source or personal projects is a plus.
- Knowledge of Linux configuration management, source control, CI/CD, and automated deployment.
- Excellent communication skills and team collaboration abilities.
Preferred Qualifications
- Experience with containerization and orchestration tools like Docker and Kubernetes.
- Familiarity with cloud platforms and hybrid cloud environments.
- Knowledge of parallel file systems, batch systems, and high-performance network interconnects.
- Experience with VAST and Weka storage solutions.
- Understanding of trading infrastructure and low-latency systems.
- Strong problem-solving skills in fast-paced environments.
- Experience managing hybrid cloud and on-premises environments.
- Experience with Infrastructure as Code (IaC) practices.
Note: This job posting is active and accepting applications.