We are seeking a skilled Site Reliability Engineer (SRE) to ensure the reliability, performance, and scalability of proprietary trading systems at a trading firm. As an SRE, you will work closely with developers, traders, and infrastructure teams to maintain and enhance high-performance, low-latency trading platforms built primarily on Linux and Python.
Key Responsibilities
- System Reliability & Performance: Monitor, analyze, and improve the reliability and performance of custom-built trading systems operating on Linux environments.
- Automation & Tooling: Develop and maintain automation tools and scripts (primarily in Python) to streamline deployment, monitoring, and incident response.
- Incident Management: Respond to production incidents, perform root cause analysis, and implement preventative measures to minimize downtime and trading disruptions.
- Deployment & CI/CD: Design and manage CI/CD pipelines to ensure smooth, automated, and reliable software releases.
- Monitoring & Observability: Implement and maintain comprehensive monitoring, alerting, and logging solutions to proactively identify and resolve system issues.
- Collaboration: Work closely with software engineers, traders, and IT operations to understand system requirements, troubleshoot issues, and optimize performance.
- Documentation & Best Practices: Create and maintain clear documentation for operational procedures, incident resolution, and system architecture. Promote and enforce SRE best practices across the team.
- Security & Compliance: Ensure trading systems comply with internal security policies and external regulations, implementing necessary controls and audits.
Requirements
- Experience: 3+ years in a Site Reliability Engineer, DevOps, or related role, preferably in financial services, trading, or high-frequency environments.
- Linux Expertise: Advanced knowledge of Linux systems administration, performance tuning, and troubleshooting.
- Python Proficiency: Strong programming and scripting skills in Python for automation, tooling, and system integration.
- Monitoring Tools: Hands-on experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, Nagios).
- CI/CD & Automation: Experience designing and managing CI/CD pipelines and configuration management (e.g., Jenkins, GitLab CI, Ansible).
- Networking: Solid understanding of networking concepts, low-latency systems, and protocols relevant to trading environments.
- Problem Solving: Excellent analytical and problem-solving skills, with a proactive approach to identifying and addressing reliability risks.
- Collaboration: Strong communication skills and ability to work effectively in a fast-paced, collaborative team environment.
Preferred Qualifications
- Experience with high-frequency or algorithmic trading systems.
- Familiarity with market data feeds, FIX protocol, or order management systems.
- Knowledge of C++ or other performance-oriented languages.
- Understanding of financial markets and trading concepts.
Benefits
- Work on cutting-edge trading technology in a high-impact role.
- Competitive compensation and benefits package.
- Opportunities for professional growth and learning in a dynamic, fast-paced environment.
- Collaborative and innovative team culture.