Lead Snowflake Engineer, Cloud Site Reliability Engineering
London Stock Exchange Group, St Louis, United States
Role Profile
The Lead Snowflake Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, and scalability of our systems while driving continuous improvement through automation and data analysis. This includes monitoring systems, identifying and resolving issues, and implementing automation to improve efficiency, in close collaboration with development and platform teams. The role involves working with technologies such as Snowflake, GitLab, Python, Docker, Terraform, AWS, and Azure.
Responsibilities
- Site Reliability: Advise teams on issue resolution and CI/CD automation, and develop tools to improve the DevOps framework.
- Observability: Manage system metrics to ensure a comprehensive understanding of system health.
- Build: Collaborate with the Snowflake Platform team to develop monitoring capabilities for Operations, Security, and Financial Operations.
- Compliance: Ensure adherence to internal security policies and standards.
- Incident Response: Assist in incident management, root cause analysis, and post-mortem reviews.
- Scalability and Cloud Migration: Design and implement scalable cloud solutions and migrate systems to cloud environments.
- Adaptability to Change: Keep systems up-to-date through upgrades, patches, and migrations, adapting to evolving technologies and industry trends.
- CI/CD Processes: Support and develop CI/CD pipelines on GitLab to enhance automation.
- Change & Release: Support change management processes for new services and deployments.
- Documentation: Create comprehensive documentation to support observability, automation, and system resiliency.
- Center of Excellence (CoE): Contribute to best practices, standards, and guidance for integration and engineering practices.
- Service Level Objectives: Understand and document SLIs, SLOs, SLAs, KPIs, and OKRs.
- On-Call Support: Participate in on-call rotations to ensure continuous system reliability.
- Culture: Promote a customer-focused and continuous improvement mindset within the SRE team.
Candidate Profile / Key Skills
- Minimum 6 years of industry experience, preferably in financial services.
- Bachelor's degree or equivalent in Computer Science, IT, or a related field.
- Experience with Snowflake, ideally in an administrative capacity.
- Proven leadership experience, including mentoring and guiding engineers.
- Strong programming/scripting skills, especially in Python, Bash, and PowerShell.
- Experience with cloud platforms (AWS, Azure).
- Familiarity with CI/CD tools (GitLab CI, Jenkins), with GitLab preferred.
- Experience with infrastructure automation tools such as Terraform and Ansible.
- Proficiency in version control systems (Git).
- Experience with monitoring and APM tools such as Datadog and Grafana.
- Knowledge of SQL and relational databases.
- Excellent communication and collaboration skills.
- Experience with cost optimization techniques.
- A proactive, resilient, and positive attitude.
- Strong problem-solving and decision-making abilities.
London Stock Exchange Group is a leading global financial markets infrastructure and data provider committed to driving financial stability and sustainable growth. Our values of Integrity, Partnership, Excellence, and Change underpin our culture and decision-making. We offer a diverse, inclusive workplace and support initiatives for sustainability and economic growth.