Job Description:
Hello, I hope you are doing well.
Position: Site Reliability Engineer
Location: France (Remote)
Language: English
Experience: 8 years
Duration: 6-month contract, extendable
Note: Banking or BFSI domain experience is mandatory.
Primary Responsibilities:
- Develop software to make infrastructure services self-managing and self-service.
- Deliver continuous service improvement by developing Infrastructure as Code.
- Eliminate manual, repetitive tactical tasks that can be automated and add no value.
- Improve system performance, optimize resource utilization, distribute load, and reduce latency.
- Identify SLOs (Service Level Objectives) to meet availability and latency goals.
- Develop proactive monitoring solutions that alert on symptoms, not just outages.
- Perform detailed root cause analysis (RCA) on incidents and outages to prevent recurrence.
- Partner with development teams to improve services via rigorous testing and release procedures.
- Identify technical debt and collaborate with application teams to create remediation plans.
- Develop standard operational procedures and produce effective documentation.
- Analyze workloads and devise suitable cloud migration strategies where appropriate.
- Ensure project and investment workloads are delivered according to plans and budgets.
- Liaise with infrastructure control and IT risk teams to satisfy audit requests.
- Deputise for the team lead when required.
- Identify cost-saving and optimization opportunities across the organization.
- Build strong working relationships across the organization.
- Adhere to the core values of the bank.
Essential Skills and Qualifications:
- Exceptional knowledge of PowerShell, including automation, API integration, and modularization.
- Strong knowledge of Microsoft Windows Server internals and related technologies.
- Experience managing and maintaining Active Directory, DHCP, DNS, LDAP, and Kerberos.
- Advanced knowledge of clustering, high availability, replication, and disaster recovery techniques.
- Ability to tune network, storage, server, and virtualization layers for optimal performance and reliability.
- Ability to interpret and implement CIS security hardening recommendations.
- Extensive experience in hardware performance monitoring and tuning low-latency systems.
- Fluency in backup and recovery processes and procedures.
- Excellent performance tuning skills and in-depth knowledge of system internals and performance analysis tools.
- Awareness of security and auditing requirements in a regulated environment.
Highly Desirable Skills:
- Experience writing and managing Ansible playbooks on AWX / Ansible Tower.
- Knowledge of networking protocols and concepts (TCP/IP, DNS, DHCP, VLANs).