Job Description
Position
Site Reliability Engineer
Location
France (REMOTE)
Language
English
Experience
8 years
Duration
6-month contract, with possibility of extension
Note
Banking or BFSI domain experience is mandatory.
Primary Responsibilities
- Develop software to make infrastructure services self-managing and self-service.
- Deliver continuous service improvement by developing Infrastructure as Code.
- Eliminate manual, repetitive, automatable tasks that add no value.
- Improve system performance, optimize resource utilization, distribute load, and reduce latency.
- Define Service Level Objectives (SLOs) that reflect availability and latency goals.
- Develop proactive monitoring solutions that alert on symptoms, not just outages.
- Perform detailed root cause analysis (RCA) on incidents and outages to prevent recurrence.
- Partner with development teams to enhance services via rigorous testing and release procedures.
- Identify technical debt and collaborate with application teams for remediation plans.
- Develop standard operational procedures and produce effective documentation.
- Analyze workloads and devise suitable cloud migration strategies where appropriate.
- Ensure project and investment workloads are delivered according to plans and budgets.
- Liaise with infrastructure control and IT risk teams to satisfy audit requirements.
- Deputize for the team lead when required, taking on the lead's responsibilities.
- Identify cost-saving and optimization opportunities across the organization.
- Build strong working relationships across the organization.
Essential Skills
- Exceptional knowledge of PowerShell, including automation, API integration, and modularization.
- Strong skills in managing and maintaining Microsoft Windows Server, including its internals and related technologies.
- Experience managing Active Directory, DHCP, DNS, LDAP, and Kerberos.
- Advanced knowledge of clustering, high-availability, replication, and disaster recovery techniques.
- Ability to tune network, storage, server, and virtualization layers for optimal performance and reliability.
- Ability to interpret and implement CIS security hardening recommendations.
- Experience in hardware performance monitoring and tuning low-latency systems.
- Proficiency in backup and recovery processes and procedures.
- Excellent performance tuning skills and in-depth knowledge of system internals and performance analysis tools.
- Awareness of security and auditing requirements in a regulated environment.
Highly Desirable Skills
- Experience writing and managing plays/playbooks on AWX/Ansible Tower.
- Knowledge of networking protocols and concepts (TCP/IP, DNS, DHCP, VLANs).