Production and Reliability Management Expert
05/29/2025
Contract
Active
Job Description:
Job Summary
The client is seeking a skilled Production & Reliability Management Expert to join its Cyber Data Risk & Resilience (CDRR) team within the Identity & Access Management (IAM) domain. In this role, you will be a key member of a global team responsible for safeguarding the firm through the reliability and operational excellence of IAM control platforms. You will manage incident response, support Agile development integration, and drive automation initiatives while working with the latest cloud and data technologies.
This is a unique opportunity to contribute to cybersecurity defense at a global financial leader using cutting-edge technologies and Agile development principles.
Key Responsibilities
- Manage critical production incidents and communicate effectively with key business and technology stakeholders
- Embed production support principles in Agile/DevOps development cycles to ensure high standards for production readiness
- Own issue resolution and incident management, including leading incident calls and coordinating cross-functional teams
- Reduce support costs through automation, optimization, and development of operational tools
- Analyze technical debt and operational inefficiencies to prioritize remediation and stability improvements
- Identify, design, and implement automation solutions for business process improvements
- Develop, test, and deploy automation code; monitor and troubleshoot automation workflows
- Collaborate with stakeholders to understand requirements and deliver scalable and reliable solutions
- Work within Agile, Scrum, DevOps, and Site Reliability Engineering (SRE) frameworks to ensure continuous delivery and operational excellence
Required Qualifications
- Bachelor’s degree in Computer Science, Software Engineering, or a related technical field
- 4–5+ years of industry experience in software development and production support
- Strong Java development experience in building multi-threaded, scalable applications
- Proficiency in Python and Shell scripting
- Hands-on experience with web programming and developing REST/SOAP APIs
- Strong SQL skills and familiarity with DB2, Sybase, or Snowflake
- Experience with automated testing, SDLC pipelines, and automated deployment practices
- Solid working knowledge of Unix/Linux environments and infrastructure components like load balancing
- Familiarity with DevOps tools such as Ansible, GitHub, or other CI/CD and release management tools
- Excellent problem-solving skills and ability to work independently in high-pressure environments
- Strong interpersonal and communication skills to effectively interact across all organizational levels
Preferred Qualifications
- Experience working in financial services or cybersecurity operations
- Familiarity with IAM platforms and concepts such as user lifecycle, entitlements, and privileged access management
- Understanding of cloud technologies, infrastructure-as-code, and enterprise monitoring systems
- Certifications in Agile, DevOps, or SRE methodologies (e.g., SAFe, CKA, SRE Practitioner)
Certifications
- Relevant technical certifications (e.g., Java, Python, DevOps, Cloud, or SRE) are a plus but not required