Site Reliability Engineer II

Bank of America

England

On-site

GBP 100,000 - 125,000

Full time

30+ days ago

Job summary

An established industry player is seeking a Site Reliability Engineer to enhance the reliability of critical applications. This role involves collaborating with engineering teams, implementing monitoring solutions, and driving incident resolution. The ideal candidate will possess strong analytical skills, a solid understanding of ITIL processes, and experience with tools like Splunk and Dynatrace. Join a dynamic team where you can make a significant impact on service reliability and efficiency, while also developing your professional skills in a supportive environment focused on growth and wellness.

Qualifications

  • Knowledge of ITIL processes, incident management, and problem management.
  • Experience with monitoring tools and scripting languages.

Responsibilities

  • Develop and maintain reliability scripts and tools for operational needs.
  • Collaborate with teams to implement monitoring capabilities.

Skills

Analytical Thinking
Automation
Collaboration
Production Support
Result Orientation
Application Development
Architecture
Influence
Project Management
Solution Design
Adaptability
DevOps Practices
Risk Management
Solution Delivery Process
Stakeholder Management

Tools

Splunk
Dynatrace
Windows Operating System
Linux Operating System
Oracle
MS SQL
Wireshark

Job description

Site Reliability Engineer II

Location: Pennington
Time type: Full time
Posted: Yesterday
Job requisition ID: 25012946

Job Description:

At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.

Being a Great Place to Work is core to how we drive Responsible Growth. This includes our commitment to attracting and developing exceptional talent, supporting our teammates’ physical, emotional, and financial wellness, recognizing and rewarding performance, and how we make an impact in the communities we serve.

At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!

Job Description:

This job is responsible for partnering with engineering and technology teams to implement measures as prescribed by lead/senior SREs. Key responsibilities include ensuring appropriate instrumentation, tooling, ticketing, alerting, and on-call routines are in place for key services, identifying root causes of issues through production triage efforts, and suggesting code enhancements to technology teams to automate services and improve reliability and efficiency. Job expectations include using software development skills to improve efficiency and to address gaps in reliability.

Position Summary:

The Global Information Security Application Production Services (GIS APS) SWAT team is looking for a candidate to fill a Site Reliability Engineer role. The candidate should have experience supporting business-critical applications in an environment focused on information security.

Responsibilities of the role include monitoring for incidents and driving their resolution using methodologies such as ITIL, analyzing data with tools like Splunk or Dynatrace, and working with both engineering teams and clients to handle requests or issues.

To meet these responsibilities, the candidate should have at least a working knowledge of operating systems (Windows and Linux/Unix), databases (Oracle, MS SQL), and networking and authentication standards such as TCP/IP and SAML, as well as an understanding of how Java and middleware applications function.

Additionally, the candidate should be a self-starter who drives various types of project work to completion. Examples include creating and maintaining dashboards, writing and updating technical documentation, and owning or assisting with the development of enhancements aimed at improving the environment.

Responsibilities:

  • Develops and maintains reliability scripts, tools, and libraries, and leverages them for common instrumentation, automation, and operational needs, as well as when mentoring Site Reliability Engineers (SREs) on reliability practices and established tools/capabilities.
  • Collaborates with Development and Infrastructure teams to understand technical solutions and implement monitoring capabilities outlined in the application and system monitoring designs put forward by the SRE Lead.
  • Partners to implement code changes to make use of common reliability libraries and tools and helps Application Production Services and Application Development teammates understand how to use them.
  • Identifies vulnerabilities and opportunities for reliability improvement, such as investigating low level error rates and 'noise' in monitoring, and defines solutions to reduce manual support effort and/or improve system reliability.
  • Engages as a subject matter expert in major incident triage, failure scenario modelling, and root cause diagnosis with the Problem Manager for major incident and problem management investigations.
  • Participates regularly in an on-call rotation with Production Support teammates to learn more about reliability issues affecting their portfolio.

Required Qualifications:

  • Foundational knowledge of core ITIL processes such as the management of incidents, changes, and problems.
  • Disciplined, process-driven, and results-oriented approach when providing support.
  • Comfortable in the Splunk environment – able to analyze logs, create/modify dashboards, and utilize reporting and alerting functionality.
  • Basic understanding of Federated IAM protocols such as SAML, OAuth, OpenID Connect, and FIDO2.
  • Able to understand and analyze HTTP traces/Wireshark captures.
  • Database/SQL knowledge - basic understanding of how a database functions and able to craft queries to pull data.
  • Working knowledge of both Unix and Windows Operating Systems.
  • Ability to understand and utilize various programming or scripting languages such as shell scripting, Perl, and PowerShell.
  • Practical knowledge of SSL/TLS cryptography and PKI.
  • Knowledge of LDAP and Active Directory services.

Desired Qualifications:

  • Strong knowledge of and troubleshooting experience with Windows, Linux, Oracle, and MS SQL environments.
  • Analytical skills and expertise in finding root causes and isolating complicated issues with various tools such as Splunk.
  • Knowledge of Multi-Factor Authentication, Single Sign-On, Password Management, and Passwordless Authentication (FIDO2) solutions.
  • Exposure to supporting Web Access Management solutions, such as PingAccess or CA SiteMinder.
  • Experience with Apache and IIS solutions.
  • Understanding of the OSI model.
  • Knowledge of the Software Development Life Cycle.
  • Familiarity with and understanding of high-availability environments.

Shift:

1st shift (United States of America)

Hours Per Week:

40
