AVP, Platform SRE Engineer, SRE & Governance, Group Technology
DBS Bank Limited, Singapore. Permanent. Competitive salary. Posted 17 hours ago.
Job Objective: DBS Bank is looking for a Platform SRE Engineer with experience working on enterprise-level data engineering, analytics, and observability applications. The SRE engineer will be responsible for ensuring high availability of the platform services and for making continuous improvements to increase the platform's efficiency and resiliency. The SRE engineer will also develop automation to remove toil and increase the team's productivity.
Roles and Responsibilities:
Implement and administer Elastic Stack, Confluent Platform (Kafka), Prometheus, Grafana, and NGINX.
Configure Elasticsearch index templates and index lifecycle management (ILM) policies for data retention.
Develop monitoring and alerting solutions using Elastic Watcher and Kibana or Grafana.
Perform application maintenance, patching, and upgrades for Elastic Stack, Confluent Kafka, Grafana, Prometheus, NGINX, and other open-source APM tools.
Automate routine cluster management tasks and optimize processes using APIs and scripting to reduce manual effort and improve efficiency.
Conduct performance tuning and capacity planning to ensure applications meet scalability and reliability requirements.
Design and develop data engineering pipelines.
Multi-task and prioritize in a fast-paced, team-oriented environment.
Continuously review and enhance monitoring processes and methodologies to improve efficiency and effectiveness.
Identify strategic and tactical solutions, and provide risk assessments and recommendations.
Collaborate with the Dev Leads to ensure that the dev team's needs are met through the CI/CD framework, component monitoring and stats, incident escalation, etc.
Develop high-quality, scalable, and extensible code (Python, shell scripting, etc.).
Develop custom monitoring dashboards and reports to provide actionable insights and drive decision-making processes.
Contribute to internal knowledge bases, create documentation, and share insights with the team to promote a culture of learning and collaboration.
Deliverables:
Ensure on-time delivery of tasks and projects.
Ensure continuous uptime of applications and services.
Ensure no security or audit issues.
Job Dimensions:
Comply with bank standards to track and follow up on the assigned projects.
Cover all areas in application and infrastructure operations of the platform.
Education and Relevant Experience:
You should be a university graduate (computer science or related field) with good experience working with contemporary technologies and scripting languages.
Strong communication skills and the ability to explain protocols and processes to the team and management.
A passion for learning and using new technologies from the open-source community.
Requirements:
Min 6 years of total IT work experience.
Working knowledge of Grafana, Prometheus, NGINX, and Elastic Stack (Elasticsearch / Logstash / Kibana / Beats), including data ingestion, management, monitoring, and analytics. Able to perform L1/L2 ELK-related tasks.
In-depth experience in Unix/Linux/Shell/Python scripting.
Knowledgeable and experienced in SRE (Site Reliability Engineering) practices covering monitoring, observability, performance management, automation, and resiliency.
Good understanding of network routing, load balancing, and networking protocols; basic knowledge of TCP/IP, with an understanding of HTTP and DNS.
Ability to contribute to discussions on design and strategy.
Adequate knowledge of database systems (RDBMS, MariaDB, SQL, NoSQL), object-oriented programming, and web application development.
Good problem diagnosis and creative problem-solving skills.
Experience with Node.js or Spring Boot would be a plus.
Experience with automation tools (e.g. Ansible) and DevOps pipelines would be a plus.
Knowledge of and experience with observability stacks (AppDynamics, Dynatrace, other APM tools, and OpenTelemetry) is an added advantage.
Experience in architecting a highly resilient Confluent Kafka infrastructure and in-depth knowledge of Kafka.
Experience in developing CI/CD pipelines and with toolsets such as Bitbucket, Jenkins, and JIRA.
Strong hands-on experience with the Linux platform and containers, including configuring reverse proxies and SSL/TLS.
Self-driven, committed, and reliable team player, able to take direction but also willing to contribute to discussions on design and strategy.
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognizes your achievements.