Site Reliability Engineer with Java background
Manager- Recruitment @ Compunnel Inc. | HR Business Partnering
Job Title: Site Reliability Engineer with Java and Microservices
Location: Toronto, ON (Onsite)
Duration: Contract (Long term)
Job Description:
Responsibilities:
- Work in collaboration with Application Development, Quality, Product and Data Engineering teams to champion SRE/DevOps culture and practices.
- Develop strategic objectives to improve service and product availability, performance, incident MTTR, and change success rate, and ensure feedback loops to development teams.
- Build and maintain reliable systems and platforms using SRE and DevSecOps principles, focusing on observability, resiliency, self-healing, and reliability testing.
- Collaborate with application and business teams to establish SLOs and SLIs, create dashboards for various views to track value, and enable effective decision-making.
- Apply innovative reliability approaches from architecture to operations, following agile methodologies.
- Stay updated on the latest trends in observability, automation, and platform technology, including AIOps and MLOps for reliability and resiliency.
- Address toil from inception through operations by leveraging sense and response, advanced monitoring, and automation tools.
- Lead or participate in communities of practice to foster collaboration, set objectives, and lead initiatives.
Qualifications:
- Deep knowledge and experience in observability, toil management, monitoring tools (Dynatrace, CloudWatch, Azure Monitor), resilient architecture, IaC, CaC, JSON, and TypeScript, as well as API and webhook development using Python, Node.js, Ruby, PowerShell, and shell scripting.
- Expertise in Dynatrace features (DT Guardian, RUM, synthetic testing, AI event correlation).
- Experience with log ingestion (AWS Firehose, Dynatrace OpenPipeline), reporting, dashboards, and operational analytics.
- Automation skills using Ansible Tower, AWS SSM, and Bitbucket/GitHub to streamline deployment and response.
- Knowledge of cloud orchestration tools (AWS Step Functions, containers, Apache Airflow) focusing on data processing pipelines.
- Strong understanding of data management, data warehouses, data lakes, and database reliability (Redshift, RDS, Aurora, PostgreSQL, SQL Server, Oracle).
- Excellent problem-solving, communication, and team leadership skills, with a focus on diversity and inclusion.
Industries
- IT Services and IT Consulting