Site Reliability Engineer
Cynet Systems Inc
Toronto
On-site
CAD 90,000 - 115,000
Full time
Job summary
A leading technology solutions provider in Toronto is seeking an experienced Site Reliability Engineer (SRE) to track and implement technical work and to support applications. The ideal candidate has a Bachelor's degree in a relevant field and 4–5 years of SRE experience, with strong hands-on skills in Python, YAML, and shell scripting. The role covers monitoring and production support, and drives continuous innovation in automation processes.
Qualifications
- 4–5 years of experience in SRE or related field.
- Strong hands-on experience with Python, YAML, and shell scripting.
- Experience performing production support, including off-hours support.
Responsibilities
- Track, audit, monitor, and implement technical work streams.
- Develop SRE solutions such as monitoring, alerting, and machine learning anomaly detection.
- Perform production support, including off-hours support and rotational on-call responsibilities.
- Provide consultation on product builds to other teams within the enterprise.
Skills
SRE practices and technologies
Python
YAML
Shell scripting
Azure
Linux
Dynatrace
Prometheus
Moog
Elastic
Chaos Engineering
Ansible
Kafka
Education
Bachelor’s degree in Computer Science, Mathematics, Engineering, Physics, or a related technical field
Tools
Dynatrace
Azure Monitor
Catchpoint
Responsibilities
- Track, audit, monitor, and implement technical work streams.
- Act as portfolio SME, documenting common components, core functionalities, and infrastructure of supported applications.
- Serve as an escalation point in the on-call rotation, supporting maintenance, scheduled work, and release deployments.
- Lead incident management and problem management activities, owning RCA action items.
- Drive continuous improvement in productivity, monitoring, tooling, and technical standards.
- Manage technology currency (server patching, certificate renewals, compliance) with a focus on automation opportunities.
- Apply industry-leading technical solutions to meet organizational needs.
- Collaborate across units, departments, and enterprise-wide teams to deliver better solutions.
Engineering
- Develop SRE solutions such as monitoring, alerting, machine learning anomaly detection, self-healing, and reliability testing.
- Apply design thinking and an agile mindset in collaboration with SREs, Scrum Masters, and Incident Leads.
- Contribute to and leverage best practices in SRE.
- Build repeatable automation solutions to simplify manual tasks.
- Support automation adoption for applications in scope.
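To give candidates a sense of the anomaly-detection work listed above, here is a minimal Python sketch. It is illustrative only and does not describe the team's actual tooling: it uses a simple trailing-window z-score (a statistical baseline, not a machine learning model), and the function name, window size, threshold, and sample latencies are all assumptions made for the example.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations.

    Hypothetical helper for illustration; window and threshold are assumed values.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history: treat any change as anomalous.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up response times in milliseconds, with one obvious spike.
latencies_ms = [100, 102, 99, 101, 100, 103, 98, 250, 101, 100]
print(find_anomalies(latencies_ms))  # [7]
```

In practice a check like this would more likely run inside a monitoring platform such as those named in this posting than as a standalone script.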
Production Support
- Perform production support, including off-hours support and rotational on-call responsibilities.
- Assist in incident and problem management for applications in scope.
- Continuously evaluate incidents to identify improvements and prevent recurrence.
- Maintain technology currency with a focus on automation.
- Ensure availability and uptime of applications in scope per service level objectives.
- Ensure compliance of systems and applications, maintaining segregation of duties.
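As an illustration of what tracking availability against service level objectives can involve, the sketch below computes an error budget in Python. The 99.9% target, the 30-day period, and the function names are assumptions chosen for the example, not figures from this posting.

```python
def error_budget_minutes(slo_target, period_days=30):
    """Allowed downtime in minutes for an availability target over a period.

    `slo_target` is a fraction, e.g. 0.999 for 99.9% availability.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - slo_target)


def budget_remaining(slo_target, downtime_minutes, period_days=30):
    """Minutes of error budget left after the observed downtime.

    A negative result means the SLO has been breached for the period.
    """
    return error_budget_minutes(slo_target, period_days) - downtime_minutes


# Assumed example: a 99.9% target over 30 days allows about 43.2 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))
print(round(budget_remaining(0.999, downtime_minutes=30), 1))
```

A team might use a remaining-budget figure like this to decide whether to proceed with a risky release or to prioritize reliability work first.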
Technical Consultation
- Support initiatives outside of application or squad-level scope.
- Provide consultation on product builds to other teams within the enterprise.
Innovation and Learning
- Stay updated on technology changes and continuously learn through training and self-study.
- Provide demos of new technology findings to the team.
Must Have
- Bachelor’s degree in Computer Science, Mathematics, Engineering, Physics, or related technical field, or equivalent practical experience.
- 4–5 years of experience in SRE or related field.
- Advanced knowledge of SRE practices and technologies.
- Strong hands-on experience with Python, YAML, and shell scripting.
- Azure, Linux.
- Dynatrace, Prometheus, PagerDuty, Moog, Client, Elastic, Azure Monitor.
- Chaos Engineering.
- MQ, Kafka.
- Ansible, Azure Automation, Catchpoint.
- Experience performing production support, including off-hours support.
Good to Have
- Dynatrace – Less than 1 year.
- Kafka – Less than 1 year.
- Network programming (Perl, Python, Java, etc.) – Less than 1 year.
- Microsoft Azure – Less than 1 year.