Site Reliability Engineer

Entrust Datacard

Toronto

Hybrid

CAD 70,000 - 110,000

Full time

30+ days ago

Job summary

Join a forward-thinking company as a Site Reliability Engineer, where you'll ensure the reliability and performance of a cutting-edge SaaS platform. This exciting role involves managing cloud environments, deploying automation strategies, and collaborating with development teams to enhance system security and efficiency. You'll have the opportunity to work in a hybrid environment, balancing in-office collaboration with remote flexibility. If you're passionate about technology and eager to make a significant impact in identity-centric security solutions, this role is perfect for you!

Benefits

Flexible working hours
Collaborative environment
Diversity and inclusion initiatives
Career growth opportunities

Qualifications

  • 5+ years in a related role with extensive experience in microservices.
  • Hands-on experience with DevOps tools and cloud solutions.

Responsibilities

  • Monitor system performance using various metrics and tools.
  • Collaborate with teams to identify and mitigate risks.

Skills

DevOps
Site Reliability Engineering
Cloud Computing
Microservices
Troubleshooting
Incident Management
Automation
Root Cause Analysis

Education

Bachelor’s Degree in Computer Science
Equivalent experience

Tools

Ansible
Terraform
Jenkins
Octopus Deploy
Splunk
Prometheus
Grafana
Datadog
Azure
AWS

Job description

Career Growth, Flexibility and Collaboration!

Entrust is an innovative leader in identity-centric security solutions, providing an integrated platform of scalable, AI-enabled security offerings. Headquartered in Minnesota, we offer our colleagues the ability to work globally, in a flexible and collaborative environment. Our team makes an impact!

The Company: Entrust relies on curious, dedicated and innovative individuals who anticipate the future and provide solutions for a more connected, mobile and secure world. Entrust’s technologies and expertise help government agencies, enterprises and financial institutions in more than 150 countries serve and safeguard citizens, employees and consumers.

We Believe: Securing identities is most effective when we value all identities. We are committed to ensuring that, through diversity and inclusion, the many voices that make up our communities are heard. From unconscious bias training for managers to global affinity groups that create connections both within and across our enterprise, Entrust expects and encourages all individuals to accept and respect one another. And, of course, to be themselves.

Position Overview: The Instant Financial Issuance (IFI) Cloud Service includes a wide array of components, including web services, application servers, and databases, hosted in a hybrid cloud environment. The Site Reliability Engineer (SRE) will be responsible for ensuring that the SaaS platform is reliable, available, and performant, as well as scalable, secure, and cost-effective. Ultimately, the individual will be responsible for the functional management of all IFIaaS cloud environments, applications, and networks, for scoping projects, and for resolving application and network issues.

Responsibilities:

  1. Monitor system issues using various metrics, such as uptime, latency, error rate, throughput, and availability
  2. Deploy and maintain monitoring and on-call tools, e.g. Splunk, Prometheus, Grafana, PagerDuty, Datadog, etc.
  3. Create strategies to detect issues, such as setting up alerts, dashboards, and health checks
  4. Address issues as they arise, using troubleshooting techniques, root cause analysis, and incident management.
  5. Design systems to troubleshoot automatically, using self-healing mechanisms such as auto-scaling, load balancing, failover, and mitigation runbooks
  6. Collaborate with development teams and other stakeholders to identify potential risks, such as security vulnerabilities, performance bottlenecks, deployment issues, or configuration errors
  7. Implement various risk mitigation strategies, such as patching, backup, redundancy, encryption, or testing
  8. Design, build and maintain robust infrastructure on Azure and AWS, leveraging native cloud technologies, e.g. AKS, EKS, managed SQL, Mongo, etc.
  9. Define and follow a clear incident response process, which includes roles, responsibilities, escalation, communication, and resolution
  10. Use automation and orchestration tools to speed up the recovery process, such as restoring backups, rolling back changes, or deploying fixes
  11. Design, implement and maintain robust CI/CD pipelines to automate the software delivery process
  12. Automate configuration management tasks across multiple servers in hybrid cloud environments using tools like Ansible, Terraform, etc.
  13. Define IaC to provision and manage cloud resources in hybrid environments (Azure, AWS, on-prem), including complete lifecycle management, scaling, and decommissioning.
  14. Implement best practices and standards to prevent or reduce the occurrence of emergencies, such as code reviews, testing, and monitoring.
  15. Implement and support a hybrid cloud environment in Microsoft Azure and on-premises
  16. Update incident response runbooks and automation, and create new templates as required
  17. Manage activities with complete integrity and in accordance with the organization's policies, systems, practices, and programs
  18. Collaborate with product teams and other teams to understand the user needs, expectations, and satisfaction.
  19. Learn from incidents and post-mortems and implement the action items to prevent recurrence or improve response.
  20. Suggest and implement new solutions and technologies to enhance the system and the service, such as optimization, automation, or innovation.
  21. Provide after-hours support for production issues on a rotational basis with other team members to ensure system availability 24/7/365.

Basic Qualifications:

  1. Bachelor’s Degree in Computer Science, Software Engineering, or equivalent combination of education and experience
  2. 5+ years of related experience as a Software Engineer, DevOps Engineer, Site Reliability Engineer or a role in a similar capacity
  3. Extensive experience working with enterprise-level microservices applications, including deployment and maintenance of the applications in distributed environments.
  4. Demonstrated hands-on experience and expertise with DevOps tooling (Ansible, Terraform, Jenkins, Octopus Deploy, etc.), networks, and network security, along with high-level managerial skills
  5. In-depth hands-on experience with on-prem and cloud compute, storage and networking solutions (VMware, NetApp, Azure, AWS, etc.)

Where you will be: This role is hybrid, requiring three days a week in-office at our offices in Ottawa, Canada or Denver, CO, as specified in the job description. At Entrust, we have a distributed workforce.

About Entrust:

Entrust keeps the world moving safely by enabling trusted identities, payments and data protection around the globe. Today more than ever, people demand seamless, secure experiences, whether they’re crossing borders, making a purchase, or accessing corporate networks. With our unmatched breadth of digital security and credential issuance solutions, it’s no wonder the world’s most entrusted organizations trust us.

For more information, visit www.entrust.com. Follow us on LinkedIn, Facebook, Instagram, and YouTube.

Entrust Corporation is an EOE/AA/Veteran/People with Disabilities employer.

Updated 9/14/2020

NO AGENCIES, NO RELOCATION

For US roles, or where applicable:

Entrust is an EEO/AA/Disabled/Veterans Employer

For Canadian roles, or where applicable:

Entrust values diversity and inclusion and we are committed to building a diverse workforce with wide perspectives and innovative ideas. We welcome applications from qualified individuals of all backgrounds, and we strive to provide an accessible experience for candidates of all abilities.

If you require an accommodation, contact accessibility@entrust.com.

Recruiter:

Grace Rusingiza Grace.Rusingiza@entrust.com

Similar jobs

Senior Turbine Reliability Engineer

Ctrl

Toronto

Remote

CAD 80,000 - 110,000

Full time

4 days ago
Be an early applicant

Senior System Safety Engineer

Aversan Inc

Toronto

Remote

CAD 80,000 - 120,000

Full time

2 days ago
Be an early applicant

Senior System Safety Engineer

Aversan Inc

Toronto

Remote

CAD 90,000 - 120,000

Full time

8 days ago

Remote - Principal Site Reliability Engineer

Dayforce

Remote

CAD 83,000 - 150,000

Full time

Today
Be an early applicant

Sr Site Reliability Engineer

Notified

Toronto

On-site

CAD 90,000 - 120,000

Full time

Yesterday
Be an early applicant

Site Reliability Engineer

Diversis Capital LLC

Remote

CAD 90,000 - 130,000

Full time

2 days ago
Be an early applicant

Site Reliability Engineer III

Guidewire Software

Remote

CAD 90,000 - 130,000

Full time

3 days ago
Be an early applicant

Senior Site Reliability Engineer

TripStack Inc.

Toronto

On-site

CAD 109,000 - 119,000

Full time

4 days ago
Be an early applicant

Site Reliability Engineer

OneStudyTeam

Toronto

On-site

CAD 90,000 - 130,000

Full time

Today
Be an early applicant