Senior Software Engineer, Observability

Paymentus

Richmond Hill

On-site

CAD 100,000 - 120,000

Full time

9 days ago

Job summary

A leading payment solutions provider in York Region is looking for a Senior Observability & Performance Engineer. This role involves enhancing observability practices, defining performance metrics, and troubleshooting performance issues in Node.js and Java applications. The ideal candidate will have a Bachelor's degree in Computer Science, extensive experience in observability and performance, and strong analytical skills.

Qualifications

5+ years of development experience with Node.js and Java.
3+ years as an Observability Engineer or in a similar role focused on performance.
Experience with InfluxDB and Prometheus in production.

Responsibilities

Analyze metrics, logs, and traces from applications.
Define KPIs, SLIs, and SLOs for services.
Implement performance monitoring solutions.

Skills

Observability

Performance Engineering

Node.js

Java

Cloud Infrastructure

Analytical Skills

Collaboration

Education

Bachelor's degree in Computer Science or related field

Tools

InfluxDB

Prometheus

Grafana

Kubernetes

Overview

Senior Observability & Performance Engineer (Node.js & Java)

Our team is responsible for ensuring the reliability, performance, and scalability of our critical applications in a dynamic public cloud environment, focusing on visibility and health of our systems.

The Opportunity

We are seeking a highly skilled and proactive Senior Observability & Performance Engineer to join our team. You will gain deep insights into our existing Node.js and Java-based microservices, understand their instrumentation, and drive initiatives to measure and optimize performance. You will evolve our observability practices to proactively identify bottlenecks, improve system efficiency, and enhance the user experience.

What you will do

Deep Dive into Existing Codebases: Jump into existing Node.js and Java applications to understand how metrics, logs, and traces are generated and consumed.
Evaluate & Enhance Instrumentation: Assess the quality and completeness of observability data. Identify gaps and implement improvements to capture crucial performance metrics and contextual information (logs, traces).
Define & Implement Performance Metrics: Collaborate with development teams to define KPIs, SLIs, and SLOs for our applications and services.
Establish Performance Baselines & Monitoring: Implement robust monitoring and alerting solutions using tools like InfluxDB and Prometheus to track metrics, identify deviations, and detect performance degradations.
Performance Analysis & Root Cause Identification: Analyze performance data to identify bottlenecks, diagnose issues, and pinpoint root causes in distributed systems.
Capacity Planning & Optimization: Use performance insights to assist with capacity planning and recommend architectural or code changes for optimization and resource efficiency.
Troubleshooting & Incident Response: Support incident response by leveraging observability tools to quickly identify and troubleshoot production issues related to performance and reliability.
Collaboration & Knowledge Sharing: Work with Node.js and Java teams to evangelize observability best practices and guide instrumentation, fostering a performance-aware culture.
Tooling & Automation: Develop and maintain observability tools and automation to streamline data collection, analysis, and visualization.
Continuous Improvement: Research and evaluate new observability patterns, tools, and technologies to enhance monitoring capabilities.

What you will bring

Proven experience as an Observability Engineer, Performance Engineer, or SRE with a focus on system performance and monitoring.
Expertise with Node.js and Java ecosystems, including runtime characteristics, common performance pitfalls, and instrumentation best practices.
Hands-on experience with observability platforms and tools, specifically:
InfluxDB for time-series data storage and querying.
Prometheus for metrics collection and alerting.
Familiarity with Grafana (visualization), distributed tracing, and log management systems (e.g., ELK Stack) is highly desirable.
Solid understanding of performance testing methodologies (load, stress, scalability).
Experience with public cloud infrastructure (AWS, Azure, GCP) and cloud-native architectures (microservices, containers).
Familiarity with Kubernetes and container orchestration.
Ability to read and understand code to identify performance improvements.
Excellent analytical and problem-solving skills with a data-driven approach.
Strong communication and collaboration skills to work effectively with development and operations teams.
Proactive mindset with a passion for optimizing system performance and reliability.

Education & Experience

Bachelor's degree in Computer Science, Software Engineering, or a related technical field.
5+ years of development experience (Node.js and Java).
3+ years of progressive experience in roles such as Observability Engineer, Performance Engineer, Site Reliability Engineer (SRE), or similar with a focus on performance, monitoring, and reliability.
Demonstrated experience with deep dives into codebases (Node.js and Java), evaluating and enhancing instrumentation, and defining / implementing performance metrics.
Proven history of implementing and utilizing observability platforms and tools like InfluxDB and Prometheus in production.
Shell scripting experience.
Understanding of Linux operating systems.
Some AWS experience (Storage, Compute, Networking).
Strong troubleshooting skills.

Bonus points if you have

Experience with Infrastructure as Code (IaC) tools (Terraform, Salt).
Experience with large cloud projects.
AWS-specific knowledge of EC2, S3, VPC, Classic ELB / NLB / ALB, Lambda, CloudWatch, Transit Gateway.
Experience with chaos engineering principles.
Contributions to open-source observability projects.

Supervisory Responsibility

This role has no supervisory responsibilities.

Work Environment

This job operates in a professional office environment and may require use of standard office equipment.

Physical Demands

This role requires extended periods of sitting or standing at a computer workstation.

Position Type / Expected Hours of Work

This is a full-time position. Regular days and hours are Monday through Friday. The role includes an on-call rotation (2 weeks as primary, 2 weeks as secondary) providing 24/7 support, typically 4 of every 8 weeks.

Travel

Travel requirement is less than 5% and may vary based on business needs.

Other Duties

Please note this job description is not exhaustive. Duties and responsibilities may change at any time with or without notice.

EEO Statement

Paymentus is an equal opportunity employer. We do not discriminate on the basis of race, religion, color, age, sex, sexual orientation, national origin, citizenship status, or any other classification protected by law. Our management is dedicated to fair hiring, placement, promotion, transfer, demotion, and compensation practices.

Reasonable Accommodation

We provide reasonable accommodations to qualified applicants and employees with known disabilities, unless doing so would impose an undue hardship. Please discuss any accommodation needs with Human Resources or your supervisor.
