Overview
Senior Observability & Performance Engineer (Node.js & Java)
Our team is responsible for ensuring the reliability, performance, and scalability of our critical applications in a dynamic public cloud environment, focusing on visibility and health of our systems.
The Opportunity
We are seeking a highly skilled and proactive Senior Observability & Performance Engineer to join our team. You will gain deep insight into our existing Node.js- and Java-based microservices, understand how they are instrumented, and drive initiatives to measure and optimize performance. You will evolve our observability practices to proactively identify bottlenecks, improve system efficiency, and enhance the user experience.
What you will do
- Deep Dive into Existing Codebases: Jump into existing Node.js and Java applications to understand how metrics, logs, and traces are generated and consumed.
- Evaluate & Enhance Instrumentation: Assess the quality and completeness of observability data. Identify gaps and implement improvements to capture crucial performance metrics and contextual information (logs, traces).
- Define & Implement Performance Metrics: Collaborate with development teams to define KPIs, SLIs, and SLOs for our applications and services.
- Establish Performance Baselines & Monitoring: Implement robust monitoring and alerting solutions using tools like InfluxDB and Prometheus to track metrics, identify deviations, and detect performance degradations.
- Performance Analysis & Root Cause Identification: Analyze performance data to identify bottlenecks, diagnose issues, and pinpoint root causes in distributed systems.
- Capacity Planning & Optimization: Use performance insights to assist with capacity planning and recommend architectural or code changes for optimization and resource efficiency.
- Troubleshooting & Incident Response: Support incident response by leveraging observability tools to quickly identify and troubleshoot production issues related to performance and reliability.
- Collaboration & Knowledge Sharing: Work with Node.js and Java teams to evangelize observability best practices and guide instrumentation, fostering a performance-aware culture.
- Tooling & Automation: Develop and maintain observability tools and automation to streamline data collection, analysis, and visualization.
- Continuous Improvement: Research and evaluate new observability patterns, tools, and technologies to enhance monitoring capabilities.
What you will bring
- Proven experience as an Observability Engineer, Performance Engineer, or SRE with a focus on system performance and monitoring.
- Expertise with Node.js and Java ecosystems, including runtime characteristics, common performance pitfalls, and instrumentation best practices.
- Hands-on experience with observability platforms and tools, specifically:
  - InfluxDB for time-series data storage and querying.
  - Prometheus for metrics collection and alerting.
- Familiarity with Grafana (visualization), distributed tracing, and log management systems (e.g., ELK Stack) is highly desirable.
- Solid understanding of performance testing methodologies (load, stress, scalability).
- Experience with public cloud infrastructure (AWS, Azure, GCP) and cloud-native architectures (microservices, containers).
- Familiarity with Kubernetes and container orchestration.
- Ability to read and understand code to identify performance improvements.
- Excellent analytical and problem-solving skills with a data-driven approach.
- Strong communication and collaboration skills to work effectively with development and operations teams.
- Proactive mindset with a passion for optimizing system performance and reliability.
Education & Experience
- Bachelor's degree in Computer Science, Software Engineering, or a related technical field.
- 5+ years of software development experience with Node.js and Java.
- 3+ years of progressive experience in roles such as Observability Engineer, Performance Engineer, Site Reliability Engineer (SRE), or similar with a focus on performance, monitoring, and reliability.
- Demonstrated experience with deep dives into codebases (Node.js and Java), evaluating and enhancing instrumentation, and defining / implementing performance metrics.
- Proven history of implementing and utilizing observability platforms and tools like InfluxDB and Prometheus in production.
- Shell scripting experience.
- Understanding of Linux operating systems.
- Some AWS experience (Storage, Compute, Networking).
- Strong troubleshooting skills.
Bonus points if you have
- Experience with Infrastructure as Code (IaC) tools (Terraform, Salt).
- Experience with large cloud projects.
- AWS-specific knowledge of EC2, S3, VPC, Classic ELB / NLB / ALB, Lambda, CloudWatch, Transit Gateway.
- Experience with chaos engineering principles.
- Contributions to open-source observability projects.
Supervisory Responsibility
This role has no supervisory responsibilities.
Work Environment
This job operates in a professional office environment and may require use of standard office equipment.
Physical Demands
This role requires extended periods of sitting or standing at a computer workstation.
Position Type / Expected Hours of Work
This is a full-time position. Standard days and hours are Monday through Friday. The role also includes an on-call rotation providing 24/7 support: 2 weeks as primary and 2 weeks as secondary, typically 4 of every 8 weeks.
Travel
Travel requirement is less than 5% and may vary based on business needs.
Other Duties
Please note this job description is not exhaustive. Duties and responsibilities may change at any time with or without notice.
EEO Statement
Paymentus is an equal opportunity employer. We do not discriminate on the basis of race, religion, color, age, sex, sexual orientation, national origin, citizenship status, or any other classification protected by law. Our management is dedicated to fair hiring, placement, promotion, transfer, demotion, and compensation practices.
Reasonable Accommodation
We provide reasonable accommodations to qualified applicants and employees with known disabilities, unless doing so would impose an undue hardship. Please discuss any accommodation needs with Human Resources or your supervisor.