About the role
We’re looking for a Senior Software Quality Engineer to own test strategy end-to-end for backend services. You’ll build scalable automation and performance frameworks, integrate them into CI/CD, and validate resiliency and operational readiness across AWS/Azure environments. You’ll partner closely with engineering, SRE, and product to enable fast, reliable releases.
Key responsibilities
Strategy and planning
- Own test strategy, planning, and estimation for services and programs
- Define quality gates, risk-based coverage, and release-readiness criteria
Automation and quality engineering
- Design and maintain unified automation frameworks (Java, Cucumber, Robot Framework)
- Build API and integration tests (Postman), reduce flakiness, and improve maintainability
- Standardize builds (Gradle) and containerize test tooling (Docker)
Performance engineering
- Design, execute, and analyze load/stress/soak tests with Gatling
- Model realistic workloads, establish SLOs, and provide tuning recommendations
- Track throughput, latency (P95/P99), error budgets, and capacity signals
Resilience and operational readiness
- Run chaos tests with Litmus; validate failure handling, timeouts, and fallbacks
- Verify backup/restore and disaster recovery objectives (RTO/RPO)
- Lead game-days and resilience drills; document runbooks and playbooks
Observability and feedback loops
- Instrument and monitor with Prometheus, Grafana, and New Relic
- Wire test results and service telemetry into dashboards and alerts
- Enable data-driven go/no-go decisions with objective quality signals
CI/CD and DevOps integration
- Integrate tests into CI/CD pipelines (Git/GitHub), enforce quality gates, and parallelize execution
- Support trunk-based development, shift-left checks, and stable environments
Collaboration and enablement
- Partner with developers, SRE, and product to triage, root-cause, and prevent defects
- Mentor engineers on testing best practices and reliability-first design
- Contribute to documentation, standards, and continuous improvement
What we expect from the candidate (must-haves)
- 10+ years in Quality Engineering/SDET roles focused on backend or platform services
- Strong Java coding skills and hands-on automation experience with Cucumber and/or Robot Framework
- Proven experience building CI/CD-integrated test frameworks (Git/GitHub, Gradle, Docker)
- Performance testing expertise with Gatling (workload design, analysis, recommendations)
- Chaos and resilience testing experience (Litmus) and operational readiness validation
- Observability: Prometheus/Grafana/New Relic for metrics, dashboards, SLOs, and alerting
- API testing experience (Postman) and a strong understanding of REST and common integration patterns
- Cloud experience with AWS and/or Azure
- Solid grasp of testing strategy: functional, integration, system, and non-functional
- Excellent communication, critical thinking, and cross-functional collaboration
Nice to have
- Hercules or similar performance harness tooling
- Experience with Azure DevOps, GitHub Actions, or Jenkins (pipelines and environments)
- Contract testing, service virtualization, or Testcontainers
- Kubernetes familiarity (Litmus typically runs on K8s), IaC basics (e.g., Terraform)
- Domain knowledge in banking/fintech and a compliance-minded approach to testing
Success metrics you’ll influence
- Reduced test cycle time and flakiness rate; improved pipeline pass rate
- Meaningful automation coverage aligned to business risk
- Measurable improvements in P95/P99 latency and error-budget consumption
- Fewer escaped defects and faster MTTD/MTTR via actionable telemetry
- Consistent, auditable release-readiness signals
First 90 days
- 0–30: Onboard, baseline current coverage and performance; ship quick wins in CI gating
- 31–60: Deliver Gatling suites and dashboards (Prometheus/Grafana/New Relic); standardize framework patterns
- 61–90: Run first chaos game-day; validate backup/restore; publish reliability playbooks; measure impact
Tech stack you’ll use
- Languages/Frameworks: Java, Cucumber, Robot Framework
- Performance/Resilience: Gatling, Litmus, Hercules (nice to have)
- API/Tools: Postman, Git/GitHub, Gradle, Docker
- Observability: Prometheus, Grafana, New Relic
- Cloud: AWS, Azure