Overview
Role Name: Mainframe Performance and Capacity Management Engineer
Location: Montreal, QC
Location note: Remote role (anywhere in Canada, but must work Eastern Time hours)
Responsibilities
Performance Management and Capacity Planning
- Monitor real-time z/OS system health and performance across CPU, memory, DASD, and WLM-managed workloads, using RMF, SmartIS, IZPCA, MICS, and other internal tools. Analyze performance data to identify trends, bottlenecks, and potential issues.
- Detect, troubleshoot, and resolve resource anomalies, workload misbehaviors, and degradation risks in production systems. Partner with incident response teams to resolve performance issues quickly and accurately.
- Develop and implement performance tuning strategies by recommending changes to service definitions, dispatching priorities, and workload placement.
- Contribute to capacity planning by forecasting and modeling workload resource demand and capacity requirements.
- Support cost modeling, vendor reporting (SCRT), infrastructure sizing, and resource optimization efforts.
Data Analysis
- Collect and analyze system performance data to generate reports and dashboards.
- Identify key performance indicators (KPIs) and develop metrics to track system performance.
- Visualize, summarize, and present data findings, recommendations, and methodology to senior leadership, department leadership, and enterprise stakeholders, both technical and non-technical.
Collaboration and Communication
- Work closely with cross-functional teams, including operations, development, and infrastructure teams.
- Provide technical support and guidance to team members and stakeholders.
- Participate in on-call rotations and provide timely responses to performance and observability issues.
- Participate in the migration of performance and capacity tooling to Git change management and DevOps deployment pipelines.
Experience
- Bachelor’s degree in Information Systems, Mathematics, Finance, or another quantitative or related field.
- Years of mainframe systems experience with proficiency in performance management for large, multi-processor, multi-LPAR, Parallel Sysplex environments utilizing z/OS.
- Proven experience in mainframe performance monitoring, observability, capacity management, and data analysis.
- Proven experience resolving system performance problems in real time through adjustments to WLM and batch initiators.
- Strong understanding of SMF / RMF data.
- Proficiency in REXX / Python, Job Control Language (JCL), and DB2.
- Strong understanding of Batch Processing and Job Scheduling.
- Advanced user of MS Excel (Charts, Pivot tables, VLOOKUPs, PowerPivot) and PowerPoint for data visualization.
- Experience with mainframe monitoring tools and performance tuning techniques.
- Experience working with large, highly transactional datasets to draw insights and create organizational value.
- Experience with DevOps is a plus.
- Experience working with ADABAS is a plus.
Qualifications
- Strong analytical, problem-solving, and strategic-thinking skills, including the ability to prioritize.
- Proficiency in data analysis and in creating dashboards and visualizations that provide actionable insights for operations, engineering, and management decision-making.
- Working knowledge of the MS Office suite for management reporting, including PowerPoint and Excel analytical functions.
- Comfortable combining data from disparate systems to produce meaningful insights.
- Ability to analyze and solve business problems at their root, stepping back to understand the broader context.
- Excellent written and verbal communication skills.
- Ability to work independently and as part of a team.
- Detail-oriented with strong organizational skills.
- Deep expertise with SMF / RMF data, WLM service definitions, and z/OS workload behavior.
- IBM SCRT Reporting (plus).
- MICS Product & Reporting experience (huge plus).
- IZPCA Product experience (plus).
- Good time-management skills, independent thinking, and decision-making capabilities.
- High sense of urgency and ability to prioritize competing requirements in a fast-paced environment.