Job Description:
- Coordinate system refreshes, restore tests, and DR failovers, especially for Epic or other mission‐critical applications.
- Own P1/P2 escalations when L2 cannot resolve and lead major incident war rooms (root‐cause analysis, post‐incident reviews).
Azure Policies, Security and Network:
- Enforce Azure Policies and RBAC; manage vulnerability scans (Microsoft Defender or other tools) and patch any discovered weaknesses.
- Update and maintain firewall rules, NSGs, or other network security baselines.
Migrations and Decommissioning:
- Oversee complex migrations between environments or Azure regions (sometimes involving re‐platforming or re‐architecting).
- Perform advanced data snapshot validations and coordinate system retirement/decommission tasks.
Resiliency Testing and Audits:
- Plan and execute chaos engineering exercises, including rollback or failback scenarios.
- Conduct regular security audits, ensuring that any deviations from compliance are documented and remediated.
Mentoring and Training Responsibilities:
- Deliver quarterly training sessions to L2 on new tools, processes, or changes to the environment.
- Act as the final escalation point for complex or unknown technical issues.
Skill Set:
- Deep knowledge of Azure services, including network configuration, VM management, storage, AD, DNS, and security controls.
- Ability to architect and troubleshoot large, complex environments, both manually and with automated tools.
- Strong scripting or automation capabilities (PowerShell, Azure CLI) for large‐scale patching or configuration updates.
- Experience in incident management, root‐cause analysis, and producing post‐mortem reviews.
- Familiarity with Epic implementations (on-prem / cloud).