Technical Architect - Monitoring, Azure Cloud Administration, (Azure IAAS, PAAS) Virtual machin[...]

Blue Yonder

United States

Remote

USD 80,000 - 120,000

Full time

25 days ago

Job summary

An established industry player is seeking a Monitoring Specialist to enhance their monitoring capabilities and ensure operational readiness. In this role, you will be responsible for setting up and maintaining a centralized monitoring configuration, transforming traditional monitoring methods into advanced alert systems, and implementing synthetic monitoring using Python and Selenium. You will collaborate with cross-functional teams to create dashboards and establish telemetry for automated data collection. This is a fantastic opportunity to work in a dynamic environment where your contributions will directly impact customer experience and operational efficiency.

Qualifications

  • 10-12 years of IT experience, including 6 years in monitoring.
  • Hands-on experience with monitoring tools and cloud platforms.

Responsibilities

  • Ensure an effective monitoring framework and improve proactive identification of issues.
  • Create and implement synthetic monitoring for customer experience.

Skills

Monitoring Tool Administration
Log indexing & pipelines
Azure
VMware
Ansible
Python
Selenium
Terraform
Shell
Windows
Linux
GROK parsing
Problem-solving skills
Communication skills
Time management
Team collaboration

Education

Associate’s degree in Computer Science

Tools

AppDynamics
SolarWinds
Grafana
Zabbix
DataDog
ELK Stack

Job description

Key Accountability

  • Monitoring effectiveness – ensure the monitoring framework and its enhancements are set up to increase proactive identification and resolution of issues before customers are affected.
  • Set up and maintain a centralized monitoring configuration as code
  • Consistently drive alert volume down and eliminate false alerts
  • Set up advanced monitoring alerts for the golden signals, i.e. latency, errors, throughput, and saturation.
  • Move from traditional symptomatic CPU and memory monitors to advanced alert correlation that points directly to the underlying issue, enabling predictive monitoring
  • Create and implement synthetic or end-user monitoring with Python and Selenium to monitor customer experience
  • Set up API endpoint monitoring and measure uptime and availability across customers, products, and infrastructure endpoints.
  • Apply SLO, SLI, and error budget concepts to measure performance and establish a maturity model
  • Maintain and manage a code repository built for scale, with appropriate security measures
  • Use automation to push changes to monitoring tools
  • Set up an orchestration mechanism for onboarding and decommissioning to ensure operational readiness
  • Set up dashboards and create visibility across all cross-functional teams
  • Establish telemetry for automated collection of data across metrics, logs, and traces
  • Continuously analyze data to identify gaps and implement improvements
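
As an illustration of the SLO, SLI, and error budget concepts named above, here is a minimal sketch in Python. It assumes an availability SLI defined as the fraction of successful endpoint checks in a window. The names `ErrorBudget` and `error_budget` are hypothetical and are not taken from the posting or from any monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    sli: float               # observed fraction of successful checks
    allowed_failures: float  # failures the SLO tolerates in this window
    remaining: float         # budget left; negative means the SLO is breached


def error_budget(good: int, total: int, slo_target: float) -> ErrorBudget:
    """Compute an availability SLI and the error budget left for one window.

    Assumption: `good` and `total` are counts of successful and total
    endpoint checks, and `slo_target` is a fraction such as 0.999.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= good <= total:
        raise ValueError("good must be between 0 and total")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")

    failures = total - good
    allowed = (1 - slo_target) * total
    return ErrorBudget(
        sli=good / total,
        allowed_failures=allowed,
        remaining=allowed - failures,
    )


if __name__ == "__main__":
    budget = error_budget(good=9995, total=10000, slo_target=0.999)
    print(f"SLI: {budget.sli:.4%}")
    print(f"Allowed failures: {budget.allowed_failures:.1f}")
    print(f"Remaining budget: {budget.remaining:.1f}")
```

With a 99.9% target over 10,000 checks, roughly 10 failures are tolerated, so 5 observed failures leave about half of the budget. A negative `remaining` value would indicate that the window has breached its SLO.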

Minimum Requirements

  • Associate’s degree (or equivalent) in Computer Science; Information Technology or related field preferred
  • 10-12 years of IT experience, including 6 years of monitoring experience
  • Experience administering monitoring tools such as AppDynamics, SolarWinds, Grafana, Zabbix, DataDog, and the ELK Stack
  • Hands-on experience with logs, metrics, traces, parsing, RegEx, and tagging
  • Hands-on experience implementing APM, EUM, synthetics, API endpoint monitoring, etc.
  • Hands-on experience integrating with ITSM tools such as ServiceNow and Jira
  • Hands-on experience with Ansible, Python, Selenium, and Shell
  • Hands-on experience with enterprise-scale Azure, VMware, and AWS
  • Hands-on experience creating dashboards and performing analysis
  • Excellent interpersonal and influencing skills: able to interact appropriately with colleagues of varying technical skill levels and to remain calm and courteous while resolving problems in high-stress situations.

Skills:

  • Technical skills: Monitoring tool administration, log indexing & pipelines, Azure, VMware, Ansible, Python, Selenium, Terraform, Shell, Windows, Linux, GROK parsing
  • Problem-solving skills – able to devise technical and creative solutions, and to use analytics to understand patterns and proactively identify gaps.
  • Communication skills – effective communication is key in this role: gathering data about problems, preparing detailed notes and reports, and updating users on next steps.
  • Time management – maintain excellent time management and set priorities when handling multiple cases.
  • Team collaboration – routinely work with other functions to resolve user issues, collaborating successfully with team members and coworkers.
  • Highly motivated, hands-on personality.
  • Ability to learn quickly in a challenging environment.

Our Values

If you want to know the heart of a company, take a look at its values. Ours unite us. They are what drive our success – and the success of our customers. Does your heart beat like ours? Find out here: Core Values

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.
