Staff Cloud Operations Engineer – Monitoring Lead (9810)

Extreme Networks

United States

Remote

USD 120,000 - 180,000

Full time

3 days ago

Job summary

Extreme Networks is seeking a highly skilled Staff Cloud Operations Engineer – Monitoring Lead to drive optimization of monitoring and alerting strategies across cloud infrastructures. This role demands expertise in major cloud platforms and monitoring tools, as well as the ability to solve complex problems collaboratively across distributed teams.

Qualifications

  • 8+ years in Cloud Operations, DevOps, or Site Reliability Engineering roles.
  • Deep expertise with at least one major public cloud platform (AWS, Azure, or GCP).
  • Proven experience as a technical lead in a monitoring-focused role.

Responsibilities

  • Design and implement monitoring strategies for cloud infrastructure.
  • Evaluate and integrate various monitoring tools.
  • Provide 24/7 support for cloud services.

Skills

Cloud Operations
Monitoring
Problem-solving
Analytical skills
Troubleshooting
Collaboration

Education

BS in Computer Science or Engineering

Tools

Prometheus
Grafana
Datadog
Splunk
Elasticsearch
Kubernetes
Docker

Job description

There has never been a better time to join Extreme. With several acquisitions extending our portfolio and go-to-market strategy, we have seen enormous opportunity and growth within the region.

Aside from being a Technology Leader in the Gartner Magic Quadrant, we also adamantly promote an internal culture that truly embraces diversity, inclusion, and equality in the workplace. Having Diversity and Inclusion as part of our core values and beliefs, we’re proud to foster an environment where every Extreme employee can thrive because of their differences, not despite them.

Staff Cloud Operations Engineer – Monitoring Lead

We are seeking a highly skilled and experienced Staff Cloud Operations Engineer – Monitoring Lead to join our growing Cloud Operations team. In this critical role, you will be responsible for designing, implementing, and optimizing our comprehensive monitoring and alerting strategy across our cloud infrastructure and applications. You will drive proactive identification of issues, ensure system health, and contribute significantly to our operational excellence and reliability goals. We're looking for the best and the brightest 'A' players who want to make a difference doing a job they love.


Responsibilities:
  • Lead the design, implementation, and continuous improvement of our end-to-end monitoring and alerting framework for cloud infrastructure (AWS, Azure, GCP), applications, and services.
  • Define key performance indicators (KPIs), service level indicators (SLIs), and service level objectives (SLOs) for critical systems.
  • Evaluate, select, and integrate monitoring tools (e.g., Prometheus, Grafana, Datadog, Splunk, CloudWatch, Azure Monitor, GCP Operations Suite) to meet evolving needs.
  • Develop and implement automation scripts and tools (e.g., Python, Bash, PowerShell) to streamline monitoring deployment, configuration, and incident remediation.
  • Build and maintain dashboards, alerts, and reports that provide actionable insights into system performance, health, and availability.
  • Analyze monitoring data to identify performance bottlenecks, resource inefficiencies, and potential cost optimization opportunities.
  • Collaborate with engineering teams to implement performance improvements and cost-saving measures.
  • Create and maintain comprehensive documentation for monitoring systems, procedures, and best practices.
  • Proactively identify areas for improvement in our cloud operations and monitoring capabilities.
  • Provide 24/7 support for cloud services.
  • Participate in cloud security and compliance implementation.

Ideal Qualifications:
  • BS level technical degree required; Computer Science or Engineering background preferred.
  • 8+ years of progressive experience in Cloud Operations, DevOps, or Site Reliability Engineering roles, with a strong focus on monitoring.
  • Deep expertise with at least one major public cloud platform (AWS, Azure, or Google Cloud Platform).
  • Proven experience as a technical lead or senior contributor in a monitoring-focused role.
  • Working knowledge of container-based architecture and deployment (Docker, Kubernetes).
  • Extensive experience with various monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, ELK Stack, vendor-specific monitoring solutions).
  • Excellent problem-solving, analytical, and troubleshooting skills.
  • Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, Kafka, and RabbitMQ.
  • Comfortable working within a distributed team located in multiple time zones.
