Principal Site Reliability Engineer (Wildfire Cloud Infrastructure) Santa Clara, California, Un[...]

Palo Alto Networks, Inc.

Santa Clara (CA)

On-site

USD 160,000 - 225,000

Full time

15 days ago

Job summary

A leading company in cybersecurity is seeking a Site Reliability Engineer to enhance its hybrid infrastructure. The role focuses on automation, security, and reliability of services, requiring a mix of technical skills in Kubernetes, Python, and cloud solutions. The successful candidate will work closely with developers and data scientists to ensure production readiness and performance, while also mentoring others in SRE culture.

Benefits

FLEXBenefits wellbeing spending account
Mental and financial health resources
Personalized learning opportunities

Qualifications

  • Expertise in managing applications in Kubernetes clusters.
  • Experience in Production Engineering or Site Reliability.
  • Ability to diagnose complex distributed systems.

Responsibilities

  • Contribute to SRE and DevOps success.
  • Design and operate reliable Cloud infrastructure.
  • Participate in on-call rotation and lead root cause analysis.

Skills

Configuration management
Python
Go
Linux administration
Network troubleshooting
CI/CD pipelines

Education

BS or MS in Computer Science

Tools

Kubernetes
Terraform
Ansible
Docker

Job description

Our Mission

At Palo Alto Networks everything starts and ends with our mission:

Being the cybersecurity partner of choice, protecting our digital way of life.
Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for innovators who are as committed to shaping the future of cybersecurity as we are.

Who We Are

We take our mission of protecting the digital way of life seriously. We are relentless in protecting our customers, and we believe that the unique ideas of every member of our team contribute to our collective success. Our values were crowdsourced by employees and are brought to life through each of us every day, from disruptive innovation and collaboration to execution, and from showing up for each other with integrity to creating an environment where we all feel included.

As a member of our team, you will be shaping the future of cybersecurity. We work fast, value ongoing learning, and we respect each employee as a unique individual. Knowing we all have different needs, our development and personal wellbeing programs are designed to give you choice in how you are supported. This includes our FLEXBenefits wellbeing spending account with over 1,000 eligible items selected by employees, our mental and financial health resources, and our personalized learning opportunities - just to name a few!

At Palo Alto Networks, we believe in the power of collaboration and value in-person interactions. This is why our employees generally work full time from our office with flexibility offered where needed. This setup fosters casual conversations, problem-solving, and trusted relationships. Our goal is to create an environment where we all win with precision.

Your Career

Palo Alto Networks runs a large hybrid infrastructure and is one of the largest GCP customers. As a Site Reliability Engineer, you will be part of a team supporting the services running on this infrastructure. This includes automation, architecture, performance, metrics, troubleshooting, security, and reliability.

Our stack includes Kubernetes, Docker, GCP, AWS, Ansible, Terraform, Vault, GitLab, Spinnaker, Pub/Sub, Bigtable, Memorystore, BigQuery, RabbitMQ, Kafka, MySQL, Python, and Go. We don’t expect you to know all of these, but we do expect you to learn the ones needed for this role.

Your Impact

  • Contribute to the success of SRE and DevOps
  • Develop expertise in new technologies
  • Work with developers, researchers, data scientists, and security experts
  • Design, build, and operate reliable, secure Cloud infrastructure
  • Ensure that applications are production-ready, scalable, and reliable
  • Develop tools and automation frameworks
  • Automate the deployment of robust services
  • Orchestrate end-to-end monitoring and alerting
  • Participate with SRE and Dev teams in the on-call rotation
  • Lead root cause analysis of critical business and production issues
  • Mentor and champion SRE culture
  • Participate in design reviews

Your Experience

  • BS or MS in Computer Science, a related field, or equivalent professional experience or equivalent military experience
  • Expertise in configuration management with a framework such as Ansible, Terraform, Helm, or Kubernetes
  • Proficient in Python and/or Go
  • Expertise in managing applications in Kubernetes clusters with autoscaling enabled
  • Experience in Production Engineering, DevOps, or Site Reliability
  • Expertise in the public cloud (GCP or AWS), especially GCP
  • Strong skills in Linux administration, internals, and network troubleshooting
  • Proficiency with programming languages like Python, Golang, and shell scripting to automate tasks
  • Experience with CI/CD pipelines, GitLab, and GitHub preferred
  • Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
  • Excellent written and verbal communication, able to collaborate and rally support
  • Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
  • Passion for infrastructure and monitoring as code
  • Ready to understand and dissect new technology stacks quickly

The Team

Our engineering team is at the core of our products – connected directly to the mission of preventing cyberattacks. We are constantly innovating – challenging how we and the industry think about cybersecurity. Our engineers don’t shy away from building products to solve problems no one has pursued before.

We define the industry instead of waiting for directions. We need individuals who feel comfortable in ambiguity, excited by the prospect of a challenge, and empowered by the unknown risks facing our everyday lives, which only a secure digital environment makes possible.

Compensation Disclosure

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be between $160,000 and $225,000/YR. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.

Our Commitment

We’re problem solvers who take risks and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating together.

We are committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at accommodations@paloaltonetworks.com.

Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.

All your information will be kept confidential according to EEO guidelines.
