Site Reliability Engineer II

PROS

United States

Remote

USD 80,000 - 120,000

Full time

30+ days ago

Job summary

PROS is seeking a Site Reliability Engineer II. In this role, you will monitor and improve service performance and troubleshoot complex systems. You will work with product teams to improve reliability and scalability, drawing on your scripting and automation skills. Your work will directly affect the efficiency and stability of critical systems.

Qualifications

  • Advanced scripting and automation skills for deployment and maintenance.
  • Proficiency in high-level programming languages like Ruby, Go, or Java.
  • Experience with monitoring tools like Prometheus and Grafana.

Responsibilities

  • Monitor service performance and troubleshoot complex systems.
  • Implement reliability enhancements and maintain documentation.
  • Collaborate with teams to resolve performance bottlenecks.

Skills

Operating Systems Knowledge
Networking
Database Management
Scripting and Automation
Ruby
Go
Java
Monitoring and Alerting (Prometheus, Grafana)
Cloud Environment Optimization
RESTful API Design
API Testing Tools (Postman)
Communication Skills
Time Management
Crisis Management
Problem-Solving Skills
Teamwork
Innovation
IT Security Best Practices

Education

University Degree in Computer Science

Tools

Prometheus
Grafana
Postman

Job description

PROS Holdings, Inc. (NYSE: PRO) provides AI-powered solutions that optimize selling in the digital economy. PROS solutions make it possible for companies to price, configure, and sell their products and services in an omnichannel environment with speed, precision, and consistency. Our customers, who are leaders in their markets, benefit from decades of data science expertise infused into our industry solutions.

The Site Reliability Engineer II is a primary team member who administers, supports, and troubleshoots complex systems and services.

A Day in the Life of the Site Reliability Engineer II:

  • Monitor service performance, reliability metrics, and infrastructure stability.
  • Perform in-depth analysis of system performance and identify areas for improvement.
  • Participate in disaster recovery testing and implement reliability enhancements.
  • Define and maintain Service Level Objectives (SLOs) and related visualizations/alerts.
  • Collaborate with product teams to resolve performance bottlenecks.
  • Implement and maintain automated deployments and self-service tools.
  • Create and troubleshoot automation scripts for operational tasks.
  • Leverage automation to improve system scalability and efficiency.
  • Participate in follow-the-sun on-call rotations and respond to incidents promptly.
  • Troubleshoot and resolve production incidents, identifying root causes and creating detailed post-incident reports.
  • Work with development teams to address reliability and performance concerns.
  • Maintain and update documentation, including user stories and operational processes.
  • Share knowledge through team sessions and contribute to continuous improvement.
  • Implement automation for security auditing and vulnerability mitigation.
  • Collaborate with security teams to enhance cloud security posture.
  • Identify root causes of incidents and outages and participate in detailed post-incident analysis and documentation.

Required Qualifications - About you:

We are looking for candidates who possess the rare combination of the following achievements, skills, and behaviors.

  • Working knowledge of operating systems, networking and database management.
  • Advanced scripting and automation for deployment, scaling and maintenance tasks.
  • Proficiency in at least one high-level programming language (Ruby, Go, Java).
  • Knowledge of infrastructure and configuration management via automation.
  • Advanced skills in creating monitoring and alerting rules (Prometheus, Grafana).
  • Experience implementing and optimizing cloud environments.
  • Knowledge of RESTful API design and development.
  • Familiarity with API testing tools (e.g., Postman).
  • Excellent communication skills.
  • Excellent time-management, organizational, crisis-management, and problem-solving skills.
  • Ability to work in a team and independently.
  • Willing to innovate, learn and share knowledge.
  • University degree in computer science or a related field.
  • Experience developing and implementing IT security best practices and procedures.
  • Excellent command of the English language.

It would be considered a plus:

  • Applicable IT Certifications.
  • System administrator experience.
  • Previous experience with cloud services, including open-source technology, software development, system engineering, scripting languages, and multi-cloud provider environments.

Skills & Personal Characteristics:

  • Ownership
  • Innovation
  • Care

Work Environment:

Most work activities are performed in an office or home-office environment and require little to moderate physical exertion. Work activities may require periods of extended hours, critical deadlines, and stressful situations. To successfully complete the tasks of this position, individuals must be able to communicate clearly (in writing and orally), comprehend business terminology, and interpret numerical data.

This job description is intended to convey information essential to understanding the scope of the job and the general nature and level of work performed by job holders within this job. This job description is not intended to be an exhaustive list of qualifications, skills, efforts, duties, responsibilities or working conditions associated with the position.
