Sr System Reliability Engineer

Disney Cruise Line - The Walt Disney Company

London

On-site

Confidential

Full time

12 days ago

Job summary

An innovative company is seeking a Systems Reliability Engineer to architect, design, and automate applications at scale. This role involves collaborating with development teams to create resilient architectures, improve operational excellence, and ensure application stability. The ideal candidate will have extensive experience in systems administration on both Linux and Windows platforms, with a strong background in CI/CD processes and cloud automation tools. Join a dynamic team where you can contribute to exciting projects that enhance the quality of services offered to guests.

Qualifications

  • 5+ years in technical operations and software engineering.
  • Expertise in systems administration across Linux and Windows.

Responsibilities

  • Create and deliver new technologies and platforms.
  • Provide systems administration and application support.

Skills

Linux Administration
Windows Administration
CI/CD (GitLab CI, Jenkins)
Programming (Go, Python, Ruby, Node)
Cloud Automation (Boto, CloudFormation, Terraform)
Container Computing (Docker, Kubernetes)
Web Technologies (Java, Node.js, Tomcat)
Network Protocols (HTTP, DNS)
Troubleshooting
Project Management

Education

Bachelor of Science in Computer Science

Tools

Git
AWS
Google Cloud
Azure
Docker
Kubernetes

Job description

Systems Reliability Engineers use a software engineering approach to architect, design, automate, monitor, and build applications at scale. This includes operating and engineering software with close business segment alignment to deliver platforms through efficient, effective, and resilient architectures. SREs are talented engineers who focus on improving quality through a data-driven approach: instrumentation, automation, and functional/unit testing.

Responsibilities:

  • The SRE will help create, build and deliver new technologies or platforms. This will include consultation, designing, building, and supporting development pipelines, automating infrastructure and operations, creating telemetry for monitoring, engineering high reliability and reinforcing best practices to secure our company and guest data.

  • Have expert-level systems administration skills on both the Linux and Windows platforms.

  • Work with CI/CD platforms (GitLab CI or Jenkins), systems development languages (Go, Python, Ruby, Node), cloud automation tools (Boto, CloudFormation, Terraform), source control, cloud hosting, container computing, and web technologies.

  • Maintain expertise on systems, operational excellence and application stability, security, performance, and capacity management, as well as documentation.

  • Work closely with development teams across Disney to brainstorm, architect, gather requirements, troubleshoot, and provide stellar customer support

  • Be prepared to work in an extremely collaborative and high-energy environment.

  • Lead project and planning efforts, architectural design, and engineering, and attend meetings with various teams.

  • Implement, integrate and configure solutions, tools, infrastructure and systems.

  • Provide systems administration and application support – Level 2 & 3 maintenance and support

Basic Qualifications:

  • Understand how to install and configure operating systems, specifically with expertise in Linux and Windows Server.

  • Knowledge of software development Continuous Integration (CI) pipelines (GitLab CI, GitHub Actions)

  • Experience with Source Control Management systems (Git)

  • Experience in public and private cloud hosting services (AWS, Google Cloud, Azure, OpenStack, CloudStack) as well as familiarity with container computing (e.g. Docker, ECS, Kubernetes, Terraform).

  • Experience as a subject matter expert on at least one OS and proficient in multiple operating systems, including OS performance monitoring, setup, configuration, tuning, and troubleshooting.

  • Proficiency in web or web server technologies: Java, Node.js, Tomcat, IIS, Apache/nginx, MySQL, PostgreSQL, etc., including being able to perform basic setup, configuration, and troubleshooting.

  • Understanding of internet technologies and network protocols, including HTTP, basic load balancing configurations, security zones, VIPs, SNMP, REST and DNS.

  • Ability to implement existing base standards for new systems and/or applications with mentoring for all of the following:

    • Site monitoring and instrumentation

    • Application monitoring and instrumentation

    • System monitoring and instrumentation

    • Resiliency and performance

  • Able to diagnose simple to complex system problems.

  • Has experience on one or more load balancer platforms (setting up pools, VIPs, layer 7 routing, debugging).

  • Able to author tools and scripts to be used by others to automate repeatable production tasks in standard languages like Bash, Ruby, Python, or Go.

  • Advanced skills in at least one programming language such as Python, PHP, Ruby, Java, Go, Swift, or C++, and the ability to build unit test suites for all software being developed.

  • Experience supporting and/or developing backend tools or services

  • Able to perform and provide in-depth analysis on load test runs against a moderately complex system.

  • Demonstrates exceptional troubleshooting methodology, including the ability to author new methodologies and teach them to the SRE team.

  • Independently resolve moderately to highly complex system and application incidents.

  • Able to identify and propose system and application fixes for performance bottlenecks.

  • Able to evaluate new application requirements for capacity and run-time best practices.

  • Able to evaluate new system and/or infrastructure solutions for technical feasibility against known requirements and standards.

  • Effective at dealing with change: Able to transition in role or handle a significant modification to workflow or technology with minimal ramp-up time and with very little guidance.

  • Excellent verbal and written communication to all levels in the organization.

  • Serves as primary point of contact with Manager.

  • Demonstrates curiosity and continuous learning and self-improvement.

  • Ability to lead functional teams in systems integration and design including writing operational specs, architectural diagrams, test plans and requirements management.

  • Effective project management and planning on large-scale projects (familiarity with agile/scrum and waterfall project management a plus).

  • Construction of concise and complete technical documentation and the ability to design and deliver training to other staff.

  • Detailed understanding of the goals and requirements of the business supported.

Required Education:

Bachelor of Science degree in computer science or related field or equivalent experience in technical operations and software engineering with 5 years of related work experience.

#DISNEYTECH


The hiring range for this position in California is $138,900 - $186,200 per year and in Washington is $145,400 - $195,000 per year. The base pay actually offered will take into account internal equity and also may vary depending on the candidate’s geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.