Cloud Reliability Engineer

Marathon TS

Leominster (MA)

On-site

USD 60,000 - 100,000

Full time

5 days ago

Job summary

An innovative firm is seeking a Cloud Reliability Engineer to join a dynamic team in supporting critical infrastructure for the Department of Defense. This role involves ensuring the uptime of a multi-tenant cloud environment, utilizing cutting-edge monitoring tools, and conducting thorough incident responses. Candidates should have a solid background in both Linux and Windows operating systems, as well as experience with virtualization technologies like OpenStack and VMware. The position requires a proactive approach to problem-solving in a 24x7 operational setting, making it ideal for those who thrive in fast-paced environments and are eager to make a significant impact in their field.

Qualifications

  • 2+ years of experience in cloud reliability and incident response.
  • Hands-on experience with virtualization and distributed computing.

Responsibilities

  • Ensure uptime of multi-tenant infrastructure and conduct incident response.
  • Work with engineering teams to improve platforms and eliminate complexity.

Skills

Linux Operating Systems
Windows Operating Systems
Incident Response
Virtualization Technologies
Distributed Computing Technologies
Networking Fundamentals
Monitoring Tools
Ansible
Docker Containers
CompTIA Security+ Certification

Education

Associate's Degree in Engineering or Computer Technology

Tools

OpenStack
Citrix XenServer
Red Hat Enterprise Virtualization
VMware
Puppet

Job description

Overview
Marathon TS is seeking a Cloud Reliability Engineer in Chantilly, VA to support our Department of Defense / Intelligence Community customer as part of a highly talented, highly motivated and high-performing team. As part of the Infrastructure Operations and Maintenance Support team, you will be responsible for, among other things, the availability, performance, monitoring, and incident response of the cloud infrastructure that we support in a 24x7 environment.

Responsibilities

  • Ensure the uptime of our multi-tenant infrastructure
  • Work closely with the engineering teams to improve our platforms and eliminate complexity from architecture and processes
  • Configure and use state-of-the-art monitoring tools to gather insights and then act upon the results
  • Conduct incident response and in-depth root cause analysis
  • Provide hands-on, first-level system and network support, including problem identification and resolution
  • Monitor daily software and network operations in a distributed environment
  • Work with users on fault isolation and resolution, and perform system analysis and reporting
  • Work shifts as part of complete 24x7 monitoring of software systems

Qualifications
Required Qualifications:

  • You have at least an associate's degree in Engineering or Computer Technology or Advanced Military Training.
  • You have at least 2 years of relevant experience
  • You have experience working with Windows and Linux operating systems.
  • You have experience with distributed computing technologies.
  • You have experience with virtualization technologies (e.g. OpenStack, Citrix XenServer, Red Hat Enterprise Virtualization, and/or VMware), Docker Containers, Ansible, and Heat templates.
  • You have experience with front end processing and network gateway appliances and/or software.
  • You have experience working in a customer environment and/or a classified environment.
  • You have a background in supporting software and/or network operations with a clear understanding of networking fundamentals.
  • You have experience with Linux/Unix and Windows operating systems.
  • You hold a current CompTIA Security+, CASP or CISP certification. Computing Environment Certification (e.g. Linux+, RHCSA, RHCE, MCSA).
  • You are able to effectively communicate both with customers and technical staff.
  • You have an active TS/SCI security clearance and are willing to undergo and pass a polygraph examination.
  • You are willing to work in a 24x7 environment.

Desired Qualifications:

  • Have an active TS/SCI with Polygraph
  • Have experience with infrastructure automation technologies, including OpenStack, Ansible, Heat, and Puppet, and with cloud computing fundamentals.
  • Have a good understanding of KVM virtualization technologies.
  • Have previous experience with networking equipment.
  • Have experience with Intelligence or DoD programs, either within the military or as a civilian contractor.

Marathon TS is committed to the development of a creative, diverse and inclusive work environment. To provide equal employment and advancement opportunities to all individuals, employment decisions at Marathon TS will be based on merit, qualifications, and abilities. Marathon TS does not discriminate against any person because of race, color, creed, religion, sex, national origin, disability, age or any other characteristic protected by law (referred to as "protected status").

Company Description

Marathon TS provides a full range of professional services for clients that require support from professionals with specialized skills and experience in a specific technical area or subject matter. Marathon TS also provides IT solutions, including strategy, operations, transformation and mission support.
