Job Requirements and Responsibilities
To be considered for this position, you must have an active TS/SCI with Polygraph security clearance (U.S. citizenship required).
The Cloud Systems Administrator will:
- Provide support for implementation, troubleshooting, and maintenance of IT systems.
- Manage IT system infrastructure and related processes.
- Support day-to-day operations, monitoring, and problem resolution for client/server/storage/network devices and mobile devices.
- Deliver Tier 1 (Help Desk) and Tier 2 (Escalation) problem diagnosis and resolution.
- Support escalation processes and communicate status updates to management and customers.
- Configure and manage UNIX and Windows operating systems, including troubleshooting and network configuration, to enhance system reliability and performance.
The role also involves supporting large clusters, requiring:
- At least three years of experience in system administration and monitoring of large distributed systems, including multiple clusters spanning at least three racks, with a minimum of 60 nodes per site.
- Experience diagnosing and troubleshooting large-scale cloud computing systems, and familiarity with distributed storage and retrieval technologies such as Hadoop, Cassandra, Scality, Swift, Gluster, Lustre, GPFS, Amazon S3, or similar big data or HPC technologies.
- Ability to work within a team, follow SOPs, communicate effectively, accept feedback, and receive guidance from senior technical staff.
- Willingness to learn new technologies and leverage team resources for professional growth.
- Ability to handle complex tasks independently and mentor junior staff.
- Experience in planning, leading, and managing complex technical projects involving multiple teams.
Additional technical skills include:
- Five years of experience scripting with Bash, Perl, or Python.
- Seven years of experience with Linux core components, including LDAP, DHCP, DNS, and TFTP management.
- Experience with configuration management tools such as Puppet and Salt.
- Expertise in Linux PXE/network provisioning, RAID utilities, TFTP, and disk scripting.
- Experience troubleshooting hardware remotely using VNC, serial over LAN, and IPMI, and working with BIOS configurations.
- Understanding of corporate architecture, OpenSSL, and Java keystores.
- Experience with hardware troubleshooting, including SGI/HP systems.
Education: Three years of relevant experience is required; a Bachelor’s Degree in Engineering, Systems Engineering, Computer Science, or Mathematics is highly desirable and equivalent to two years of experience. A Hadoop/Cloud System Administrator Certification or similar is required.
Preferred additional skills include:
- Knowledge of SSH tunneling, SOCKS proxies, and utilities such as rsync, pdsh, pdcp, and WinSCP.
- Understanding of basic network concepts such as VLANs, port channel bonding, and switch interactions.
- Experience with load balancers such as HAProxy and NGINX.
- Experience with Kubernetes, Docker, and log aggregation and monitoring tools such as Elasticsearch, Logstash, Grafana, and Rsyslog.