Position Description:
The Private Cloud SRE L3 team is part of the Enterprise Computing organization within the Company. The team has a presence in cities around the world and focuses on supporting cloud and container-based platforms for internal and external clients. You will work within the global follow-the-sun operations model, which means taking responsibility for the technologies the team supports in your region. Team members frequently interact with engineering teams and collaborate on the testing and certification of software deployed to the platform.
Primary Responsibilities:
• Provide L3 support for our Company’s private cloud, including on-call rotation
• Work closely with the internal engineering team and provide input on the testing of new component releases and infrastructure upgrades, as well as on performance, capacity, and monitoring
• Create and improve support processes, including training, documentation, customer engagement, automation and scripting, and incident, problem, and change management
• Work together with L2 teams and other L3 team members internationally
Required Skills:
• 5 to 7 years of relevant experience
• 3 to 5 years of Linux experience
• Sound knowledge of server infrastructure, virtualization, and cloud computing
• Proven Kubernetes and Docker experience
• Excellent understanding of internet and networking protocols, including TCP/IP, HTTP/HTTPS
• Strong understanding of security protocols, e.g. SSL/TLS, Kerberos
• Strong organizational skills and the ability to manage multiple tasks and handle high-pressure situations during outage resolution
• Experience with Agile and DevOps/SRE concepts
• Proficiency in at least one major scripting language or platform (e.g. Python)
• Ability to communicate effectively with various user groups, e.g. developers and engineers, as well as remote team members
Nice to have:
• Knowledge of system monitoring in cloud environments, including cloud-specific products and tools
• Experience in developing monitoring architecture and implementing monitoring agents, dashboards, and alerts
• Experience operating in large enterprise environments
• Experience with maintaining high-availability production systems
• Experience in enterprise-level hosting environments, in particular cloud and container technologies