Site Reliability Engineer II - Real-Time and Big Data
Esri
Dubai
USD 60,000 - 120,000
Job description
Join us to work collaboratively with our talented team of dynamic and passionate engineers to deliver capabilities that enable our customers to make a difference. You'll deploy and operate ArcGIS Velocity and ArcGIS Workflow Manager SaaS solutions. You will also have the opportunity to design, deploy, and operate next-generation real-time and big data GIS software-as-a-service (SaaS) capabilities for thousands of cloud users worldwide.
Our teams have a broad mix of experience levels and tenures that support an environment that promotes professional development. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future.
Our team also puts a high value on work-life balance, and we understand that striking a healthy balance between your personal and professional life is crucial to your happiness and success here. We offer a flexible hybrid schedule so you can have a more productive and well-balanced life both in and outside of work.
Responsibilities
Collaborate with a team of site reliability engineers to operate SaaS capabilities across multiple regions on the cloud platform
Design, implement, configure, and use monitoring systems to track the health of SaaS products
Manage infrastructure used for ArcGIS Velocity and ArcGIS Workflow Manager, respond to alerts, and troubleshoot problems to resolution
Develop, implement, and maintain automation solutions for repetitive operational tasks, such as deployment pipelines, incident resolution, and scaling processes
Design and implement the deployment and upgrade of containerized microservice components that, when combined, power Esri's SaaS offerings
Create and automate Git workflows to simplify code integration, testing, and infrastructure deployments
Participate in technical spike efforts, bringing new innovative ideas to future versions of our software
Troubleshoot system incidents and provide root cause analysis reports
Provide rotational on-call technical support
Requirements
5+ years of experience managing Kubernetes (EKS), logging and monitoring (ELK, Prometheus), and container technologies (Docker)
Proficient in using Terraform for automating infrastructure provisioning and management
Ability to design and automate Git workflows for streamlined code integration, testing, and infrastructure deployment
Ability to write scripts to deploy infrastructure and/or applications (Bash, Python, Terraform)
Expert-level understanding of and experience with cloud computing platforms (AWS or Microsoft Azure)
Strong knowledge of Linux operating system administration, including troubleshooting, performance tuning, and shell scripting
Proficient in cloud networking, including VPCs, subnets, security groups, and VPNs in platforms like AWS or Azure
Skilled in identifying and resolving system and application issues through effective troubleshooting and root cause analysis
Working knowledge of a source control and issue management system
Bachelor's degree in computer science, computer engineering, GIS, or information systems
Recommended Qualifications
Experience designing, administering, and/or maintaining cloud environments, such as AWS or Azure, supporting 24×7 high-availability production environments
Interest in working with GitOps principles to automate the deployment of applications on Kubernetes clusters
Certifications: AWS Certified Solutions Architect - Associate, CKA/CKAD, or similar
Experience managing OpenSearch (as a data store or log store) and Kafka to handle distributed data streams and ensure high availability in large-scale systems
Ability to work with continuous integration and delivery best practices
Knowledge of operating resilient, highly available, scalable, and performant SaaS capabilities
Knowledge of Esri ArcGIS or other web mapping technologies