Overview
We are looking for a skilled and motivated Cloud Region Build Site Reliability Engineer (SRE) to join our Oracle Cloud Infrastructure Region Build team. In this role, you will be responsible for building, deploying, and maintaining compute cloud infrastructure services across multiple regions to ensure high availability, scalability, and performance. You will work closely with engineering, product, and operations teams to design and implement robust automation and monitoring solutions, and lead efforts to improve system reliability and efficiency.
Responsibilities
- Work with the Site Reliability Engineering (SRE) team to build and maintain OCI compute cloud infrastructure and services across multiple geographic regions.
- Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of Oracle Cloud Region Build services.
- Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs).
- Participate in incident response to help remove blockers during the region build process.
- Continuously improve the compute cloud infrastructure region build process.
- Participate in on-call rotations and provide support for critical infrastructure issues.
- Automate infrastructure provisioning, configuration, and deployment using tools like Terraform.
- Collaborate with cross-functional teams to design and roll out new cloud region builds and expansions.
- Apply a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations.
- Collaborate with software engineers to build scalable, reliable, and highly available cloud-native systems.
- Monitor system health and performance using tools like Grafana.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
- Proven experience (3+ years) as an SRE, Cloud Engineer, or DevOps Engineer in cloud environments.
- Strong knowledge of cloud platforms such as AWS, GCP, or Azure, with hands-on experience building and managing regional deployments.
- Expertise in Infrastructure as Code (Terraform, CloudFormation, Ansible, etc.).
- Proficient with scripting languages (Python, Bash, Go, etc.).
- Experience with monitoring, alerting, and logging tools (Prometheus, Grafana, ELK stack, Datadog, etc.).
- Solid understanding of networking, security, and distributed systems in cloud environments.
- Experience working in Agile teams and collaborating with software engineers and product teams.
- Strong troubleshooting and problem-solving skills.
- Excellent communication and documentation skills.