A leading technology services company is seeking a Senior Site Reliability Engineer (SRE) to ensure the reliability and performance of critical systems. The role involves developing automation, leading incident responses, and working closely with engineering teams to implement best practices in cloud-native tools and infrastructure as code. This fully remote position requires strong technical skills and a customer-first mindset.
Job Title: Senior Site Reliability Engineer (SRE)
Experience: 5+ years
Location: Mexico/LATAM
Engagement Type: Full-Time/Contractual, Fully Remote
Job Description:
We are seeking a skilled Senior Site Reliability Engineer (SRE) to join our offshore team. In this role, you will be responsible for ensuring the reliability, performance, and scalability of our critical systems. You'll develop automation, build monitoring solutions, lead incident response, and work closely with engineering teams to implement infrastructure as code, CI/CD, and cloud-native tools.
Job Responsibilities:
Maintain the reliability, availability, and performance of critical systems
Develop and maintain automation scripts and tools to streamline operations
Develop and maintain monitoring dashboards and alerts
Lead incident response, conduct post-mortem analysis, and implement preventative measures
Optimize system performance and scalability
Implement and maintain security best practices
Create and maintain comprehensive system and process documentation
Participate in on-call rotations for 24/7 critical system support
Must Haves:
Kubernetes (hands-on experience) – managing and deploying workloads
AWS Cloud Platform – deep understanding and production experience
Infrastructure as Code (IaC) – using tools like Terraform (or CloudFormation/Ansible)
Scripting/Programming – Proficiency in Python or Go
Monitoring & Alerting – Experience with Prometheus, Grafana
CI/CD Pipelines – Jenkins, GitLab CI, or similar
Incident Management – Proven experience in responding to and analyzing outages
Linux Systems & Networking – Strong fundamentals
Good to Haves:
ArgoCD, Linkerd, Karpenter, or other Kubernetes-related tools
Logging tools – Loki, ELK Stack
Security best practices – Cloud and container security knowledge
Leadership/Mentorship – Experience guiding junior engineers
Post-mortem writing & RCA – Comfortable documenting incidents and learnings
Experience in distributed systems or high-availability architectures
Recruitment Process:
AI-based online screening test
Assignment
2 client interviews
CEO Discussion
Offer: Successful candidates will receive an offer to join the team.
Soft Skills
Excellent verbal and written communication skills in English (required)
Strong problem-solving ability with a customer-first mindset
Accountability – Takes ownership of reliability and incident outcomes.
Demonstrated ability to operate in high-pressure, multitasking environments independently
Passion for supporting and helping others
About Us:
We at Think Future Technologies (TFT) provide technology services to our customers, enabling them to achieve superior business outcomes. We come in as a trusted partner, taking full ownership of the technology piece. We brainstorm on our customers' business problems, arrive at the right solution framework, deploy the right blend of technical resources, and then provide optimal delivery at every step of project implementation.