Senior Site Reliability Engineer

Nami Technology Joint Stock Company

United States

Remote

USD 120,000 - 160,000

Full time

Today

Job summary

A leading company is seeking a Senior Site Reliability Engineer to architect and maintain critical infrastructure for its AI-driven SaaS offerings. The role involves collaborating across teams to ensure system reliability, performance optimization, and security compliance. Ideal candidates will have extensive experience with cloud platforms and automation, as well as strong communication skills. The position offers a competitive salary, equity options, and a flexible work environment that supports professional growth.

Benefits

Equity options

Comprehensive insurance package

Continuous learning opportunities

Flexible work hours

Qualifications

5+ years of experience in site reliability engineering or DevOps.
Expertise in cloud platforms and infrastructure-as-code tools.
Strong scripting and programming skills.

Responsibilities

Architect and implement scalable cloud infrastructure.
Develop systems for 99.99% uptime and incident response.
Drive automation of infrastructure provisioning.

Skills

Problem-Solving

Communication

Education

Bachelor’s degree in Computer Science

Tools

Terraform

Ansible

Docker

Kubernetes

Jenkins

GitLab CI

Prometheus

Grafana

Datadog

As a Senior Site Reliability Engineer, you will play a critical role in architecting and maintaining the infrastructure that powers our AI-driven SaaS and private cloud offerings. You will collaborate with cross-functional teams to implement best-in-class reliability practices, optimize system performance, and ensure seamless operations for our enterprise clients.

Key Responsibilities

Design and Build Infrastructure: Architect and implement scalable, secure, and highly available cloud infrastructure to support our SaaS and private cloud platforms.
System Reliability: Develop and maintain systems to ensure 99.99% uptime, including monitoring, alerting, and incident response strategies.
Automation: Drive automation of infrastructure provisioning, configuration management, and deployment pipelines to improve efficiency and reduce human error.
Performance Optimization: Identify and resolve performance bottlenecks in distributed systems, ensuring low-latency and high-throughput operations.
Security and Compliance: Implement security best practices and ensure compliance with industry standards (e.g., SOC 2, GDPR, HIPAA) for our private cloud deployments.
Incident Management: Lead incident response, root cause analysis, and post-mortem processes to prevent recurrence and improve system resilience.
Collaboration: Work closely with software engineering, data science, and product teams to align infrastructure capabilities with business needs.
Capacity Planning: Forecast resource requirements and plan for scalable growth to meet increasing customer demand.
Documentation: Maintain clear and comprehensive documentation of infrastructure designs, processes, and operational procedures.
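
For context on the 99.99% uptime target above, here is a minimal sketch of the downtime such a target allows. The posting does not say over what window uptime is measured, so the 30-day and 365-day periods below are assumptions, and the function name is hypothetical.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Return the downtime (in minutes) allowed by an availability target.

    availability_pct: target such as 99.99 (percent).
    period_days: length of the measurement window in days (an assumption;
    the posting does not specify one).
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Assumed measurement windows, for illustration only.
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        budget = downtime_budget_minutes(99.99, days)
        print(f"{label}: {budget:.2f} minutes of allowed downtime")
```

Under these assumptions the budget is roughly 4.3 minutes per 30-day month and about 52.6 minutes per year, which is why the role stresses monitoring, alerting, and fast incident response.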

Your skills and experience

Experience: 5+ years of experience in site reliability engineering, DevOps, or a related field, with a focus on cloud-based systems.
Technical Skills:
- Expertise in cloud platforms (e.g., AWS, Azure, GCP) and infrastructure-as-code tools (e.g., Terraform, Ansible, CloudFormation).
- Proficiency in containerization and orchestration (e.g., Docker, Kubernetes).
- Strong scripting and programming skills (e.g., Python, Go, Bash).
- Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI, ArgoCD).
- Knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, ELK stack).
- Knowledge of the OSI model.
- Familiarity with networking, security, and database systems (e.g., SQL, NoSQL).
Problem-Solving: Proven ability to troubleshoot complex, distributed systems and resolve issues under pressure.
Communication: Excellent verbal and written communication skills, with the ability to collaborate effectively with technical and non-technical stakeholders.
Education: Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Nice-to-Have:
- Experience in AI/ML infrastructure or high-performance computing.
- Certifications in cloud platforms or SRE-related disciplines (e.g., AWS Certified DevOps Engineer, Google SRE).
- Familiarity with private cloud deployments and hybrid infrastructure.

Why you'll love working here

Impactful Work:

Be a key contributor to a fast-growing AI company transforming the B2B SaaS landscape.
Take part in high-impact projects with opportunities to quickly develop your skills, lead teams, and grow your career globally.
Your contributions are recognized with both professional advancement and strong financial rewards.

Collaborative Culture:

Join a passionate, innovative, and young team that values diversity, creativity, and open communication.
Experience a dynamic, democratic work environment with regular team building, sports events, and company trips that strengthen bonds and make work more enjoyable.

Competitive Compensation:

Enjoy an attractive, negotiable salary based on your experience and capabilities, along with equity options.
Benefit from a comprehensive package including social, health, and unemployment insurance, aligned with FPT Corporation’s standards.

Professional Growth:

Access continuous learning opportunities, including AWS training and certification programs.
Take initiative, gain leadership experience, and explore international career development pathways.

Flexible Work:

Work from anywhere with a remote-friendly policy and flexible hours designed to support your productivity and work-life balance.
