We are looking for an experienced Infrastructure Engineer to design, implement, and optimize data center and storage solutions. The ideal candidate brings solid hands‑on expertise with software‑defined storage, container orchestration, and infrastructure automation. You’ll work with cross-functional teams to deliver scalable, reliable storage and compute infrastructure and help enable AI/ML workloads.
Key Responsibilities
- Design and implement software‑defined storage solutions using Ceph, focusing on performance, scalability, and reliability
- Deploy and manage Kubernetes clusters for containerized workloads and storage orchestration
- Automate infrastructure provisioning and configuration using IaC tools (Juju, Terraform)
- Optimize storage performance metrics including IOPS, latency, and throughput for diverse workload requirements
- Participate in the solution lifecycle: requirements analysis, design, implementation, testing, and ongoing optimization
- Deploy and configure hardware infrastructure including servers, storage systems, and networking components
- Collaborate with DevOps, application, and platform teams to ensure infrastructure meets business and technical needs
- Monitor and troubleshoot infrastructure issues; implement improvements based on performance analysis
- Stay current with storage technologies, container ecosystems, and infrastructure best practices
Required Qualifications
- 5‑7 years of experience in infrastructure engineering with a focus on storage and data center technologies
- Strong hands‑on experience with Ceph storage (block, object, and file storage modes)
- Proficiency with Kubernetes including storage integration (CSI drivers, persistent volumes, StatefulSets)
- Solid experience with Infrastructure as Code using Terraform and/or Juju
- Understanding of storage performance concepts: IOPS, throughput, latency tuning, and capacity planning
- Experience with Linux system administration and storage protocols (iSCSI, NFS, RBD, S3)
- Practical deployment experience: server configuration, storage setup, and system integration
- Good communication skills, with the ability to document architectures and explain technical concepts
- Relevant certifications a plus: CKA (Certified Kubernetes Administrator), Linux certifications, or vendor‑specific storage certifications
Preferred Qualifications
- Experience supporting AI/ML workloads and an understanding of their storage requirements (high IOPS, parallel access patterns)
- Knowledge of storage backends for AI frameworks (distributed storage, GPU‑optimized data pipelines)
- Familiarity with monitoring and observability tools (Prometheus, Grafana)
- Experience with automation frameworks (Ansible, Python scripting)
- Experience with Golang for Kubernetes plugin development
- Understanding of software‑defined networking (SDN) and container networking
- Exposure to hybrid cloud or multi‑cluster environments