Job Summary
We are seeking a highly skilled and motivated OpenShift Cloud Platform and Deployment Engineer with at least 6 years of relevant experience in infrastructure/platform operations, including 3+ years of hands-on administration of Red Hat OpenShift 4.x in enterprise production environments. The engineer will manage Day 2 operations of OpenShift clusters, including core services such as ACS (Advanced Cluster Security), ACM (Advanced Cluster Management), Quay, and OpenShift Service Mesh, while actively supporting application deployment workflows and CI/CD practices.
Key Responsibilities
- Operate and maintain Red Hat OpenShift 4.x clusters with a focus on Day 2 operations:
  - Cluster health monitoring, node scaling, patching, and upgrades
  - Certificate management, storage troubleshooting, and network policy enforcement
- Manage and support platform services:
  - ACS – Vulnerability management, runtime security, and compliance policy enforcement
  - ACM – Multi-cluster governance, policy management, and cluster lifecycle automation
  - Quay – Secure and manage container image registries with vulnerability scanning
  - Service Mesh (Istio) – Service-to-service communication, observability (Kiali, Jaeger), and mTLS enforcement
- Provide application deployment support:
  - Troubleshoot CI/CD issues with Tekton, Jenkins, ArgoCD, or GitHub Actions
  - Assist development teams with onboarding and deployment best practices
- Troubleshoot and resolve production incidents, participate in root cause analysis (RCA), and create documentation
- Implement platform backup and disaster recovery (e.g., with Velero) and high-availability strategies
- Collaborate with Red Hat and vendor support to resolve complex platform issues
- Maintain accurate and up-to-date documentation (SOPs, runbooks, RCA reports, etc.)
Required Skills & Experience
- 6+ years of relevant experience in infrastructure/platform operations, including 3+ years of hands-on experience administering Red Hat OpenShift 4.x
- Strong understanding of Kubernetes core concepts, Operators, Helm, CRDs, and container orchestration
- Hands-on experience with:
  - ACS for container security and runtime policies
  - ACM for multi-cluster operations and policy governance
  - Quay registry for secure image lifecycle management
  - OpenShift Service Mesh (Istio) for service communication and tracing
- Linux administration (RHEL preferred) and strong familiarity with CLI tools
- Experience with GitOps and CI/CD tools: Tekton, ArgoCD, Jenkins, GitHub Actions
- Scripting/automation: Bash, Python, or Ansible
- Familiarity with observability and monitoring tools such as Prometheus, Grafana, EFK, and Loki
Soft Skills
- Strong analytical and troubleshooting skills
- Effective communication with technical and non-technical teams
- Ability to work independently and collaboratively in a team environment
- Proactive approach to knowledge sharing and documentation