Operations Support Engineer
We are seeking a dedicated Operations Support Engineer/Manager to ensure the smooth functioning of mission‑critical systems and applications. The successful candidate will provide hands‑on support, monitor infrastructure, troubleshoot incidents, and collaborate with cross‑functional teams to maintain secure, efficient, and scalable operations.
Responsibilities
- Cloud Operations Management
  - Oversee daily operations of cloud‑based data sharing platforms across cloud and on‑premises environments.
  - Collaborate with engineers, developers, and IT teams to ensure high availability, optimal performance, robust security, and a superior user experience.
- Technical Support Leadership
  - Develop and implement strategies to improve the efficiency and effectiveness of the Level 1 technical support team.
  - Investigate and resolve escalated technical issues, ensuring timely ticket management and escalating complex problems to higher‑level teams.
- Incident & Problem Management
  - Manage incident resolution processes, recommend system improvements, and establish new operational procedures to drive continuous improvement.
  - Conduct thorough outage analyses to identify root causes, prevent recurrence, and strengthen system reliability.
- Service Management & Governance
  - Administer IT service management processes, including performance, event, incident, problem/escalation, configuration, and change management, within an Agile framework.
  - Define, monitor, and report on KPIs and customer‑facing service metrics to assess process health and service quality.
- Monitoring & Optimization
  - Partner with engineering teams to design and implement monitoring tools that track usage, performance, and costs.
  - Optimize operations and reduce expenses through proactive monitoring and resource management.
- Knowledge Sharing & Documentation
Requirements
- Bachelor’s degree in computer science, IT, or a related field.
- 8+ years of experience managing large‑scale IT operations in cloud environments.
- 5+ years of hands‑on experience with AWS.
- Strong skills in Terraform, automation, and cloud configuration management.
- Experience with API Gateway (policy creation, API security, routing, monitoring).
- Proficiency in Solace brokers (publish/subscribe, queues, request/reply, MQTT).
- Knowledge of secure file transfer methods (SFTP, HTTPS/API‑based transfer, PGP encryption) and their integration into workflows.
- Familiarity with DevOps practices, Agile methodologies, and ITIL processes.
- Hands‑on experience with monitoring tools such as the ELK stack (Elasticsearch, Logstash, Kibana) or OpenSearch.
- Excellent troubleshooting, problem‑solving, and stakeholder management skills.
- Strong communication, planning, and team leadership abilities.