We are looking for a skilled Linux System Engineer to manage, monitor, and maintain our production and lab infrastructure. The role focuses heavily on monitoring, performance analysis, documentation, virtualization platforms (XCP-ng, Proxmox), and operational excellence. You will be responsible for ensuring system availability, identifying early warning signs of incidents, and maintaining accurate infrastructure documentation.
Key Responsibilities
Monitoring & Operations
- Validate daily availability of critical services
- Check for the latest updates and patches and install them
- Review system security, logs, and other parameters affecting server stability
- Review Zabbix performance graphs and detect anomalies
- Verify HAProxy backend health
- Inspect monitoring panel dashboards for known issues, latency spikes, and recurring errors
- Perform ELK-based reporting and troubleshooting
- Monitor all production VM health: CPU, memory, disk, I/O, and network usage
- Review Xen/XCP-ng host utilization for all performance parameters
- Respond to Teams alerts and operational messages
- Follow up on pending operational tasks
- Identify abnormal trends and early indicators of incidents
- Correct and enhance Zabbix dashboards with missing key metrics
- Tag and classify VMs properly (production, lab, load, testing)
- Track issues arising from incidents and report them to the relevant departments
- Update documentation regularly to reflect all network changes
Documentation & Asset Management
- Update infrastructure documentation daily
- Maintain free/used IP address records
- Document newly added VMs
- Maintain credential vault records
- Maintain access control lists (weekly)
- Record URL and endpoint changes
- Create incident reports for outages
- Install and maintain monitoring tools
Reporting
- Weekly network bandwidth report for HAProxy
- Weekly hardware utilization report and graphs
Required Skills
Technical Skills
- Monitoring tools: Zabbix, Datadog, Uptime Kuma
- Load balancers: HAProxy
- Networking basics: TCP/IP, DNS, firewalls, routing
- Shell scripting (Bash)
- Log analysis and troubleshooting
- Resource monitoring and performance tuning
Soft Skills
- Good communication via Teams or similar tools
- Ability to follow operational processes
Experience Requirements
- 3 to 6 years of hands‑on Linux system administration
- Experience managing production systems
- Experience with monitoring and alerting systems