Data Centre Operations Engineer | Kuala Lumpur, MY

Hays

Kuala Lumpur

On-site

MYR 60,000 - 80,000

Full time

5 days ago

Job summary

A leading company in the AI Infrastructure sector is seeking a skilled professional to manage GPU clusters and data centre systems. The ideal candidate will have a strong background in system operations, cloud services, and IT hardware. You'll oversee daily operations, ensure system performance, and collaborate with cross-functional teams to support AI workloads. Attractive incentives, professional training, and medical insurance are offered.

Benefits

Attractive Employee Incentive Scheme
Bonuses
Allowances
Professional training and development programme
Medical Insurance

Qualifications

  • 2+ years of experience in system operations within IT infrastructure or cloud services.
  • Hands-on experience in IT hardware replacement.

Responsibilities

  • Oversee daily operations of GPU clusters and data centre systems.
  • Monitor system health and performance using industry-standard tools.
  • Respond to and resolve operational incidents.

Skills

System Operations
Linux Fundamentals
Kubernetes
Troubleshooting

Education

Bachelor's degree in Computer Science
Bachelor's degree in Information Technology
Bachelor's degree in Electrical Engineering

Tools

Prometheus
Grafana

Job description

Your New Company

My client operates in the AI infrastructure sector, providing PaaS and SaaS platforms.

Your New Role

  • Oversee the daily operations of GPU clusters and data centre systems.
  • Monitor system health, performance, and capacity using industry-standard tools and frameworks.
  • Respond to and resolve operational incidents, ensuring minimal downtime and maximum availability.
  • Manage the deployment, configuration, and optimisation of GPU servers, network devices, and supporting infrastructure (e.g., CPU servers and storage).
  • Perform hardware diagnostics and preventative maintenance for GPU servers, storage, and networking equipment.
  • Troubleshoot system issues related to hardware, operating systems, and applications.
  • Work closely with cross-functional teams, including network engineers, system administrators, and developers, to support AI workloads.
  • Maintain accurate documentation for system configurations, processes, and incident reports.
  • Implement and enforce security best practices in system operations.
  • Identify and propose improvements to enhance system performance, reduce costs, and optimise resource utilisation.


What You'll Need to Succeed

  • Bachelor's degree in Computer Science, Information Technology, Electrical Engineering, or a related field. Equivalent experience will be considered.
  • 2+ years of experience in system operations within IT infrastructure or cloud services.
  • Hands-on experience in IT hardware replacement.
  • Experience in data centre operations, system administration, or a similar role.
  • Knowledge of server hardware, including GPU cards, CPU configurations, and storage solutions.
  • Understanding of Linux fundamentals and Kubernetes environments.
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana) and logging frameworks.


What You'll Get in Return

  • Attractive Employee Incentive Scheme + Bonuses + Allowances
  • Professional training and development programme
  • Medical Insurance


What You'll Need to Do Now

If you think this job is for you, hit "apply now" for more details or a confidential discussion. Alternatively, contact Julian Yew at Hays on +603-5870-5003 or email Julian.Yew@hays.com.my.

At Hays, we value diversity and are passionate about placing people in a role where they can flourish and succeed. We actively encourage people from diverse backgrounds to apply.