Associate Director, Operations - GPU Cloud

Singtel Group

Singapore

On-site

SGD 120,000 - 180,000

Full time

3 days ago

Job summary

A leading company in telecommunications seeks an Associate Director, Operations for its GPU Cloud segment. This pivotal role involves overseeing the GPU Infrastructure-as-a-Service platform, managing operations, and ensuring compliance with service level agreements while leading a high-performing team. The successful candidate will excel in operational management and possess a deep understanding of cloud infrastructure strategies.

Benefits

Full suite of health and wellness benefits
Ongoing training and development programs
Internal mobility opportunities

Qualifications

  • Proven track record in managing complex cloud and data centre infrastructure.
  • Experience in liquid cooling operations preferred.
  • Strong understanding of hardware infrastructure operation and security.

Responsibilities

  • Manage GPU infrastructure and operations to optimize performance and cost.
  • Lead and mentor operations teams to achieve SLA compliance and operational excellence.
  • Develop operational strategies for GPU Cloud infrastructure reliability.

Skills

Leadership
Communication
Problem-Solving
Operational Management
Linux Administration
Security Management
Technical Expertise

Job description

Associate Director, Operations - GPU Cloud

This role leads and manages the GPU Infrastructure-as-a-Service (IaaS) platform, overseeing the GPU infrastructure, storage infrastructure and associated services to ensure seamless integration and operation.

Infrastructure and Resource Management:

  • Manage the maintenance and operations of the liquid-cooled data centre that hosts the GPU cloud.
  • Optimize GPU infrastructure and associated hardware.
  • Optimize resource allocation to meet both the performance requirements and the cost-effectiveness goals of data centre and cloud hardware operations.
  • Lead the operations team to ensure compliance with the SLA requirements of customers and the product.
  • Enhance system scalability and reliability through automation and continuous improvement.
  • Enforce industry-standard operational processes, with reference to standards such as ISO 27001 or equivalent, in data centre and cloud operations.

Operational Excellence:

  • Handle general incidents, including operations management and escalation management across the AI cloud product.
  • Develop and implement operational strategies to ensure the reliability and efficiency of our GPU Cloud infrastructure.
  • Collaborate with other departments to streamline processes, enhance customer experience, and meet service level agreements.
  • Support services and improve the lifecycle of GPU cloud hardware and the data centre environment by deploying, operating, and refining monitoring, logging, and alerting.
  • Establish ops systems and processes (SOPs, EOPs, etc.) and manage daily operational issues.
  • Apply strong operational management skills, organising internal cross-functional teams and external vendors to ensure an efficient and resilient ops setup.

Team Management:

  • Build and lead a high-performing operations team to foster a culture of innovation, collaboration, and continuous improvement.
  • Set clear goals and objectives, mentor team members, and drive professional development initiatives.
  • Oversee resource management and allocation to optimize team productivity and effectively meet operational goals.

Security and Compliance:

  • Lead security incident management processes, focusing on identification, containment, and resolution of threats in the data centre environment and GPU cloud hardware.
  • Enforce best practices for security and compliance.
  • Stay abreast of industry security trends and implement measures to safeguard customer data and platform integrity.

Skills for Success

  • Proven track record of managing and escalating complex cloud and data centre infrastructure issues and leading operations teams.
  • Experience in liquid cooling operations is preferred.
  • Strong understanding of hardware infrastructure operation, security, management, and best practices.
  • Excellent leadership, communication, and interpersonal skills, with the ability to lead cross-functional teams.
  • Proficiency in managing customer interactions and improving service delivery to enhance customer experience.
  • Experience in Linux and hypervisor administration for GPU infrastructure and cloud.
  • Ability to solve complex technical problems, with a proactive approach to system operation and optimization.
  • Knowledge of storage technologies and experience in capacity planning, troubleshooting, and data protection.
  • Experience in GPU and GPU infrastructure management, including configuration, monitoring, and performance.

Rewards that Go Beyond

  • Full suite of health and wellness benefits
  • Ongoing training and development programs
  • Internal mobility opportunities

Your Career Growth Starts Here. Apply Now!
