Infrastructure/Server Engineer

Talent Mingle

Freemont

On-site

CAD 80,000 - 100,000

Full time

30+ days ago

Job summary

An innovative firm is on the lookout for a skilled engineer to enhance their server management team. This role involves overseeing a fleet of server racks, ensuring high performance and reliability across various hardware components. You will engage in troubleshooting complex hardware issues while collaborating with vendors for maintenance and updates. The ideal candidate will possess a robust background in Linux systems, scripting, and network protocols, along with a passion for problem-solving. Join this dynamic team to contribute to cutting-edge projects in a supportive and challenging environment.

Qualifications

5+ years of experience in server rack management and lab infrastructure management.
Strong experience with Linux or Unix operating systems and scripting languages.

Responsibilities

Manage and maintain server racks and network infrastructure for the lab.
Support failure analysis initiatives and validate hardware failures.

Skills

Server Rack Management

Hardware Debugging

Linux Operating Systems

Scripting Languages

Network Protocols

Problem-Solving Skills

Communication Skills

Education

Bachelor’s Degree in Computer Science

Master’s Degree in Electrical Engineering

Tools

Docker

Kubernetes

VMware

KVM

GPFS/IBM Scale

Dediprog Tools

Job description

We are seeking a highly motivated and skilled engineer to join our team. The ideal candidate will have a strong background in managing server hardware, including network, storage, compute, and AI systems, as well as experience validating failed server hardware.

Roles and Responsibilities:

Manage and maintain a fleet of server racks from different OEMs (network, storage, compute, and AI hardware).
Access and administer high-performance clustered file systems, preferably GPFS/IBM Scale.
Administer FC/InfiniBand-based SANs.
Interface with OEM vendors for maintenance related to firmware and driver updates.
Support failure analysis initiatives by using available HW resources to validate rack-level, system-level, and module-level failures from Meta's various datacenters.
Manage and maintain network infrastructure for the lab, including switches, routers, and firewalls.
Configure and manage network protocols, such as TCP/IP, DNS, and DHCP.
Ensure network security and compliance with company policies and industry standards.
Experience working with LLMs and popular frameworks such as TensorFlow or PyTorch.
Design and implement containerized applications using Docker and Kubernetes.
Manage and maintain virtual machines using popular hypervisors, such as VMware or KVM.
Support the failure analysis labs, including inventory management, safety audits, and maintaining access controls to critical server hardware.
Support root cause analysis and diagnose hardware/software issues. Isolate failures in platform, firmware, BIOS, CPLD, and other applications.
Experience working with Dediprog tools (FW/BIOS debug).
Provide regular updates to the failure analysis lead and collaborate with the team on different mission-critical projects.

Qualifications:

Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
5+ years of experience in server rack management, lab infrastructure management, and/or related fields.
Experience with debugging and troubleshooting complex hardware issues, including storage, compute, and AI.
Strong experience with Linux (Red Hat, Fedora, CentOS, etc.) or Unix operating systems.
Experience with scripting languages such as Python, PowerShell, PHP, or Perl.
Experience working with containerization (Docker, Kubernetes) and virtual machine management.
Experience with failed server hardware validation, including BIOS/CPLD FW debug.
Knowledge of network protocols, including TCP/IP, DNS, and DHCP.
Strong knowledge of server hardware components, including motherboards, power distribution boards, and storage systems.
Strong problem-solving skills and ability to work independently.
Excellent communication and documentation skills.
