Network Engineer AI/ML Infrastructure

Boson AI

Toronto

On-site

CAD 150,000 - 250,000

Full time

Today

Job summary

A leading AI technology company in Toronto is seeking an experienced Network Engineer to design and optimize their high-performance networking infrastructure. The ideal candidate should have over 4 years of experience in network engineering, particularly in environments using InfiniBand and high-speed Ethernet. This role focuses on ensuring the performance of GPU-to-GPU communication and planning for future capacity as the company scales. Strong problem-solving skills and hands-on experience with network security are essential.

Qualifications

  • 4+ years of network engineering experience in production environments.
  • Strong understanding of L2/L3 networking protocols.
  • Hands-on experience with high-speed networking.

Responsibilities

  • Configure and maintain InfiniBand and Ethernet fabrics.
  • Optimize network performance for RDMA.
  • Manage and troubleshoot network bottlenecks.

Skills

Network engineering experience
Understanding of L2/L3 networking protocols
Experience with high-speed networking
Troubleshooting skills

Tools

InfiniBand
Ethernet switches
Firewall solutions
Network monitoring tools

Job description
About The Role

We're seeking an experienced Network Engineer to design, build, and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. You'll work at the cutting edge of network technology—managing InfiniBand and ultra-high-speed Ethernet fabrics that connect NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, and hundreds of servers.

You'll be hands-on with the full lifecycle of our network infrastructure: planning, building, testing, deploying, and keeping everything running at peak performance. That means troubleshooting issues as they arise, monitoring network performance and throughput, developing automation to streamline operations, and working closely with HPC and ML teams to ensure they have the bandwidth they need. You'll also help us plan for future capacity and evaluate emerging network technologies as we scale to meet increasingly demanding workloads.

Responsibilities
  • Configure and maintain InfiniBand and high-speed Ethernet fabrics
  • Optimize network performance for RDMA and GPU-to-GPU communication
  • Manage network switches (Mellanox, NVIDIA, Micas Networks)
  • Troubleshoot network bottlenecks and latency issues
  • Plan and execute network upgrades and expansions
  • Implement network security (firewalls, VLANs, ACLs)
  • Collaborate on storage network optimization and infrastructure monitoring
Minimum Qualifications
  • 4+ years of network engineering experience in production environments
  • Strong understanding of L2/L3 networking protocols (TCP/IP, BGP, OSPF, VLANs)
  • Hands‑on experience with high‑speed networking (100Gb+ Ethernet and InfiniBand)
  • Hands‑on experience with network security (firewalls, ACLs, network segmentation)
  • Knowledge of HPC network topologies
  • Experience with InfiniBand fabrics, including RDMA, RoCE, and IPoIB
  • Strong troubleshooting and problem‑solving skills
Preferred Qualifications
  • Experience in data center environments or AI/ML infrastructure
  • Hands‑on experience with high‑performance Ethernet switches (e.g., Broadcom Tomahawk) and the latest InfiniBand switches (e.g., NVIDIA/Mellanox)
  • Experience optimizing networks for GPU‑to‑GPU communication
  • Experience with open‑source firewall solutions (OPNsense, pfSense, or similar)
  • Experience with network automation tools
  • Understanding of distributed storage networking (Ceph cluster networks)
  • Familiarity with network monitoring and observability tools (Prometheus, Grafana)
  • Knowledge of multi‑site network connectivity and WAN optimization
  • Familiarity with cloud networking on at least one platform (AWS, GCP, or Azure), including VPC design, site‑to‑site VPN configuration, Direct Connect/ExpressRoute/Cloud Interconnect, hybrid cloud connectivity, and cloud‑to‑datacenter network integration
Benefits

CAD 150,000 - 250,000 a year

If you're a natural problem‑solver with a passion for continuous learning, we'd love to hear from you.
