AI/ML Application Performance Engineer

Cornelis Networks, Inc.

Chesterbrook (Chester County)

Remote

USD 100,000 - 150,000

Full time

7 days ago

Job summary

Cornelis Networks, a leader in high-performance networking solutions, is seeking an AI/ML Application Performance Engineer to optimize AI applications and benchmarks for the semiconductor industry. This remote role offers an opportunity to collaborate with industry experts and drive innovation in cutting-edge technologies. The ideal candidate will have a strong background in AI frameworks, performance benchmarking, and experience in high-performance computing environments, along with the ability to effectively communicate findings across teams.

Benefits

Competitive compensation package
Equity and incentives
Health and retirement benefits
Flexible work environment
Generous paid holidays
401(k) with company match
Open Time Off (OTO)

Qualifications

  • 3-5 years of experience in running HPC and/or AI/ML applications.
  • Hands-on experience with analyzing and optimizing networks.
  • Excellent written and verbal communication skills.

Responsibilities

  • Perform benchmarking and optimization of AI/ML applications.
  • Develop and maintain software for running AI/ML benchmarks.
  • Collaborate with cross-functional teams and assist in performance benchmarking.

Skills

Benchmarking
Optimization
AI Frameworks
Python
UNIX/Linux

Education

Bachelor’s degree in computer science, engineering, math
Master’s preferred

Tools

HPC resource management and job scheduling systems
Message Passing Interface (MPI)
Profiling tools (NVIDIA Nsight Systems)

Job description

Cornelis Networks delivers the world’s highest performance scale-out networking solutions for AI and HPC datacenters. Our differentiated architecture seamlessly integrates hardware, software and system level technologies to maximize the efficiency of GPU, CPU and accelerator-based compute clusters at any scale. Our solutions drive breakthroughs in AI & HPC workloads, empowering our customers to push the boundaries of innovation. Backed by top-tier venture capital and strategic investors, we are committed to innovation, performance and scalability - solving the world’s most demanding computational challenges with our next-generation networking solutions.

We are a fast-growing, forward-thinking team of architects, engineers, and business professionals with a proven track record of building successful products and companies. As a global organization, our team spans multiple U.S. states and six countries, and we continue to expand with exceptional talent in onsite, hybrid, and fully remote roles.

Cornelis Networks is hiring a talented AI/ML Application Performance Engineer to help drive innovation and contribute to the development of cutting-edge technologies in the semiconductor industry. In this role, you will be responsible for providing technical expertise in AI and Machine Learning (ML) that can be applied to a diverse range of AI/ML use cases, working alongside a team of industry experts to shape the future of high-performance networking solutions.

Key Responsibilities:

  • Perform benchmarking and optimization of open source and industry-standard AI/ML applications with current and future HPC hardware
  • Develop, execute, and maintain software required to run AI/ML applications and benchmarks
  • Participate in the development of supporting libraries and middleware
  • Assist sales and marketing teams by delivering proof points and performance benchmarking comparisons between Cornelis Omni-Path and competing interconnects
  • Collect and analyze performance data, identify performance limitations, and determine the best approach and techniques to improve performance
  • Present research findings both within the company and to external stakeholders
  • Collaborate with cross-functional teams across all levels of the company to evangelize the capabilities and performance advantages of Cornelis products

Preferred Qualifications:

  • Knowledge of HPC resource management and job scheduling systems (e.g., SLURM, PBS).
  • Hands-on experience with analyzing and optimizing networks to improve scale-out performance using a range of profiling tools such as NVIDIA Nsight Systems
  • Experience with AI frameworks like NeMo, PyTorch Lightning, Megatron-LM, and DeepSpeed

Minimum Qualifications:

  • Bachelor’s degree (Master’s preferred) in computer science, engineering, math, or related technical discipline
  • 3-5 years of experience running HPC and/or AI/ML applications on clusters
  • Ability to set up, run, and analyze AI/ML application benchmarks and demonstrate a proficient understanding of message passing, scaling optimization, and performance bottleneck identification
  • Ability to modify AI/ML models and distribute training across networks, outside of a single GPU compute platform
  • Experience with Message Passing Interface (MPI) and compiling software with a variety of compilers (Intel, gcc, etc.) and libraries
  • Extensive Python and shell script experience
  • Experience with HPC network architectures such as Omni-Path, InfiniBand, or Ethernet
  • Experience operating in UNIX or Linux computing environment
  • Excellent written and verbal communication skills

Location: This is a remote position for employees residing within the United States.

We offer a competitive compensation package that includes equity, cash, and incentives, along with health and retirement benefits. Our dynamic, flexible work environment provides the opportunity to collaborate with some of the most influential names in the semiconductor industry.

At Cornelis Networks, your base salary is only one component of your comprehensive total rewards package. Your base pay will be determined by factors such as your skills, qualifications, experience, and location relative to the hiring range for the position. Depending on your role, you may also be eligible for performance-based incentives, including an annual bonus or sales incentives.

In addition to your base pay, you’ll have access to a broad range of benefits, including medical, dental, and vision coverage, as well as disability and life insurance, a dependent care flexible spending account, accidental injury insurance, and pet insurance. We also offer generous paid holidays, 401(k) with company match, and Open Time Off (OTO) for regular full-time exempt employees. Other paid time off benefits include sick time, bonding leave, and pregnancy disability leave.

Cornelis Networks does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. Cornelis Networks is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, disability status, genetic information, protected veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.
