Senior Machine Learning Engineer

Ipro Networks Pte. Ltd.

Palo Alto (CA)

Remote

USD 220,000 - 300,000

Full time

5 days ago

Job summary

A leading company in AI infrastructure is seeking a Senior Machine Learning Engineer to optimize state-of-the-art models across multiple hardware environments. This role involves performance tuning, collaboration with engineering teams, and leveraging advanced knowledge of PyTorch and transformer models to enhance the efficiency and accessibility of AI technologies.

Benefits

Generous stock options

Full health coverage

Flexible PTO

Home office support

Qualifications

Deep experience in profiling and optimizing PyTorch code for performance.
Familiarity with PyTorch profiler and memory/trace viewers.
Strong understanding of transformer models and attention mechanisms.

Responsibilities

Design and maintain abstractions for efficient model performance across hardware platforms.
Collaborate with teams to uncover bottlenecks and enhance performance.
Benchmark model performance to guide product decisions.

Skills

Profiling and optimizing PyTorch code

Deep understanding of transformer models

Hands-on experience with parallel inference strategies

Tools

torch.compile

PyTorch/XLA (torch_xla)

Senior Machine Learning Engineer (Remote, US)

Compensation: $220K–$300K + Equity
Department: Applied Research
Location: Remote (US-based) | Full-Time

We’re seeking a Senior Machine Learning Engineer to help optimize the performance of state-of-the-art foundation models across a diverse range of hardware environments. If you're passionate about performance tuning, systems-level thinking, and scaling ML workloads beyond NVIDIA/CUDA constraints, this is your chance to shape the frontier of AI infrastructure.

What You’ll Be Doing:

Design and maintain abstractions that scale model performance efficiently across heterogeneous hardware platforms—not just CUDA/NVIDIA.
Profile and optimize memory usage, latency, and throughput in PyTorch; build or integrate low-level solutions (e.g., Triton kernels) as needed.
Benchmark our model and system performance to guide product decisions around cost, throughput, and deployment tradeoffs.
Collaborate with hardware and systems partners to uncover bottlenecks and push for performance improvements in future iterations.
Work hand-in-hand with research and engineering teams to ensure systems are planned and built with efficiency in mind from the start.

Qualifications:

Deep experience profiling and optimizing PyTorch code for performance (memory, latency, throughput).
Familiarity with tools like torch.compile, PyTorch/XLA (torch_xla), the PyTorch profiler, and memory or trace viewers.
Experience building performance-portable abstractions and optimizing ML pipelines for a variety of hardware/software stacks.
Strong understanding of transformer models and modern attention mechanisms.
Hands-on experience with parallel inference strategies (tensor parallelism, pipeline parallelism, etc.).

Bonus Points For:

Proficiency with Triton or CUDA, especially writing custom kernels and fusions for hot code paths.
Experience writing high-performance parallel C++, particularly in a machine learning context (e.g., data loading, inference).
Previous work building efficient ML demos or inference environments (Gradio, Docker, etc.).
Experience deploying models on non-NVIDIA hardware platforms.

Why This Role Matters:

You’ll be building the technical backbone that allows cutting-edge multimodal AI models to run smoothly and efficiently across the world. Your work will directly influence how our models scale and how accessible they are in terms of cost, performance, and reach.

Compensation & Benefits:

Base Salary: $220,000 – $300,000 / year (based on experience & location)
Equity: Generous stock options
Benefits: Full health coverage, flexible PTO, home office support, and more

Join a lean, expert team building next-gen AI from the ground up. If you thrive at the intersection of ML, systems, and performance—and love solving deep efficiency challenges—we want to hear from you.
