Senior Machine Learning Engineer

Signify Technology

San Francisco (CA)

Hybrid

USD 170,000 - 250,000

Full time

4 days ago

Job summary

A leading technology firm seeks a Software Engineer focused on AI Performance Optimization to join a stealth-mode team. This critical role involves building a high-performance AI inference platform, working with cutting-edge technologies and collaborating with leading researchers in the domain. Candidates will leverage their expertise to enhance AI model efficiency and solve complex performance challenges. The position offers competitive compensation, equity, and the flexibility of remote work, all while contributing significantly to the AI revolution.

Benefits

Medical insurance

Vision insurance

401(k)

Qualifications

Strong background in software engineering or applied machine learning.
Experience with performance tuning and profiling.
Familiarity with AI frameworks and compiler frameworks.

Responsibilities

Apply research to improve inference speed for LLMs.
Implement performance techniques like quantization and caching.
Architect systems for distributing AI workloads across GPUs.

Skills

Performance tuning

Profiling

GPU programming

Distributed systems

Memory management

Education

Advanced degree in CS, engineering, or applied math

Tools

CUDA

C++

Python

PyTorch

TensorFlow

Software Engineer – AI Performance Optimization

We’re looking for a Software Engineer focused on AI Performance to help push the boundaries of efficient inference at scale. As generative AI models grow in size and complexity, inference is becoming the next major bottleneck. This role is a chance to join a stealth-mode team building a high-performance, hardware-aware AI inference platform from the ground up.

Our mission is to radically improve how AI models run in production—through advanced compiler design, low-level GPU optimization, and deep integration with modern AI frameworks like PyTorch and LangChain. You’ll work at the cutting edge of model acceleration, collaborating with top researchers and engineers from world-class institutions and successful AI startups.

What You’ll Do

Apply state-of-the-art research to improve inference speed and quality for the latest LLMs and generative models.
Implement performance techniques like quantization, KV caching, FlashAttention, and fused ops to maximize throughput and minimize latency.
Architect and optimize systems that distribute AI workloads across GPU clusters and multi-node deployments.
Dive deep into GPU kernels, CUDA code, and low-level runtime behavior to accelerate AI execution paths.
Profile and benchmark model runtimes to uncover and fix performance bottlenecks at every level of the stack.

What We’re Looking For

A strong background in software engineering, systems, or applied machine learning.
Experience with performance tuning, profiling, or systems-level optimization.
Comfortable working close to the metal with CUDA, C++, and Python.
Familiar with AI frameworks such as PyTorch, TensorFlow, or ONNX.
Advanced degree in CS, engineering, or applied math.
Familiarity with compiler frameworks like MLIR or TVM.
Experience with vLLM, LangChain, or custom model runtimes.
Deep knowledge of distributed systems, memory management, or GPU programming.

Why This Role Stands Out

Work on one of the most urgent challenges in scaling generative AI: inference efficiency.
Collaborate directly with leading minds in AI systems research.
Influence and build the full software stack for real-time, high-performance AI serving.
Join a fast-moving, ambitious team with deep technical roots and a clear product vision.
Flexible remote work, competitive comp, equity, and the chance to have impact at the infrastructure layer of the AI revolution.

Ready to optimize the future of AI? Let’s talk.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Information Technology and Engineering

Industries

Staffing and Recruiting and Software Development
