Senior AI Inference Engineer

Tether Operations Limited

Remote

GBP 80,000 - 100,000

Full time

Yesterday

Job summary

A tech company in Greater London seeks a C++ Developer to work on AI inference engines for edge devices. The role emphasizes optimizing model performance and collaborating with researchers. Candidates should have strong C++ skills, experience with Llama.cpp, and a related degree. The ideal candidate will have a solid AI R&D track record and the ability to adapt to new technologies.

Qualifications

  • Excellent programming skills in C++; experience in JavaScript is a bonus.
  • Strong experience with the llama.cpp and ggml inference engines.
  • Good understanding of deep learning concepts and model architectures.
  • Experience with VLMs and LLMs.
  • Demonstrated ability to rapidly assimilate new technologies.

Responsibilities

  • Work on deploying machine learning models to edge devices using llama.cpp and ONNX.
  • Collaborate with researchers on coding, training and transitioning models.
  • Integrate AI features into existing products.

Skills

C++ programming
JavaScript
llama.cpp experience
ggml inference engines
Deep learning concepts
Rapid assimilation of new technologies

Education

Degree in Computer Science, AI, or Machine Learning
Job description

You’ll work on the C++ layer that powers local AI, porting and enhancing inference engines such as llama.cpp and ONNX to run efficiently on edge devices. Your focus is on the runtime: making models load faster, run leaner, and perform well across different hardware. You’ll ensure that the inference layer is stable, optimized, and ready for integration with the rest of the stack.

This role is for engineers who want to work close to the metal, enabling private and fast on-device AI without relying on cloud infrastructure.

Responsibilities
  • Work on deploying machine learning models to edge devices using llama.cpp, ggml, and ONNX
  • Collaborate closely with researchers to assist in coding, training and transitioning models from research to production environments
  • Integrate AI features into existing products, enriching them with the latest advancements in machine learning
Qualifications
  • Excellent programming skills in C++; experience in JavaScript is a bonus
  • Strong experience with the llama.cpp and ggml inference engines, which facilitate the deployment of models to specific GPU architectures
  • Good understanding of deep learning concepts and model architectures
  • Experience with VLMs and LLMs
  • Demonstrated ability to rapidly assimilate new technologies and techniques
  • A degree in Computer Science, AI, Machine Learning, or a related field, complemented by a solid track record in AI R&D
Important information for candidates
  • Apply only through our official channels. We do not use third-party platforms or agencies for recruitment unless clearly stated. All open roles are listed on our official careers page: https://tether.recruitee.com/
  • Verify the recruiter’s identity. All our recruiters have verified LinkedIn profiles. If you’re unsure, you can confirm their identity by checking their profile or contacting us through our website.
  • Be cautious of unusual communication methods. We do not conduct interviews over WhatsApp, Telegram, or SMS. All communication is done through official company emails and platforms.
  • Double-check email addresses. All communication from us will come from addresses ending in @tether.to or @tether.io.
  • We will never request payment or financial details. If someone asks for personal financial information or payment at any point during the hiring process, it is a scam. Please report it immediately.