AI Research Engineer (Model Serving & Inference)

Tether Operations Limited

London

Remote

GBP 60,000 - 100,000

Full time

2 days ago

Job summary

Join a pioneering team at a leading company in digital finance, focusing on advanced AI model serving architectures. Your role will involve optimizing deployment strategies for cutting-edge applications, leveraging your deep expertise in machine learning and resource management. Collaborate with diverse teams to enhance system performance using innovative techniques.

Qualifications

PhD in NLP or Machine Learning preferred, with a publication record.
Experience in low-level kernel and inference optimizations.
Deep understanding of resource-constrained model serving architectures.

Responsibilities

Design and deploy high-performance model serving architectures.
Track metrics like latency and throughput.
Optimize serving pipelines for scalability.

Skills

Model serving architectures

Optimization techniques

Memory management

Kernel development

Latency management

Education

PhD in NLP or Machine Learning

Degree in Computer Science or related field

Join Tether and Shape the Future of Digital Finance

At Tether, we're pioneering a global financial revolution with innovative blockchain solutions that enable seamless digital token transactions worldwide. Our products include the trusted stablecoin USDT, energy-efficient Bitcoin mining solutions, advanced data sharing apps like KEET, and educational initiatives to democratize digital knowledge.

Why join us? Our remote, global team is passionate about fintech innovation. We seek individuals with excellent English communication skills who are eager to contribute to cutting-edge projects in a fast-growing industry.

About the job:

As part of our AI model team, you will innovate in model serving and inference architectures for advanced AI systems. Your focus will be on optimizing deployment strategies to ensure high responsiveness, efficiency, and scalability across various applications and hardware environments.

Responsibilities:

Design and deploy high-performance, resource-efficient model serving architectures adaptable to diverse environments.
Establish and track performance metrics like latency, throughput, and memory usage.
Develop and monitor inference tests, analyze results, and validate performance improvements.
Prepare realistic datasets and scenarios to evaluate model performance in low-resource settings.
Identify bottlenecks and optimize serving pipelines for scalability and reliability.
Collaborate with teams to integrate optimized frameworks into production, ensuring continuous improvement.

Qualifications:

Degree in Computer Science or a related field; a PhD in NLP or Machine Learning with a strong publication record is preferred.
Proven experience in low-level kernel and inference optimizations on mobile devices, with measurable improvements.
Deep understanding of model serving architectures, optimization techniques, and memory management in resource-constrained environments.
Expertise in CPU/GPU kernel development for mobile platforms and deploying inference pipelines on such devices.
Ability to apply empirical research to overcome latency and memory bottlenecks, with experience in evaluation frameworks and iterative optimization.
