AI Research Engineer (Model Serving & Inference - 100% Remote Spain)

buscojobs España

Alicante

Remote

EUR 40,000 - 70,000

Full-time

4 days ago

Job description

A leading company in digital finance is seeking an AI Research Engineer to innovate in model serving and inference architectures. The role offers the opportunity to work on cutting-edge AI projects remotely, focusing on optimizing responsiveness, efficiency, and scalability for advanced AI systems.

Background

  • Proven track record in AI R&D and publications.
  • Extensive experience in large-scale model serving.
  • Deep understanding of modern serving architectures.

Responsibilities

  • Design and deploy high-performance model serving architectures.
  • Monitor and test inference pipelines, tracking key metrics.
  • Collaborate with teams for optimized inference frameworks.

Skills

C/C++
AI Research
Model Serving
NLP
Machine Learning
Optimization Techniques
Throughput Optimization
Latency Reduction
Resource Management

Education

Degree in Computer Science
PhD in NLP or Machine Learning

Tools

Triton
CUDA
ThunderKittens

Full job description

AI Research Engineer (Model Serving & Inference - 100% Remote Spain)

Join Tether and Shape the Future of Digital Finance

At Tether, we’re not just building products; we’re pioneering a global financial revolution. Our solutions empower businesses—from exchanges and wallets to payment processors and ATMs—to seamlessly integrate reserve-backed tokens across blockchains. By harnessing blockchain technology, Tether enables you to store, send, and receive digital tokens instantly, securely, and globally, at a fraction of the cost. Transparency is the foundation of trust in every transaction.

Innovate with Tether

What we do:

Tether Finance: Our product suite features the trusted stablecoin USDT, used worldwide, and digital asset tokenization services.

Tether Power: Sustainable energy solutions for Bitcoin mining using eco-friendly practices.

Tether Data: AI and P2P technology solutions like KEET for secure data sharing.

Tether Education: Digital learning platforms for global access.

Tether Evolution: Merging technology and human potential for innovative futures.

Why Join Us?

Our global, remote team is passionate about fintech innovation. Collaborate with top talent, push boundaries, and set industry standards. If you excel in English and want to contribute to cutting-edge platforms, Tether is your place.

About The Job

As a member of the AI model team, you will innovate in model serving and inference architectures for advanced AI systems. You will focus on optimizing deployment and inference for responsiveness, efficiency, and scalability across diverse applications, from resource-limited devices to complex multi-modal systems.

Your expertise should include designing and optimizing model serving pipelines, developing novel serving strategies, and resolving bottlenecks in production to achieve high throughput, low latency, and minimal memory usage.
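As an illustration of the serving metrics this role centers on (latency, throughput, memory), the kind of benchmark harness involved can be sketched as follows. This is not part of the posting: `infer` stands in for any synchronous model-inference callable, and the request list is a placeholder workload.

```python
import statistics
import time

def measure_serving_metrics(infer, requests, warmup=5):
    """Measure per-request latency and overall throughput for an
    inference callable. `infer` and `requests` are placeholders for
    the model and inputs under test."""
    # Warm-up runs so one-time costs (JIT, cache fills) don't skew results.
    for r in requests[:warmup]:
        infer(r)

    latencies = []
    start = time.perf_counter()
    for r in requests:
        t0 = time.perf_counter()
        infer(r)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    return {
        "p50_ms": statistics.median(latencies) * 1e3,
        # Last of 19 cut points with n=20 is the 95th percentile.
        "p95_ms": statistics.quantiles(latencies, n=20)[-1] * 1e3,
        "throughput_rps": len(requests) / elapsed,
    }
```

Tracking tail latency (p95) alongside the median matters because production SLOs are usually stated against the tail, not the average.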

Responsibilities:

  • Design and deploy high-performance model serving architectures suitable for various environments, including resource-constrained devices, ensuring targets like reduced latency and memory footprint are met.
  • Monitor and test inference pipelines, tracking key metrics such as response latency, throughput, and memory usage; document results and compare against benchmarks.
  • Prepare test datasets and simulation scenarios for real-world deployment challenges, especially on low-resource devices, to evaluate model performance comprehensively.
  • Analyze and optimize computational efficiency, addressing bottlenecks related to processing and memory, to enhance scalability and reliability.
  • Collaborate with cross-functional teams to integrate optimized inference frameworks into production, defining success metrics like improved real-world performance and robustness.

Qualifications:

  • A degree in Computer Science or a related field, preferably a PhD in NLP, Machine Learning, or a related area, with a proven track record in AI R&D and publications.
  • Extensive experience in large-scale model serving and inference optimization, demonstrating improvements in latency, throughput, and memory footprint, especially on resource-constrained devices.
  • Deep understanding of modern serving architectures and optimization techniques, including low-latency, high-throughput methods, and memory management.
  • Strong expertise in C/C++, Triton, ThunderKittens, and CUDA; practical experience deploying inference pipelines on resource-constrained devices.
  • Ability to apply empirical research to overcome challenges like latency and memory constraints, designing evaluation frameworks and iterating on solutions.
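One common lever behind the throughput-versus-latency trade-off described above is dynamic batching: queued requests are grouped so the accelerator processes many at once, at the cost of per-request wait time. A simplified, size-capped sketch (production servers such as Triton's dynamic batcher also bound the wait time, which this illustration omits):

```python
from collections import deque

def drain_batches(pending, max_batch_size=8):
    """Group queued requests into batches of at most max_batch_size.
    Larger batches raise device utilization and throughput, but each
    request may wait longer before its batch is dispatched."""
    batches = []
    queue = deque(pending)
    while queue:
        batch = []
        while queue and len(batch) < max_batch_size:
            batch.append(queue.popleft())
        batches.append(batch)
    return batches
```

Tuning the batch-size cap (and, in real systems, the maximum queue delay) against the latency targets above is a typical part of the serving-optimization work this role describes.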

Seniority level

  • Not Applicable

Employment type

  • Full-time

Job function

  • Finance
