Senior Scientist, Foundation models for speech

Rome
EUR 60,000 - 80,000
Job description

About Translated
Translated is on a mission to allow everyone to understand and be understood, in their own language. We are a technology-powered professional translation provider. We partner with over 200,000 professional translators worldwide, in 200 languages. Our 310,000 clients range from private individuals needing their CV translated to large companies like Uber and Airbnb.

Our progress is largely powered by our ability to leverage scientific progress and realize the best synergy between humans and machines. We invest heavily in R&D, such as LLMs applied to translation, expressive speech synthesis, and privacy-preserving training for translation. We operate as a science-driven startup, ensuring our scientific innovations quickly make it to production and have a measurable impact on our operations.

The ideal candidate is enthusiastic about contributing to the design and implementation of Large Language Models for speech-related tasks, and is able to coordinate technical, communication, and team activities for our Meetween project.

The project: Meetween

Translated has just been awarded a grant for Meetween, a €7M, 4-year collaborative research project starting in January 2024, which Translated leads. Meetween uses LLMs and multimodal foundation models to enhance human communication.

The project covers research areas such as Deep Learning, Large Language and Multimodal models, Machine Translation, Automatic Speech Recognition and Translation, Summarization, and AI Digital Assistants. It offers the opportunity to collaborate with leading speech processing teams from academia and industry.

With Meetween, we aim to "solve speech": building foundation models that model all three speech modalities (text, audio, and video, including lip movement, facial expressions, and gestures) in a single architecture. Through transfer learning and conditioning, these models will support downstream tasks like ASR, zero-shot TTS, voice cloning, speech-to-speech translation, lip reading and resync, and audio/video reconstruction/enhancement.

We have secured a substantial computing budget on the Polish HPC infrastructure with hundreds of thousands of A100-hours, in addition to our in-house GPU infrastructure.

All research outcomes—models, datasets, benchmarks—will be open-sourced on HuggingFace.

What You’ll Do

You will join a team of researchers dedicated to Meetween within Translated's AI Research team, working on Large Language Models, Machine Translation, Speech Synthesis, and privacy-preserving Machine Learning. The team collaborates closely with product and engineering teams to develop next-generation technology.

In this role, you will:

  • Work with data, compute, and algorithms
  • Design deep learning multimodal neural architectures
  • Design experiments, implement and run them on large GPU/HPC infrastructure, and evaluate results
  • Monitor and benchmark state-of-the-art models
  • Potentially guide junior team members, including PhD students and interns
  • Coordinate with partners on the research roadmap
  • Adapt to rapid scientific developments
  • Organize publications and open-source contributions

Requirements

  • PhD or 4+ years of industry research experience in a relevant deep learning field (language modeling or speech recognition)
  • Excellent programming skills in PyTorch
  • Familiarity with Docker, Unix OS, and GPU experimentation
  • Interest in experimental research
  • Relevant scientific publications, teaching, or research experience
  • Experience in the speech and language technology industry
  • Team coordination experience
  • Excellent English skills
  • Strong skills in multi-GPU training and optimization
  • Fluency in multiple languages
  • Open-source contributions
  • Publications at tier-1 ML/AI conferences (NeurIPS, ICML, ICASSP, Interspeech, ACL, EMNLP)

Translated is based at Pi Campus, a group of villas in a natural setting in Rome, designed to foster talent growth. Pi Campus is also a venture firm created by Translated to reinvest profits into promising AI startups.

Benefits include a gym, swimming pool, kickboxing, water aerobics, fitness, Pilates, table tennis, table football, a kitchen and snacks, and incentives for healthy and family-friendly choices.

We celebrate diversity and are committed to an inclusive environment where everyone feels valued and supported to reach their potential.