
VLM Research Engineer (m/f/d)

Deltia

Berlin

On-site

EUR 60,000 - 90,000

Full-time

Today
Be among the first applicants

Summary

An AI-first software company in Berlin is seeking a Research Engineer to push the limits of vision-language models for real-world video understanding. The role covers designing multimodal models and delivering production-ready inference pipelines. Ideal candidates hold a PhD in a related field and have experience with video-centric deep learning. The company offers competitive compensation, stock options, and a supportive culture that values diversity.

Benefits

Competitive salary & stock options
Flexible working hours
Supportive and inclusive culture

Qualifications

  • Completed PhD or equivalent track record in a relevant field.
  • Strong background in scene understanding or video generation.
  • Experience with large vision or VLM models.

Responsibilities

  • Design and adapt models for scene understanding and action recognition.
  • Build and maintain large-scale training and evaluation pipelines.
  • Deliver production-ready inference pipelines.

Skills

Video-centric deep learning
Multi-GPU training (PyTorch)
Solid engineering habits

Education

PhD in computer vision, machine learning, or related field

Tools

GPU clusters
TensorRT

Job description

We’re looking for a Research Engineer to push the limits of vision-language models for real-world video understanding. You’ll work on applied, state-of-the-art multimodal models and turn them into production pipelines used by customers.

Your role
  • Design and adapt vision-language and video models for scene understanding, temporal reasoning and activity / action recognition
  • Build and maintain large-scale training and evaluation pipelines on GPU clusters
  • Curate and augment video-text and action datasets, including synthetic labels and retrieval-based augmentation
  • Develop robust benchmarks for video QA, instruction following and temporal understanding, and use them to drive iterative model improvements
  • Cut and refactor model architectures for efficiency and deployability (compression, pruning, distillation)
  • Deliver production-ready inference pipelines to product and customer teams, working closely with CV, platform and robotics engineers

You bring
  • Completed PhD (or equivalent research track record) in computer vision, machine learning, robotics or a related field
  • Strong background in video-centric deep learning: scene understanding, temporal / activity / action recognition, or video generation
  • Experience training and adapting large vision or VLM models (e.g. InternVL, Qwen-VL, DeepSeek-VL, similar stacks)
  • Proven work with multi-GPU training (PyTorch, distributed, mixed precision) and large-scale datasets
  • Solid engineering habits: clean Python, reproducible experiments, reliable data and training pipelines
  • Track record of moving research into usable systems (demos, internal tools, or productised features) in fast-moving teams

Nice to have
  • Publications at top-tier venues (CVPR, ICCV, ECCV, NeurIPS, ICLR, etc.) on video, multimodal learning or scene understanding
  • Experience with 3D/4D scene representations, action generation or embodied / sense-plan-act style projects
  • Inference optimisation: quantisation, TensorRT, model distillation, or deployment on constrained hardware
  • Prior experience in a startup or applied research lab environment

What we offer
  • A competitive salary & stock options*
  • Be at the forefront of defining what artificial intelligence means in manufacturing
  • Gain hands-on experience working in an AI-first software company
  • Supportive and inclusive culture that values diversity and promotes the advancement of underrepresented groups within the company
  • Collaborate with a diverse (currently more than 10 nationalities) and talented team, working on cutting‑edge projects with real‑world impact
  • Network with professionals and leaders in the field, opening doors to potential future career opportunities
  • We have a very flat hierarchy, open 360° feedback, and flexible working hours
  • Ethics⚖: We are committed to developing ethical AI software

Don't meet all the requirements?

Deltia is committed to creating a workplace that is diverse, fair, and inclusive. We encourage candidates from all backgrounds, even if they do not meet every qualification, to submit their application. We firmly believe that having a team with diverse perspectives only strengthens our company and drives innovation. Our commitment also extends to providing an accessible environment for everyone, including those with disabilities. Please let us know if you require any accommodations during the application process or while working with us, and we will do our best to support you.

*Only full-time, permanent roles are eligible for stock options. Part-time, contract, working-student, internship, and freelance roles are not eligible for stock options.
