Semantic Backend Engineer (Contract, Remote)

INFUSE

Düsseldorf

Remote

EUR 60,000 - 80,000

Full-time

Posted today

Summary

A technology company in Düsseldorf is seeking an Applied Machine Learning Engineer to own the semantic ingestion pipeline from raw PDFs to structured data. The ideal candidate has experience building ML pipelines and working with semantic search, and is proficient in tools such as Python and PyTorch. The role also involves collaborating on UX integration and keeping results relevant and fresh across more than a million resources.

Qualifications

Experience building ML pipelines for real users.
Familiarity with semantic search and embeddings.
Proficiency in handling and structuring unstructured data.

Responsibilities

Own the ETL pipeline from raw PDFs to structured resources.
Finalize summarization and classification flow using open-source models.
Implement logic for resource quality and mapping to topic taxonomy.
Generate and query embeddings using relevant tools.

Skills

Building ML pipelines

Semantic search

Working with unstructured data

Feedback iteration and metric tracking

Tools

Python

PyTorch

OpenAI APIs

FastAPI

Docker

OUR HIRING PROCESS

We will review your application against our job requirements. We do not employ machine learning technologies during this phase as we believe every human deserves attention from another human. We do not think machines can evaluate your application quite like our seasoned recruiting professionals—every person is unique. We promise to give your candidacy a fair and detailed assessment.
We may then invite you to submit a video interview for the review of the hiring manager. This video interview is often followed by a test or short project that allows us to determine whether you will be a good fit for the team.
At this point, we will invite you to interview with our hiring manager and/or the interview team. Please note: We do not conduct interviews via text message, Telegram, etc. and we never hire anyone into our organization without having met you face‑to‑face (or via Zoom). You will be invited to come to a live meeting or Zoom, where you will meet our INFUSE team.
From there on, it’s decision time! If you are still excited to join INFUSE and we like you as much, we will have a conversation about your offer. We do not make offers without giving you the opportunity to speak with us live.

INFUSE is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Privacy Policy.

INKHUB is ingesting 10 million raw PDFs to build the internet’s richest catalog of marketing‑grade B2B content - tagged, summarized, and searchable by topic, company, or intent.

We’re looking for an applied ML engineer to own the semantic ingestion pipeline, from raw PDFs to tagged, summarized, and embedded assets.

What You’ll Do

Own the ETL pipeline from raw PDFs (S3‑ingested) to structured resources
Finalize our summarization + classification flow using open‑source models with GPT‑4o fallback
Apply filtering logic (≤3 years old, ≤100 pages, etc.) to enforce resource quality
Map each asset to the specific topic taxonomy (10+ per topic across ~9,000 topics)
Generate dense embeddings using sentence‑transformers
Load and query embeddings using Milvus or pgvector
Implement “freshness” logic to identify and index only new or updated content based on file diffing, crawl timestamp, or document hash
Build a QA/eval harness: format compliance, recall@5, drift monitoring
Expose /v1/semantic-search via FastAPI, with filtering and rank fusion
Collaborate closely with our Tech Lead on UX integration and snippet generation
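
To give a feel for three of the items above (the quality filter, hash-based freshness, and recall@5), here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not INKHUB's actual implementation: the PdfMeta record, its field names, the helper function names, and the approximation of "3 years" as 3 * 365 days are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date

MAX_AGE_DAYS = 3 * 365  # "≤3 years old"; assumption: leap days are ignored
MAX_PAGES = 100         # "≤100 pages"


@dataclass
class PdfMeta:
    # Hypothetical record; the field names are illustrative only.
    key: str          # e.g. an object key in the ingest bucket
    published: date   # publication date extracted from the document
    pages: int        # page count
    content: bytes    # raw file bytes


def passes_quality_filter(doc: PdfMeta, today: date) -> bool:
    """Keep documents that are at most 3 years old and at most 100 pages."""
    age_days = (today - doc.published).days
    return age_days <= MAX_AGE_DAYS and doc.pages <= MAX_PAGES


def content_hash(content: bytes) -> str:
    """SHA-256 digest of the raw bytes, used as the document fingerprint."""
    return hashlib.sha256(content).hexdigest()


def needs_indexing(doc: PdfMeta, seen_hashes: dict[str, str]) -> bool:
    """True if the key is new or its bytes changed since the last run."""
    return seen_hashes.get(doc.key) != content_hash(doc.content)


def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)
```

In this sketch, seen_hashes stands in for whatever store records the last indexed fingerprint per document; crawl timestamps or file diffing, also mentioned above, would be alternative freshness signals.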

Your Toolbox

Python, PyTorch, sentence‑transformers, OpenAI APIs, or similar pretrained LLMs.
FastAPI, Milvus or pgvector, PyPDF/Tika, Airflow or Lambda for orchestration
Docker, GPU scheduling, Athena/Redshift SQL

You Might Be a Fit If

You’ve built ML pipelines that touched real users, not just notebooks
You’ve worked on semantic search, embeddings, or large‑scale tagging
You’ve wrestled with unstructured data and love turning chaos into clarity
You like working fast, iterating with feedback, and tracking metrics that matter

Why This Role Matters

Your models decide what gets found, how it’s tagged, and which content and companies stand out. You’ll help define what “relevance” and “freshness” mean for over a million resources and 50,000+ company pages, and make sure INKHUB stays ahead of the curve.
