
A leading technology company in Northern Cape, South Africa, is seeking an applied ML engineer to take ownership of the semantic ingestion pipeline. The role involves developing and optimizing the ETL process, utilizing advanced machine learning models, and ensuring data relevance and freshness. Ideal candidates will have experience in building ML pipelines that benefit real users, along with skills in Python and semantic search technologies.
INFUSE is committed to complying with applicable data privacy and security laws and regulations.
For more information, please see our Privacy Policy.
INKHUB is ingesting 10 million raw PDFs to build the Internet's richest catalog of marketing-grade B2B content — tagged, summarized, and searchable by topic, company, or intent.
We're looking for an applied ML engineer to own the semantic ingestion pipeline, from raw PDFs to tagged, summarized, and embedded assets.
Own the ETL pipeline from raw PDFs (S3-ingested) to structured resources.
Finalize our summarization and classification flow using open-source models with GPT-4o fallback.
Apply filtering logic (document age, e.g. a 3-year cutoff; page count; etc.) to enforce resource quality.
Map each asset to the specific topic taxonomy (10+ per topic across ~9, topics).
Generate dense embeddings using sentence-transformers.
Load and query embeddings using Milvus or pgvector.
Implement "freshness" logic to identify and index only new or updated content based on file diffing, crawl timestamp, or document hash.
Build a QA / eval harness: format compliance, drift monitoring.
Expose /v1/semantic-search via FastAPI, with filtering and rank fusion.
Collaborate closely with our Tech Lead on UX integration and snippet generation.
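To give a sense of the "freshness" work described above, here is a minimal sketch of the document-hash variant, not INKHUB's actual implementation. The function names, the in-memory `seen_hashes` store, and the use of SHA-256 are illustrative assumptions; a real pipeline would likely persist hashes in a database and combine them with crawl timestamps.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a document's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def select_for_indexing(documents: dict[str, bytes],
                        seen_hashes: dict[str, str]) -> list[str]:
    """Return keys of documents that are new or whose content has changed.

    `documents` maps a document key (hypothetically, an S3 object key) to
    its raw bytes. `seen_hashes` maps keys to the hash recorded on the
    previous run and is updated in place (an assumed in-memory store).
    """
    to_index = []
    for key, data in documents.items():
        digest = content_hash(data)
        # Index only if the key is unseen or its content hash differs.
        if seen_hashes.get(key) != digest:
            to_index.append(key)
            seen_hashes[key] = digest
    return to_index
```

Hashing raw bytes catches any change to the file, including metadata-only edits; hashing extracted text instead would be a reasonable alternative if only content changes should trigger re-indexing.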
Python, PyTorch, sentence-transformers, OpenAI APIs, or similar pretrained LLMs.
FastAPI, Milvus or pgvector, PyPDF / Tika, Airflow or Lambda for orchestration.
Docker, GPU scheduling, Athena / Redshift SQL.
You've built ML pipelines that touched real users, not just notebooks.
You've worked on semantic search, embeddings, or large-scale tagging.
You've wrestled with unstructured data and love turning chaos into clarity.
You like working fast, iterating with feedback, and tracking metrics that matter.
Your models decide what gets found, how it's tagged, and which content and companies stand out.
You'll help define what "relevance" and "freshness" mean for over a million resources and 50,+ company pages and make sure INKHUB stays ahead of the curve.