
Machine Learning Specialist (Embeddings, Data Deduplication, AI-Powered Search)

Madfish

United Kingdom

Remote

GBP 60,000 - 80,000

Full time

Today

Job summary

A leading AI solutions provider is seeking a Machine Learning Specialist to design intelligent systems for data deduplication and optimization. The role involves developing and fine-tuning models for text and image embeddings, applying computer vision techniques, and collaborating on scalable AI-driven solutions. Ideal candidates will have strong experience in the Python ML stack, computer vision, and embeddings, and must be fluent in English. This is a fully remote opportunity.

Qualifications

  • Strong experience with embeddings (text, images, multimodal).
  • Hands-on experience in image classification and computer vision tasks.
  • Proficiency in the Python ML stack (PyTorch, TensorFlow, scikit-learn, Hugging Face Transformers).
  • Hands-on experience in AI-driven deduplication techniques.

Responsibilities

  • Build scalable pipelines for AI-driven deduplication and record linkage.
  • Develop and fine-tune image and text embedding models.
  • Integrate ML models with relational and non-relational databases.
  • Evaluate model performance and document findings.

Skills

Machine learning
Embeddings (text, images)
Image classification
Python ML stack
Computer vision
Fuzzy matching
Database knowledge
Problem-solving
Fluent English

Tools

PyTorch
TensorFlow
scikit-learn
Hugging Face Transformers

Job description

Overview

Job Title: Machine Learning Specialist (Embeddings, Data Deduplication, AI-Powered Search)

Job Summary: We are looking for a talented Machine Learning Specialist with expertise in embeddings (text and images), data processing, and AI-driven deduplication. The role involves designing intelligent systems to clean, normalize, and optimize large-scale datasets, improving product discovery and search. This is a fully remote opportunity where you’ll work on cutting-edge ML solutions for real-world retail and enterprise use cases.

Key Responsibilities
  • Build and maintain scalable pipelines for AI-driven deduplication and record linkage across large datasets.
  • Develop and fine-tune image and text embedding models for classification, similarity, and search.
  • Apply computer vision techniques (image classification, feature extraction, multimodal learning).
  • Integrate ML models with relational and non-relational databases (PostgreSQL, MySQL, MongoDB, Redis).
  • Apply vector search technologies (e.g., FAISS, Milvus, Pinecone, Weaviate) to power semantic retrieval.
  • Research and implement methods for entity resolution, clustering, and anomaly detection.
  • Collaborate with data engineers to ensure efficient ETL, preprocessing, and feature engineering.
  • Evaluate model performance using precision/recall, ROC-AUC, F1-score, and business KPIs.
  • Document experiments and share insights with cross-functional stakeholders.
Must-Have Skills
  • Strong experience with embeddings (text, images, multimodal, or product embeddings).
  • Hands-on experience in image classification, image embeddings, and computer vision tasks.
  • Proficiency in the Python ML stack: PyTorch, TensorFlow, scikit-learn, and Hugging Face Transformers.
  • Hands-on experience in AI-driven deduplication (fuzzy matching, clustering, record linkage).
  • Solid understanding of databases and query optimization.
  • Familiarity with vector databases and similarity-search libraries (FAISS, Pinecone, Milvus, etc.).
  • Strong problem-solving and analytical skills.
  • Fluent English (Upper-Intermediate or higher) for technical discussions and documentation.
Preferred Qualifications
  • Experience with LLM-powered pipelines (RAG, prompt engineering, hybrid search).
  • Knowledge of data quality frameworks and large-scale data cleaning.
  • Familiarity with cloud ML platforms (AWS SageMaker, GCP Vertex AI, Azure ML).
  • Previous work in retail, e-commerce, or product catalog data is a plus.