
Data Engineer

Madfish

Remote

GBP 60,000 - 80,000

Full time

Yesterday

Job summary

A data-driven technology company in the United Kingdom seeks an experienced professional to lead the development and scaling of their scientific knowledge graph. The role involves ingesting, structuring, and enriching vast datasets from research literature into actionable insights. Candidates should have strong experience in knowledge graph design, advanced Python skills for data engineering, and a proven track record with large datasets. The role offers an attractive financial package, opportunities for professional growth, and a friendly small team.

Benefits

  • Attractive financial package
  • Challenging projects
  • Professional & career growth
  • Great atmosphere in a friendly small team

Qualifications

  • Strong experience with knowledge graph design and implementation.
  • Proficient in advanced Python for data engineering, ETL, and entity processing.
  • Proven track record with large dataset ingestion of tens of millions of records.

Responsibilities

  • Lead the development and scaling of the scientific knowledge graph.
  • Ingest, structure, and enrich datasets from research literature.
  • Transform data into meaningful, AI-ready insights.
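
To illustrate the kind of work these responsibilities describe, here is a minimal sketch of structuring literature records into subject-predicate-object triples in plain Python. The record fields, identifiers, and values are hypothetical; a production pipeline would target a real store such as Neo4j or an RDFLib graph rather than a list of tuples.

```python
# Minimal sketch: turn raw paper records into knowledge-graph triples.
# All field names and records here are hypothetical examples.

def record_to_triples(record):
    """Convert one literature record into (subject, predicate, object) triples."""
    paper = record["id"]
    triples = [(paper, "has_title", record["title"])]
    for author in record.get("authors", []):
        triples.append((paper, "authored_by", author))
    for concept in record.get("concepts", []):
        triples.append((paper, "mentions", concept))
    return triples

records = [
    {"id": "W1", "title": "CRISPR screening at scale",
     "authors": ["A. Doudna"], "concepts": ["CRISPR", "genomics"]},
]

graph = [t for r in records for t in record_to_triples(r)]
print(graph)
```

Keeping triples as a uniform shape like this is what makes later enrichment steps (entity linking, ontology mapping) composable: each step consumes and emits the same structure.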

Skills

  • Knowledge graph design and implementation
  • Advanced Python for data engineering
  • Large dataset ingestion
  • Familiarity with life-science or biomedical data
  • Experience with Airflow/Dagster/dbt

Tools

  • Neo4j
  • RDFLib
  • GraphQL
  • Spark
  • Dask
  • Polars

Job description

Lead the development and scaling of our scientific knowledge graph—ingesting, structuring, and enriching massive datasets from research literature and global data sources into meaningful, AI-ready insights.

Requirements
  • Strong experience with knowledge graph design and implementation (Neo4j, RDFLib, GraphQL, etc.).
  • Advanced Python for data engineering, ETL, and entity processing (Spark/Dask/Polars).
  • Proven track record with large dataset ingestion (tens of millions of records).
  • Familiarity with life-science or biomedical data (ontologies, research metadata, entity linking).
  • Experience with Airflow/Dagster/dbt, and data APIs (OpenAlex, ORCID, PubMed).
  • Strong sense of ownership, precision, and delivery mindset.
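
The "tens of millions of records" requirement above usually comes down to streaming in fixed-size batches so the full dataset never sits in memory. A dependency-free sketch, where the batch size and the load step are illustrative placeholders for a real sink (database upsert, Parquet write, Spark job):

```python
# Sketch of chunked ingestion: stream records in fixed-size batches so
# tens of millions of rows never sit in memory at once.
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

def load_batch(batch):
    # Placeholder for a real sink (database upsert, parquet write, ...).
    return len(batch)

source = range(25)           # stands in for a stream of 25 records
loaded = sum(load_batch(b) for b in batched(source, 10))
print(loaded)                # all 25 records loaded, in batches of 10
```

Orchestrators such as Airflow or Dagster typically wrap each batch (or each partition of batches) in a retryable task, so a failure reprocesses one chunk rather than the whole ingest.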
Nice to have
  • Domain knowledge in life sciences, biomedical research, or related data models.
  • Experience integrating vector/semantic embeddings (Pinecone, FAISS, Weaviate).
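
The embedding integration mentioned above reduces to nearest-neighbour search over vectors. A toy cosine-similarity lookup in plain Python shows the idea; production systems would delegate this to an index such as FAISS, Pinecone, or Weaviate, and the concept names and vectors here are made up:

```python
# Toy nearest-neighbour lookup over semantic embeddings. The 2-d
# vectors and concept labels are illustrative, not real embeddings.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

embeddings = {
    "gene_editing": (0.9, 0.1),
    "protein_folding": (0.2, 0.8),
}

def nearest(query):
    """Return the stored concept most similar to the query vector."""
    return max(embeddings, key=lambda k: cosine(query, embeddings[k]))

print(nearest((0.8, 0.2)))   # closest stored concept to the query
```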
We offer
  • Attractive financial package
  • Challenging projects
  • Professional & career growth
  • Great atmosphere in a friendly small team