
Data Engineer

Madfish

Remote

GBP 60,000 - 80,000

Full time

Yesterday

Job summary

A data-driven technology company in the United Kingdom seeks an experienced professional to lead the development and scaling of their scientific knowledge graph. The role involves ingesting, structuring, and enriching vast datasets from research literature into actionable insights. Candidates should have strong experience in knowledge graph design, advanced Python skills for data engineering, and a proven track record with large datasets. The role offers an attractive financial package, opportunities for professional growth, and a friendly small team.

Benefits

  • Attractive financial package
  • Challenging projects
  • Professional & career growth
  • Great atmosphere in a friendly small team

Qualifications

  • Strong experience with knowledge graph design and implementation.
  • Proficient in advanced Python for data engineering, ETL, and entity processing.
  • Proven track record with large dataset ingestion of tens of millions of records.

Responsibilities

  • Lead the development and scaling of the scientific knowledge graph.
  • Ingest, structure, and enrich datasets from research literature.
  • Transform data into meaningful, AI-ready insights.
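
To illustrate the kind of work these responsibilities describe, here is a minimal sketch of structuring literature records into subject-predicate-object triples in plain Python. The record fields, identifiers, and values are hypothetical; a production pipeline would target a real store such as Neo4j or an RDFLib graph rather than a list of tuples.

```python
# Minimal sketch: turn raw paper records into knowledge-graph triples.
# All field names and records here are hypothetical examples.

def record_to_triples(record):
    """Convert one literature record into (subject, predicate, object) triples."""
    paper = record["id"]
    triples = [(paper, "has_title", record["title"])]
    for author in record.get("authors", []):
        triples.append((paper, "authored_by", author))
    for concept in record.get("concepts", []):
        triples.append((paper, "mentions", concept))
    return triples

records = [
    {"id": "W1", "title": "CRISPR screening at scale",
     "authors": ["A. Doudna"], "concepts": ["CRISPR", "genomics"]},
]

graph = [t for r in records for t in record_to_triples(r)]
print(graph)
```

Keeping triples as a uniform shape like this is what makes later enrichment steps (entity linking, ontology mapping) composable: each step consumes and emits the same structure.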

Skills

  • Knowledge graph design and implementation
  • Advanced Python for data engineering
  • Large dataset ingestion
  • Familiarity with life-science or biomedical data
  • Experience with Airflow/Dagster/dbt

Tools

  • Neo4j
  • RDFLib
  • GraphQL
  • Spark
  • Dask
  • Polars

Job description

Lead the development and scaling of our scientific knowledge graph—ingesting, structuring, and enriching massive datasets from research literature and global data sources into meaningful, AI-ready insights.

Requirements
  • Strong experience with knowledge graph design and implementation (Neo4j, RDFLib, GraphQL, etc.).
  • Advanced Python for data engineering, ETL, and entity processing (Spark/Dask/Polars).
  • Proven track record with large dataset ingestion (tens of millions of records).
  • Familiarity with life-science or biomedical data (ontologies, research metadata, entity linking).
  • Experience with Airflow/Dagster/dbt, and data APIs (OpenAlex, ORCID, PubMed).
  • Strong sense of ownership, precision, and delivery mindset.
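
The "tens of millions of records" requirement above usually comes down to streaming in fixed-size batches so the full dataset never sits in memory. A dependency-free sketch, where the batch size and the load step are illustrative placeholders for a real sink (database upsert, Parquet write, Spark job):

```python
# Sketch of chunked ingestion: stream records in fixed-size batches so
# tens of millions of rows never sit in memory at once.
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

def load_batch(batch):
    # Placeholder for a real sink (database upsert, parquet write, ...).
    return len(batch)

source = range(25)           # stands in for a stream of 25 records
loaded = sum(load_batch(b) for b in batched(source, 10))
print(loaded)                # all 25 records loaded, in batches of 10
```

Orchestrators such as Airflow or Dagster typically wrap each batch (or each partition of batches) in a retryable task, so a failure reprocesses one chunk rather than the whole ingest.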
Nice to have
  • Domain knowledge in life sciences, biomedical research, or related data models.
  • Experience integrating vector/semantic embeddings (Pinecone, FAISS, Weaviate).
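
The embedding integration mentioned above reduces to nearest-neighbour search over vectors. A toy cosine-similarity lookup in plain Python shows the idea; production systems would delegate this to an index such as FAISS, Pinecone, or Weaviate, and the concept names and vectors here are made up:

```python
# Toy nearest-neighbour lookup over semantic embeddings. The 2-d
# vectors and concept labels are illustrative, not real embeddings.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

embeddings = {
    "gene_editing": (0.9, 0.1),
    "protein_folding": (0.2, 0.8),
}

def nearest(query):
    """Return the stored concept most similar to the query vector."""
    return max(embeddings, key=lambda k: cosine(query, embeddings[k]))

print(nearest((0.8, 0.2)))   # closest stored concept to the query
```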
We offer
  • Attractive financial package
  • Challenging projects
  • Professional & career growth
  • Great atmosphere in a friendly small team