Location:
Remote, with occasional visits to our lab in Bristol (approx. monthly).
Start Date:
June/July 2025.
Eligibility:
Candidates must be eligible to work in the UK.
Company:
GenomeKey is a Bristol-based biotech startup developing a next-generation diagnostic device for bloodstream infections, using machine learning and DNA sequencing.
Role Responsibilities
We’re looking for a skilled Data Engineer with experience in building robust data pipelines and managing large and complex datasets, particularly in genomics.
As an integral part of our team, you will design and implement our new data management infrastructure in support of our cutting-edge diagnostic device development. Your responsibilities will include:
- Design, develop, and maintain data pipelines for processing large-scale genomic datasets and associated metadata.
- Implement and manage data storage solutions (both cloud-based and on-premises), ensuring scalability, security, and efficiency.
- Build and maintain automated systems for quality control, data enrichment, and monitoring of internal and external resources.
- Champion data management best practices, including reproducibility, documentation, and provenance.
- Integrate data pipelines with our “wet lab” processes - e.g., streaming data off DNA sequencers for processing and storage, and integrating with LIMS.
- Collaborate with Machine Learning, Software, and Bioinformatics specialists to ensure data needs are met - for example, developing processes to gain insights into data and enable improvements to our bioinformatics and machine learning algorithms as new data becomes available.
About You
We are looking for a highly motivated individual with strong communication skills and a passion for data engineering within the life sciences. You’ll have proven experience working with large datasets, implementing data pipelines, and ensuring data integrity and security - ideally within the life sciences sector.
Qualifications & Experience
We are open to varied backgrounds for this role but expect candidates to demonstrate most of the following skills and experience:
Essential
- Master’s degree or PhD in Computer Science, Data Science, Bioinformatics, or a related field, or equivalent experience.
- 4+ years of professional experience in data engineering or a related role, with proven ability to build and maintain data pipelines.
- Strong proficiency in Python and SQL.
- Experience with data warehousing, data lakes, and database management systems, particularly with petabyte-scale datasets.
- Understanding of data security and privacy principles.
- Experience in life sciences, genomics, or medical device software development, including development under an ISO 13485-compliant Quality Management System.
- Familiarity with genomic data formats, analyses, quality control, and validation techniques.
- Strong verbal and written communication skills, with the ability to convey complex technical concepts clearly.
Desirable
- DevOps expertise, including Docker and Kubernetes, automated testing, and CI/CD pipelines.
- Hands-on experience with cloud platforms like GCP, AWS, or Azure for data storage and processing.
- Experience with high-throughput data streams.
- Knowledge of big data technologies (e.g., Spark, Hadoop) and ETL pipelines.
- Experience with workflow management tools.
- Knowledge/experience with data storage management systems (ZFS/RAID, magnetic tape drives, etc.).
- Experience using AI/LLM-based tools to accelerate research and development.
Our Hiring Process
- Intro call with the hiring manager (30 minutes)
- Take-home task
- Role-fit interview (60 minutes)
- Final stage (45 minutes)
GenomeKey is an equal opportunities employer. We welcome applicants from all backgrounds and experiences.