We're looking for a Senior Data Engineer to join our supportive and mission-driven Data Team. This is an exciting opportunity to work on a nationally significant programme powered by NHS health data, helping researchers solve complex challenges at a truly industrial scale and with global significance.
In this role, you’ll bring your experience working with NHS datasets to help design, build, and maintain data pipelines that enable trusted, high-quality insights. You’ll collaborate closely with colleagues across multiple disciplines — from Researchers and Epidemiologists to Software Engineers and Product Leads — contributing to a shared code base that delivers real-world health data for discovery.
What You'll Be Doing:
- Support the build of data pipelines from data providers to our primary data store and trusted research environment. Support the design, scoping and build of data flows.
- Produce logic for data transformation steps as code that meets the requirements of our end users and produces well-curated, accessible and quality-controlled data for analysis.
- Contribute to the code base for multiple data pipelines while ensuring best coding practices are used.
- Work with Data Scientists and Epidemiologists to understand their data requirements and collaborate with them to deliver the data needed for their projects.
- Keep abreast of best practice in data engineering across industry, research and Government, and facilitate the adoption of these standards.
What You'll Bring:
- Experience building and maintaining robust, scalable and efficient data pipelines capable of processing very large amounts of data from multiple systems, using a range of different technologies.
- You’re an empathetic communicator, comfortable bridging technical and non-technical perspectives.
- You’re confident working with NHS health data and understand the nuances of secondary care datasets (Hospital Episode Statistics, death registry data, A&E data, etc.). Experience with primary care (GP) data would be advantageous.
- Highly proficient in Python with solid command line knowledge and Unix skills.
- Good understanding of cloud environments (ideally Azure), distributed computing and optimising workflows and pipelines.
- Understanding of common data transformation and storage formats, e.g. Apache Parquet, Delta tables.
- Understanding of containerisation (e.g. Docker) and deployment (e.g. Kubernetes).
- Working knowledge of Spark, Databricks and data lakes.
- You follow best practices such as code review, clean code and unit testing.
- You're comfortable working in an agile development team, familiar with version control and Git/GitHub.
- Awareness of, or interest in, data standards such as GA4GH (https://www.ga4gh.org/) and FAIR (https://www.go-fair.org/fair-principles/).
- You’re experienced in contributing to and navigating shared codebases within multi-person teams.
What We Offer:
- Competitive base salary
- Generous Pension Scheme – We invest in your future with employer contributions of up to 12%.
- 30 Days Holiday + Bank Holidays – Enjoy a generous holiday allowance with the flexibility to take bank holidays when it suits you.
- Enhanced Parental Leave – Supporting you during life’s biggest moments.
- Career Growth & Development – £500 per year to spend on Learnerbly, our learning platform, plus regular appraisals and development opportunities.
- Cycle to Work Scheme – Save 25-39% on a new bike and accessories through salary sacrifice.
- Home & Tech Savings – Get up to 8% off IKEA and Currys products, spreading the cost over 12 months through salary sacrifice.
- £1,000 Employee Referral Bonus – Know someone amazing? Get rewarded for bringing them on board!
- Wellbeing Support – Access to Mental Health First Aiders, plus 24/7 online GP services and an Employee Assistance Programme for you and your family.
- A Great Place to Work – We have a lovely Central London office in Holborn, and offer flexible and remote working arrangements.
Join us - let’s prevent disease together.