Overview
Synagen builds specialized AI agents for healthcare and oncology, designed to support complex clinical decisions and biomedical workflows with actionable, high-precision outputs. We combine modern AI with clinical expertise to create software that integrates into real provider environments and delivers value in practice.
Responsibilities
- This role bridges two modes (the split may vary over time):
  - Customer project work: deliver concrete analyses, data products, and insight pipelines for partner hospitals and projects.
  - Internal platform work: build the reusable foundations (datalake/lakehouse, ontology/terminology layer, evaluation/monitoring) that make those projects fast, reproducible, and production-grade.
- Lead applied research / analytics projects with pharma and clinical partners: independently scope questions, define datasets and success criteria, and deliver end-to-end outputs with medical stakeholders.
- Build and operate scalable pipelines that transform raw clinical/patient data into structured, queryable, analysis-ready datasets.
- Design and evolve a datalake / lakehouse approach on Azure (storage, compute patterns, governance, access controls).
- Develop and maintain ontologies / terminology mappings and a consistent internal data model to enable reliable downstream analytics and agent reasoning.
- Build “SynInsight”-style data products for partners (e.g., cohorts, endpoints, phenotypes, evidence-ready exports and reports) that are robust, reproducible, and measurable.
- Implement LLM/agent operations: prompt/workflow versioning, evaluation harnesses, monitoring, regression testing, and cost/performance controls, using AI-assisted development tools where helpful.
- Build agents that automate R&D workflows (e.g., data-to-cohort pipelines, evidence synthesis, structured insight generation), and operationalize them with proper evaluation and monitoring.
- Drive privacy-preserving data capabilities, including synthetic data generation for development, evaluation, and safer sharing/testing in projects (including Azure-based implementations).
- Ensure security, privacy, and compliance requirements are met when processing sensitive healthcare data in Germany/EU and the US (e.g., GDPR, ISO 27001, SOC 2, BSI C5; alignment with US healthcare compliance requirements).
Qualifications
- Strong experience in applied data science / ML engineering / MLOps, ideally in pharma, R&D, or healthcare-adjacent environments.
- Proven ability to build production-grade pipelines for messy real-world data (ETL/ELT, data quality, lineage, reproducibility).
- Experience building and operating LLM/agent systems in production (workflows, evaluation, monitoring, reliability).
- Strong coding skills (Python + SQL) and comfort with engineering best practices (tests, CI/CD, documentation).
- Practical experience structuring data with ontologies/terminologies and making it usable for analytics and downstream systems.
- Experience with AI-assisted programming tools (e.g., Claude Code, Codex).
- Fluent in English (written and spoken).
Good to have
- Experience with clinical terminologies and standards (e.g., ICD-10, SNOMED CT, LOINC, RxNorm/ATC).
- Experience with modern data stack components (lakehouse patterns, columnar formats, distributed compute) on Azure.
- Familiarity with privacy-preserving data processing (pseudonymization/de-identification, access partitioning, audit trails).
- Experience delivering customer-facing data/ML projects end-to-end.
- Experience with modern DataOps tooling for reproducible data and agent workflows (e.g., dbt, Dagster, or similar asset-based orchestration and transformation frameworks).
Real-world impact in oncology: build integrations that bring AI into clinical workflows where accuracy and trust matter. High ownership: you will shape our interoperability layer end-to-end and define how we integrate at scale.