Job Title: Scientific Knowledge Engineer / Data Harmonization
Location: Durham, NC 12460-0070
Duration: 07/01/2025 to 12/31/2025 (6 months)
Work Schedule: Monday–Friday, 9 a.m.–5 p.m.
Fully Remote – No onsite requirement – Candidates will need to be in one of the following cities: Seattle, Boston, Philadelphia, San Francisco
• Definition of schemas and data models of scientific information required for the creation of value-adding data products.
• Accountable for the quality control (through validation and verification) of mapping specifications to be industrialized by data engineering and maintained in platform-provisioned tooling, e.g., models, schemas, controlled vocabularies.
• Work with product managers and engineers to confidently convert business needs into well-defined, deliverable business requirements, enabling the integration of large-scale biology data to predict, model, and stabilize therapeutically relevant protein complexes and antigen conformations for drug and vaccine discovery.
• Collaborate with external groups to align CLIENT data standards with industry/academic ontologies, ensuring that data standards are defined with usage and analytics in mind.
• May also provide data source profiling and advisory consultancy to R&D outside of Onyx.
• Support effective ingestion of data by CLIENT through understanding the entry requirements set by platform engineering teams and ensuring that the "barrier for entry" is met, e.g., scientific information has the appropriate metadata to be indexed, structured, integrated, and standardized as needed.
• This may require articulating CLIENT engineering standards and metadata information needs to third parties to ensure efficient and automated ingestion at scale.
• Provides bespoke subject matter expertise for R&D data to translate deep science into data for actionable insights.
Candidate Requirements:
- Must-have skills and experience
- What type of individual excels in your environment and why?
- Non-essential requirements that would give the candidate an edge.
- Degrees or certifications required
- Would you consider candidates from other industry backgrounds?
- Bachelor’s degree
- Specialized knowledge of scientific ontology and metadata standards
- Semantic technology experience
- Data harmonization
- Metadata experience
Job 1: Skills for Data Harmonization
Experience with:
• Data analysis with Jupyter notebooks and Python
• Nextflow pipelines
• Bioinformatics/data science
• GEO, single cell data
• Code versioning (GitHub)
• LinkML
• Ontology usage and basic understanding, knowledge of common biomedical ontologies
• Single-cell technologies
• Attention to detail
• Spreadsheet wizardry
• Regular expressions
• SQL
• anndata format (nice-to-have)
• Google cloud (nice-to-have)
• Bioinformatics/data science: some experience is helpful, at least knowing the basic steps of an NGS workflow
• Familiarity with external data sources (GEO, ArrayExpress, EGA, CellxGene): no need to know every single one, but at least data repositories like these
• Familiarity with external ontologies: DOID, UBERON, CL, NCBITaxon
• Previous biology research experience is helpful, especially with next-generation sequencing
• Prior experience with cloud environments is helpful
Job 2: Skills for Semantic Technologist
• Demonstrated experience with the following tools: Protégé, Semaphore, TopQuadrant
• Experience with RDF, RDFS, OWL, SPARQL, GraphQL
• Knowledge graphs
• Biology background (Bachelor's degree)
• Public ontology resources (Bioportal, OLS)
Minimum years of experience: 5+
• Life science background would be a benefit
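As a rough illustration of the RDF/RDFS and knowledge-graph skills listed for Job 2, the sketch below models transitive rdfs:subClassOf reasoning over a toy set of triples in plain Python. In practice this would be a SPARQL property-path query (e.g., `?x rdfs:subClassOf+ :root`) against a real triple store, and the human-readable class names here are illustrative stand-ins for ontology CURIEs such as Cell Ontology (CL) terms:

```python
from collections import defaultdict

# Toy knowledge graph: RDF-style (subject, predicate, object) triples.
TRIPLES = [
    ("B cell", "rdfs:subClassOf", "lymphocyte"),
    ("T cell", "rdfs:subClassOf", "lymphocyte"),
    ("lymphocyte", "rdfs:subClassOf", "leukocyte"),
]

def subclasses_of(root: str) -> set[str]:
    """All classes transitively below `root` via rdfs:subClassOf."""
    children = defaultdict(set)
    for s, p, o in TRIPLES:
        if p == "rdfs:subClassOf":
            children[o].add(s)
    found, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found
```

This kind of subclass closure is what lets a query for "leukocyte" samples also retrieve data annotated with the more specific terms "B cell" and "T cell".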