Open Data & Search Solutions Engineer
Position Overview
We are seeking a Data Discovery Engineer to lead the development and integration of search, metadata cataloging, and Linked Open Data (LoD) capabilities across our data platform ecosystem. This role focuses on implementing and managing open-source search engines (Solr, OpenSearch, Elasticsearch), data cataloging tools (Atlas, NADA), and semantic web technologies (DCAT, RDF, schema.org, Croissant) to improve data discoverability, interoperability, and reuse.
Key Responsibilities
Search Platform Integration
- Design and deploy open-source search engine solutions such as Apache Solr, OpenSearch, or Elasticsearch
- Optimize indexing strategies for structured and unstructured data from diverse data sources
- Develop custom search features (facets, filters, synonyms, auto-suggestions) tailored to metadata and dataset discovery
- Implement scalable search pipelines with support for multilingual and full-text search capabilities
Metadata Cataloging & Data Discovery
- Deploy and maintain metadata catalog systems such as Apache Atlas, NADA, or CKAN
- Ensure metadata harvesting and harmonization across multiple sources using catalog APIs and connectors
- Integrate catalog systems with enterprise data lakes, APIs, and external repositories
- Establish metadata governance policies and data stewardship workflows in collaboration with data owners
Linked Open Data (LoD) Enablement
- Implement Linked Open Data techniques using DCAT, RDF, schema.org, and W3C standards
- Publish datasets and metadata as linked data endpoints for reuse and interoperability
- Map internal metadata schemas to external ontologies (e.g., DCAT-AP, schema.org, Croissant)
- Build SPARQL endpoints or graph-based access for semantic querying of data assets
Platform Automation & Integration
- Develop and maintain pipelines for metadata ingestion, enrichment, validation, and publication
- Integrate catalog and search platforms with analytics tools, APIs, and user-facing portals
- Support deployment of semantic technologies using containers, CI/CD pipelines, and cloud-native tools
- Collaborate with platform engineers, data stewards, and domain experts to deliver robust metadata solutions
Technical Skills
- 6+ years of experience deploying and operating open-source search engines (Solr, Elasticsearch, OpenSearch)
- Experience with metadata catalog platforms such as Apache Atlas, NADA, CKAN, or equivalent
- Strong understanding of metadata standards and vocabularies (DCAT, Dublin Core, schema.org, RDF, OWL)
- Proficiency in SPARQL, JSON-LD, XML, and semantic mapping techniques
- Familiarity with data discovery, information retrieval, and search tuning techniques
- Experience integrating catalog and search systems with APIs and microservices
- Working knowledge of containerization (Docker) and orchestration (Kubernetes)
- Scripting or development skills in Python, Java, or Scala for search and metadata tooling
- Familiarity with CI/CD pipelines (e.g., GitLab CI, GitHub Actions) and IaC tools (Terraform, Ansible)
Preferred Qualifications
- Bachelor's or Master's degree in Computer Science, Information Science, Data Engineering, or a related field
- Experience working in open data, statistical, or public-sector environments
- Knowledge of FAIR data principles, metadata quality frameworks, and interoperability standards
- Exposure to triple stores and RDF graph databases (e.g., Blazegraph, GraphDB, Virtuoso)
- Contributions to or experience with semantic web and linked data communities