Data Integration Specialist

FirstPrinciples

Canada

Remote

CAD 60,000 - 80,000

Full time

Today

Job summary

A non-profit research organization in Canada is seeking a Data Integration Specialist to lead data extraction projects and design robust ETL processes. The ideal candidate will have a Bachelor's degree in computer science or related fields and 1-3 years of experience in data transformation. They will be responsible for collaborating with teams and ensuring data integrity. This role presents an opportunity to contribute to advancing scientific discovery through innovative data management strategies.

Qualifications

  • 1-3 years of experience working with data transformation, ETL processes, or similar roles.
  • Experience managing small to medium-sized data projects from conception to completion.
  • Strong problem-solving abilities and attention to detail.

Responsibilities

  • Investigate and evaluate new data sources.
  • Create comprehensive extraction plans and strategies for each data source.
  • Develop and maintain parsers for diverse data sources.

Skills

Python
Data transformation
ETL processes
Data parsing libraries
Prompt engineering for LLMs

Education

Bachelor's degree in computer science, data science, information systems, or related field

Tools

SQL
JSON
Job description
Overview

About FirstPrinciples: FirstPrinciples is a non-profit organization advancing scientific discovery by developing an AI Physicist - an intelligent system designed to explore and uncover the fundamental laws of nature. Our goal is to build a new kind of research platform that can ask deep scientific questions, reason across disciplines, and accelerate the generation of new ideas. By combining AI, symbolic reasoning, and autonomous research capabilities, we aim to transform how knowledge is discovered and help push science toward its next breakthroughs.

Job Description

FirstPrinciples is seeking a skilled and detail-oriented Data Integration Specialist to play a crucial role in our data pipeline development. In this position, you will lead projects to design and implement data extraction processes from various structured and unstructured sources, create robust parsing mechanisms, and develop sophisticated logic to extract meaningful features from raw data. Working in an agile environment, you will iteratively refine extraction methods based on ongoing feedback.

Responsibilities
  • Investigate and evaluate new data sources.
  • Create comprehensive extraction plans and strategies for each data source.
  • Lead the full lifecycle of data extraction projects from planning to implementation.
  • Work closely with peers and managers to iterate quickly and refine various approaches.
  • Progressively scale extraction processes from small test batches to full implementation.
Data Source Integration
  • Develop and maintain parsers for diverse data sources including APIs, databases, web content, PDFs, and scientific literature.
  • Create reliable ETL processes to ensure data quality and consistency, including LLM-based extraction pipelines.
  • Design and refine prompts for LLMs to extract structured information from unstructured data sources, including text, images, and other multimodal inputs.
  • Implement error handling and logging systems to maintain data pipeline reliability.
  • Identify and extract valuable features from complex raw data sets.
  • Develop logic and algorithms to transform unstructured information into structured, analyzable formats.
  • Create reproducible processes for data normalization and standardization.
  • Optimize parsing procedures for performance and accuracy.
  • Document data lineage and transformation processes for transparency.
  • Work closely with cross-functional teams to understand feature requirements.
  • Coordinate with the engineering team to integrate data pipelines into broader systems.
  • Communicate technical concepts clearly to non-technical stakeholders.
  • Engage directly with third-party data vendors to obtain technical specifications and integration details.
  • Demonstrate ability to work effectively both as part of a collaborative team and independently on self-directed tasks.
Qualifications
  • Educational Background: Bachelor's degree in computer science, data science, information systems, or related field.
  • Experience: 1-3 years of experience working with data transformation, ETL processes, or similar roles.
  • Project Management Skills
    • Experience managing small to medium-sized data projects from conception to completion.
    • Demonstrated ability to create technical plans and roadmaps for data extraction.
    • Experience working in agile environments with iterative development cycles.
  • Technical Skills
    • Proficiency in Python and/or similar languages for data processing.
    • Experience with data parsing libraries and frameworks.
    • Knowledge of data storage systems and formats (SQL, JSON, etc.).
    • Familiarity with regular expressions and text processing techniques.
    • Experience with prompt engineering for LLMs and AI-assisted data extraction.
  • Analytical Skills: Strong problem-solving abilities and attention to detail.
  • Communication: Ability to document processes clearly and communicate technical concepts.
  • Bonus Skills
    • Experience with natural language processing.
    • Knowledge of scientific literature and research data structures.
    • Familiarity with cloud-based data processing.
Application Process
  • Interested candidates are invited to submit their resume, a cover letter detailing their qualifications and vision for the role, and references. Please include "Data Integration Specialist" in the cover letter.