Oklahoma Data Exchange (OK Data) is seeking an experienced Data Scientist to support data exploration, quality assurance, and identity resolution for our growing public records database, currently focused on supporting work around the justice system. Founded in 2025, OK Data works with governmental and nonprofit organizations to build human and technical capacity to share, link, and understand critical data across the systems that serve Oklahomans.
The ideal candidate will have a proven track record in data engineering and/or data science, with significant experience designing and implementing database quality assurance tests and identity resolution algorithms. The position is remote and open to residents of Oklahoma.
Data Exploration
- Collaborate with the Lead Data Engineer to create dashboards and reports using code (Python or Ruby) and/or a BI platform
- Explore, document, and diagram various data sources, keeping in mind the following:
  - Mappings and standardization
  - Data over time (temporal data)
  - Scraping workflows
  - Data scope
  - Real-life data scenarios
  - Edge cases and other sources of complexity
- Find patterns and suggest algorithms for designing database structure and gleaning insights
Database Quality Assurance
- Develop and maintain data quality standards, metrics, and processes
- Design and execute quality assurance test plans that ensure the integrity and timeliness of data collected from public and privileged data sources
- Maintain working knowledge of OK Data's data collection processes and storage infrastructure
- Monitor data pipelines and ETL processes to detect and resolve data anomalies
- Collaborate with the Lead Data Engineer to quickly resolve errors in data processing and storage
- Collaborate with data engineers, analysts, and external stakeholders to understand data requirements and quality expectations
- Document data quality issues and work with relevant teams to implement corrective actions
- Create and maintain data quality dashboards and reports
- Support data governance initiatives and contribute to data stewardship efforts
Identity Resolution Support
- Test and benchmark identity resolution pipelines using deterministic and probabilistic matching algorithms
- Develop and maintain data ingestion and transformation processes to support identity stitching
- Collaborate with internal and external partners to define identity resolution rules and data quality standards
- Optimize performance of identity resolution workflows for scalability and accuracy
- Monitor and troubleshoot data matching issues and continuously improve match rates
- Ensure compliance with data privacy regulations (e.g., GDPR, CCPA) in identity resolution processes
- Document technical designs, data flows, and resolution logic
Benefits and Compensation
- Starting salary range for this position is $85,000 to $95,000, commensurate with experience
- Benefits package that includes 100% of employee coverage for medical, dental, and vision plans
- Retirement savings plan with a safe harbor match at 5% of salary
- Fully remote work, stipend for work-from-home expenses, and flexible schedules
- Generous vacation leave and paid holidays
- Professional development opportunities
Qualifications
- A minimum of 5 years of experience as a Data Scientist, Data Engineer, or in a related role
- High proficiency using SQL to create and extract custom data sets from multiple tables
- Demonstrated analytical and problem-solving skills with keen attention to detail
- Familiarity with Oklahoma public data sources is a big plus