Project description
We are seeking a meticulous and analytical Data QA Analyst to join a new data team working with Large Language Models. You'll play a critical role in ensuring the accuracy, consistency, and reliability of our data. To succeed as a Data QA Analyst, you should have programming skills and a keen eye for detail. Successful candidates will be enthusiastic, motivated people whom we can train in our processes and who will ultimately play a key role in quality assurance initiatives across different stakeholder groups.
Responsibilities
- Develop and execute test plans, test cases, and scripts for data validation across ETL processes, databases, and reporting tools.
- Perform root cause analysis on data issues and work with engineering and analytics teams to resolve them.
- Monitor data quality metrics and implement automated checks to detect anomalies.
- Validate data transformations, aggregations, and business logic in dashboards and reports.
- Collaborate with data engineers, analysts, and product managers to define QA requirements and acceptance criteria.
- Document QA processes, test results, and data issue logs for transparency and continuous improvement.
Skills
Must have
- Proven experience in data QA, data analysis, or data engineering roles.
- Experience with MS SQL and PostgreSQL.
- Strong SQL skills for querying and validating large datasets.
- Familiarity with data warehousing concepts and ETL processes (hands-on experience with ETL pipelines, data warehouses, and data validation at scale).
- Understanding of data governance, data lineage, and metadata management.
- Excellent attention to detail and problem-solving abilities.
- Strong communication skills to explain data issues and collaborate with cross-functional teams.
- Scripting and automation skills (e.g., PowerShell, Python, Java).
- Experience with GitLab.
- Knowledge of the Spotfire data visualization platform or an alternative dashboard solution.
- Awareness of Agile delivery methodologies.
Nice to have
- Experience with cloud-based database solutions.
- Understanding of data lifecycle management and SOC2 security standards.
- Familiarity with geoscience disciplines, geospatial data, and GIS tools (e.g., ArcGIS, QGIS).
- Experience with Python or other scripting languages for automated testing.
- Familiarity with cloud data platforms (e.g., Snowflake, BigQuery, AWS Redshift).
- Knowledge of data quality frameworks and tools (e.g., Great Expectations, dbt tests).