Senior Data Quality Analyst – Capgemini
About Capgemini
Capgemini is a global leader in consulting, digital transformation, technology, and engineering services. With a presence in over 50 countries and a strong heritage of innovation, Capgemini enables organizations to realize their business ambitions through an array of services from strategy to operations. Our collaborative approach and a people-centric work culture have made us a partner of choice for clients across industries.
Role Overview
We are seeking a highly skilled Senior Data Quality Analyst with a robust background in designing, implementing, and maintaining data quality frameworks leveraging Python or Collibra. The ideal candidate will be adept at ensuring data accuracy, consistency, completeness, and reliability across large-scale cloud-based platforms, especially within Azure Databricks environments. This role requires expertise in automated data quality assurance, a deep understanding of data governance, and hands-on experience integrating quality controls into modern data pipelines.
The Senior Data Quality Analyst will be embedded within an agile squad dedicated to a specific business mission while contributing to a broader program comprising 4 to 8 interconnected squads. Collaboration, technical leadership, and a continuous improvement mindset are essential as you work cross-functionally to elevate the organization’s data quality standards.
Key Responsibilities
1. Development & Integration
- Design, develop, and implement automated data quality checks using Python scripts and libraries or Collibra Data Quality components (a short illustrative sketch follows this list).
- Integrate data quality validation logic within existing ETL/ELT pipelines operating on Azure Databricks, ensuring quality gates are consistently enforced across all data flows.
- Develop and maintain reusable Python modules that perform anomaly detection, schema validation, and rule-based data quality checks to enable rapid scaling of quality coverage.
- Collaborate with data engineering teams to embed continuous quality controls throughout the data ingestion, transformation, and consumption lifecycle.
- Support the deployment and management of Collibra-based data quality solutions to automate governance workflows and stewardship activities.
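To make the automated-checks responsibility concrete, below is a minimal, hedged sketch of rule-based quality checks on a Spark DataFrame, of the kind that could gate an Azure Databricks pipeline step. The table, column names, and thresholds are illustrative assumptions only; in practice the data would come from the pipeline's own tables and the thresholds from governed metadata (for example, Collibra).

```python
# Minimal sketch: rule-based data quality checks acting as a pipeline quality gate.
# Dataset, column names, and thresholds are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks-sketch").getOrCreate()

# Example input; in a Databricks pipeline this would typically be a Delta table.
df = spark.createDataFrame(
    [(1, "alice@example.com", 120.0), (2, None, 75.5), (2, "bob@example.com", None)],
    ["customer_id", "email", "order_total"],
)

total_rows = df.count()

# Rule 1: completeness, the share of non-null values per critical column.
completeness = {
    col: df.filter(F.col(col).isNotNull()).count() / total_rows
    for col in ["customer_id", "email", "order_total"]
}

# Rule 2: uniqueness, the key column must not contain duplicates.
duplicate_keys = df.groupBy("customer_id").count().filter(F.col("count") > 1).count()

# Rule 3: validity, numeric amounts must be non-negative when present.
invalid_amounts = df.filter(F.col("order_total") < 0).count()

# Simple quality gate: fail the step if any rule is violated.
failures = []
for col, ratio in completeness.items():
    if ratio < 0.95:  # illustrative threshold
        failures.append(f"completeness({col})={ratio:.2%} below 95%")
if duplicate_keys > 0:
    failures.append(f"{duplicate_keys} duplicate customer_id value(s)")
if invalid_amounts > 0:
    failures.append(f"{invalid_amounts} negative order_total value(s)")

if failures:
    raise ValueError("Data quality gate failed: " + "; ".join(failures))
```

Raising an exception on rule violations is one simple way to enforce a quality gate, since a failed task halts downstream consumption; equally valid designs quarantine bad records or write results to a monitoring table instead.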
2. Data Quality Management
- Define, measure, and rigorously enforce data quality metrics, thresholds, and Service Level Agreements (SLAs) tailored to business-critical datasets.
- Utilize Collibra to manage and operationalize data governance workflows, maintain business glossaries, and delineate stewardship responsibilities.
- Monitor the integrity of data pipelines for completeness, accuracy, timeliness, and consistency across distributed and cloud-native environments.
- Conduct detailed root cause analyses for complex data quality issues, collaborating with engineers and domain experts to drive permanent remediation and prevention strategies.
- Implement and continuously refine monitoring frameworks, utilizing dashboards and alerting systems (built using Python and Collibra integrations) for real-time visibility into key data quality indicators; a minimal monitoring sketch follows this list.
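As a hedged illustration of the metric-and-SLA responsibilities above, the sketch below compares observed quality metrics against agreed thresholds and logs a breach. The metric names, SLA values, and logging-based alert are assumptions for illustration; a production setup would persist metric history and route alerts through the team's dashboarding or paging stack.

```python
# Minimal sketch: threshold-based monitoring of data quality metrics against SLAs.
# Metric names, SLA thresholds, and the logging-based alert are illustrative assumptions.
import logging
from dataclasses import dataclass
from typing import List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("dq-monitoring")

@dataclass
class MetricCheck:
    name: str         # e.g. "completeness" or "timeliness"
    value: float      # observed value for the latest pipeline run
    threshold: float  # minimum acceptable value agreed in the SLA

def evaluate(checks: List[MetricCheck]) -> List[str]:
    """Return human-readable breaches for any metric below its SLA threshold."""
    breaches = []
    for check in checks:
        if check.value < check.threshold:
            breaches.append(
                f"{check.name}: observed {check.value:.2%}, SLA requires {check.threshold:.2%}"
            )
    return breaches

# Example run; observed values would normally come from the latest quality-check job.
breaches = evaluate([
    MetricCheck("completeness", 0.991, 0.99),
    MetricCheck("timeliness", 0.93, 0.97),
])

if breaches:
    # In practice this could push to a dashboard, ticketing, or paging system.
    logger.warning("Data quality SLA breached: %s", "; ".join(breaches))
else:
    logger.info("All monitored metrics within SLA thresholds.")
```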
3. Support & Operations
- Act as a Level 2/3 escalation point for data quality incidents, troubleshooting issues and coordinating with other agile squads and technical teams for rapid resolution.
- Work closely with product owners, business analysts, and key stakeholders to understand evolving data requirements and ensure quality expectations are aligned and met.
- Maintain and optimize operational dashboards for ongoing data quality monitoring, leveraging both Python-based and Collibra-integrated solutions.
- Participate actively in agile ceremonies, including sprint planning, daily standups, reviews, and retrospectives, contributing to squad goals and continuous delivery improvements.
4. Governance & Best Practices
- Establish, document, and evangelize data quality standards, validation frameworks, and best practices across squads and the broader data organization.
- Maintain comprehensive documentation on validation rules, automated test cases, and quality assurance procedures, ensuring transparency and repeatability.
- Mentor, coach, and upskill junior data engineers and analysts in data quality concepts, tools, and processes to foster a quality-first culture.
- Ensure strict compliance with data governance, privacy, and security policies by leveraging Collibra’s governance and stewardship frameworks.
- Continuously assess emerging technologies, tools, and methodologies for potential enhancement of the data quality ecosystem.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Management, Information Systems, or a closely related field.
- Several years of progressive experience in data quality engineering, data management, or related data roles within complex technology environments.
- Demonstrable expertise in Python, including the development of reusable data quality and validation libraries.
- Extensive hands-on experience with Azure Databricks, including cloud-native data processing, ETL/ELT orchestration, and distributed computing concepts.
- Proficiency with the Collibra Data Quality platform or equivalent data governance and stewardship tools.
- Strong track record working in agile environments, participating in cross-functional teams, and adapting to rapidly evolving project requirements.
- Excellent analytical, problem-solving, and communication skills, with the ability to convey complex technical topics to both technical and non-technical audiences.
Preferred Certifications (One or More)
- Databricks Certified Data Engineer Associate or Professional
- Microsoft Certified: Azure Data Engineer Associate
- Python Institute Certifications (PCAP, PCPP)
- Collibra Ranger or Collibra Data Quality Steward Certifications
Key Skills & Competencies
- Deep understanding of data quality frameworks, methodologies, and industry best practices
- Hands-on experience building automated data quality tests using Python, PySpark, or comparable open-source tools
- Expertise in designing quality validation steps within ETL/ELT data pipelines for large volumes of structured and semi-structured data
- Familiarity with cloud data ecosystems, especially Azure and Databricks
- Proven ability to operationalize and scale data governance using Collibra or comparable tools
- Experience with dashboarding, data visualization, and monitoring tools for real-time data quality tracking
- Strong collaboration, leadership, and mentoring abilities within agile squads or matrix teams
- Knowledge of data privacy, security, and regulatory compliance requirements
- Ability to drive innovation and continuous improvement in data quality processes
What We Offer
- Opportunity to work on cutting-edge data platforms and technologies in a global, multicultural environment
- Collaborative and agile work culture with empowering career growth opportunities
- Competitive remuneration, benefits, and professional certification support
- Access to Capgemini’s global learning platforms, mentorship programs, and technology communities
- Exposure to high-impact projects with Fortune 500 clients