Data Engineer - £350PD - Remote
Required Technical Skills
Data Pipeline & ETL
- Design, build, and maintain robust ETL/ELT pipelines for structured and unstructured data
- Hands‑on experience with AWS Glue and AWS Step Functions
- Implementation of data validation, data quality frameworks, and reconciliation checks
- Strong error handling, monitoring, and retry strategies in production pipelines
- Experience with incremental data processing patterns (CDC, watermarking, upserts)
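As an illustration of the incremental patterns listed above (watermarking plus upserts), a minimal in-memory sketch in plain Python — the field names (`id`, `updated_at`) are hypothetical, not from the role spec:

```python
from datetime import datetime, timezone

def incremental_upsert(target: dict, source_rows: list[dict], watermark: datetime) -> datetime:
    """Apply only source rows newer than the watermark (CDC-style incremental
    load), upserting by primary key, and return the advanced watermark."""
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] > watermark:       # skip already-processed rows
            target[row["id"]] = row             # upsert: insert or overwrite
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark

# Hypothetical run: two rows, only one newer than the stored watermark
wm = datetime(2024, 1, 1, tzinfo=timezone.utc)
rows = [
    {"id": 1, "updated_at": datetime(2023, 12, 31, tzinfo=timezone.utc), "sku": "A"},
    {"id": 2, "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc), "sku": "B"},
]
target: dict = {}
wm = incremental_upsert(target, rows, wm)
```

Persisting the watermark between runs (e.g. in DynamoDB or S3) is what makes the pattern restartable after failures.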
AWS Data Services
- Amazon S3: data lake architectures, partitioning strategies, lifecycle policies
- DynamoDB: data modelling, secondary indexes, streams, and performance optimisation
- Amazon Redshift: core querying, data integration patterns, and performance considerations
- AWS Lambda for scalable data processing and orchestration
- Amazon EventBridge for event‑driven and decoupled data pipelines
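A hedged sketch of the Lambda-plus-EventBridge pattern above: a handler that unpacks an S3 "Object Created" event delivered via EventBridge (the event shape follows the documented S3 notification structure; the processing step itself is stubbed out):

```python
def handler(event: dict, context=None) -> dict:
    """Minimal Lambda-style handler for an S3 "Object Created" event routed
    through EventBridge: extract bucket and key, then hand off to the pipeline."""
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]
    # Real code would fetch the object with boto3 and trigger processing here
    return {"bucket": bucket, "key": key, "status": "queued"}
```

Routing through EventBridge rather than direct S3-to-Lambda notification keeps the pipeline decoupled: multiple consumers can match the same event pattern.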
Vector Databases & Embeddings
- Strong understanding of vector database concepts, indexing strategies, and performance trade‑offs
- Design and implementation of embedding generation pipelines
- Optimisation techniques for semantic search and retrieval accuracy
- Effective chunking strategies for document ingestion and processing
- Experience with CockroachDB deployment and management is beneficial
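One common form of the chunking strategies mentioned above is fixed-size windows with overlap, sketched minimally here (the sizes are illustrative assumptions, not requirements of the role):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows; the overlap preserves
    context across chunk boundaries, which helps semantic-search recall."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("x" * 500, chunk_size=200, overlap=50)
```

Production pipelines often chunk on sentence or section boundaries instead of raw character counts, trading simplicity for retrieval accuracy.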
Document Processing
- Experience with PDF parsing libraries (PyPDF2, pdfplumber) and managed services such as AWS Textract
- Integration of OCR solutions (AWS Textract, Tesseract) for scanned documents
- Extraction of document structure (headings, tables, sections)
- Metadata extraction, normalization, and enrichment
- Handling of multiple document formats including PDF, HTML, and DOCX
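For the HTML side of the structure extraction listed above, a stdlib-only sketch that collects headings (real pipelines would more likely use pdfplumber or Textract for PDFs and a full HTML parser for web content):

```python
from html.parser import HTMLParser

class HeadingExtractor(HTMLParser):
    """Collect (level, text) pairs for h1-h6 tags: a crude form of
    document-structure extraction."""
    def __init__(self):
        super().__init__()
        self.headings: list[tuple[int, str]] = []
        self._level = None

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            self._level = int(tag[1])

    def handle_data(self, data):
        if self._level is not None and data.strip():
            self.headings.append((self._level, data.strip()))

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self._level = None

parser = HeadingExtractor()
parser.feed("<h1>Catalog</h1><p>intro</p><h2>Pricing</h2>")
```

The extracted heading hierarchy is exactly the kind of metadata that drives section-aware chunking downstream.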
Data Integration
- Familiarity with SAP data structures is beneficial
- Integration with PIM (Product Information Management) systems
- Design and consumption of REST APIs
Programming & Querying
- Python (advanced): pandas, numpy, boto3, and data processing best practices
- SQL (advanced): complex queries, performance tuning, and query optimisation
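On the SQL tuning side, a small sqlite3 sketch of the query-plan inspection such work relies on — here SQLite's `EXPLAIN QUERY PLAN`; production tuning would target Redshift or another warehouse, and the table is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders (sku, qty) VALUES (?, ?)",
                 [(f"SKU{i % 100}", i) for i in range(1000)])

# Without an index, a filter on sku forces a full table scan
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(qty) FROM orders WHERE sku = 'SKU7'"
).fetchall()

conn.execute("CREATE INDEX idx_orders_sku ON orders (sku)")

# With the index, the planner can search via idx_orders_sku instead
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(qty) FROM orders WHERE sku = 'SKU7'"
).fetchall()
```

The plan's `detail` column changes from a scan to an index search; reading that output before and after an index change is the core loop of query tuning.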
Data Quality & Governance
- Data profiling and ongoing quality assessment
- Schema validation and evolution strategies
- Data lineage tracking and observability
- Understanding of Master Data Management (MDM) concepts
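The schema validation and data profiling listed above reduce, at their simplest, to checking each record against an expected schema; a minimal sketch (the schema and field names are hypothetical):

```python
EXPECTED_SCHEMA = {"sku": str, "price": float, "name": str}  # hypothetical product schema

def validate_record(record: dict, schema: dict = EXPECTED_SCHEMA) -> list[str]:
    """Return a list of violations: missing fields and type mismatches.
    An empty list means the record conforms to the schema."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"bad type for {field}: {type(record[field]).__name__}")
    return errors
```

Frameworks built for this at scale (e.g. Great Expectations, AWS Glue Data Quality) follow the same record-against-expectation shape, adding reporting and lineage on top.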
Domain Knowledge
- Product catalog data models and hierarchies
- E‑commerce data patterns and integrations
- B2B data exchange and system integration
To apply for this role, please submit your CV or contact Dillon Blackburn on (phone number removed) or at (url removed).