Job Purpose:
Responsible for creating and managing the technology that underpins the data infrastructure at every step of the data flow. From configuring data sources to integrating analytical tools, these systems are architected, built, and maintained by this general-role data engineer.
Minimum Education (Essential):
Bachelor's degree in Computer Science or Engineering (or similar)
Minimum Education (Desirable):
- Honors degree in Computer Science or Engineering (or similar)
- AWS Certified Data Engineer
- AWS Certified Solutions Architect
- AWS Certified Data Analytics – Specialty
Minimum Applicable Experience (Years):
5+ years of working experience
Required Nature of Experience:
- Data Engineering development
- Experience with AWS services used for data warehousing, computing, and transformations (e.g., AWS Glue, S3, Lambda, Step Functions, Athena, CloudWatch)
- Experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, DynamoDB)
- Experience with SQL for querying and transforming data
Skills and Knowledge (Essential):
- Strong skills in Python (especially PySpark for AWS Glue)
- Strong knowledge of data modeling, schema design, and database optimization
- Proficiency with AWS and infrastructure as code
Skills and Knowledge (Desirable):
- Knowledge of SQL, Python, and AWS serverless microservices
- Deploying and managing ML models in production
- Version control (Git), unit testing, and agile methodologies
Data Architecture and Management (20%)
- Design and maintain scalable data architectures using AWS services like S3, Glue, and Athena
- Implement data partitioning and cataloging strategies (see the cataloging example after this list)
- Work with schema evolution and versioning to ensure data consistency
- Develop and manage metadata repositories and data dictionaries
- Support the setup and maintenance of data access roles and privileges
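For illustration only, the partitioning and cataloging work in this area might resemble the boto3 sketch below, which registers a partitioned Parquet table in the Glue Data Catalog so Athena can query it directly from S3. The database, table, column, and bucket names are hypothetical placeholders, not part of this role's actual environment.

```python
import boto3

glue = boto3.client("glue")

# Register a partitioned Parquet table in the Glue Data Catalog so that
# Athena can query the curated data directly from S3.
glue.create_table(
    DatabaseName="curated_db",  # hypothetical catalog database
    TableInput={
        "Name": "orders",  # hypothetical table
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "PartitionKeys": [{"Name": "order_date", "Type": "string"}],
        "StorageDescriptor": {
            "Columns": [
                {"Name": "order_id", "Type": "string"},
                {"Name": "customer_id", "Type": "string"},
                {"Name": "amount", "Type": "double"},
            ],
            "Location": "s3://example-curated-bucket/orders/",  # hypothetical bucket
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    },
)
```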
Pipeline Development and ETL (30%)
- Design, develop, and optimize scalable ETL pipelines with AWS Glue and PySpark (see the example job after this list)
- Implement data extraction, transformation, and loading processes
- Optimize ETL jobs for performance and cost efficiency
- Develop and integrate APIs for data workflows
- Integrate data pipelines with ML workflows for scalable deployment
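As a minimal sketch of the ETL work in this area (not a prescribed implementation), an AWS Glue PySpark job might read a source table from the Glue Data Catalog, apply a simple transformation, and write partitioned Parquet back to S3. The database, table, column, and bucket names below are hypothetical.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Resolve the parameters passed by the Glue job configuration
args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table registered in the Glue Data Catalog
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db",      # hypothetical catalog database
    table_name="orders",    # hypothetical source table
)

# Basic transformation: drop rows without a key and project the needed columns
cleaned = (
    source.toDF()
    .dropna(subset=["order_id"])
    .select("order_id", "customer_id", "order_date", "amount")
)

# Write back to S3 as partitioned Parquet for efficient Athena queries
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/orders/")  # hypothetical bucket
)

job.commit()
```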
Automation, Monitoring, and Optimization (30%)
- Automate data workflows, ensuring fault tolerance and efficient resource use
- Implement logging, monitoring, and alerting (see the example alarm after this list)
- Optimize ETL performance and resource usage
- Optimize storage solutions for performance, cost, and scalability
- Deploy ML models into production using Amazon SageMaker
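As one illustrative example of the monitoring and alerting work in this area, the boto3 sketch below creates a CloudWatch alarm on a Lambda function's error metric and notifies an SNS topic. The function name, alarm name, and topic ARN are hypothetical placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm if the ingestion Lambda reports any errors within a 5-minute window,
# notifying the on-call SNS topic.
cloudwatch.put_metric_alarm(
    AlarmName="orders-ingest-errors",  # hypothetical alarm name
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "orders-ingest"}],  # hypothetical function
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:data-oncall"],  # hypothetical topic ARN
)
```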
Security, Compliance, and Best Practices (10%)
- Ensure API security, authentication, and access control
- Implement data encryption and ensure compliance with GDPR, HIPAA, and SOC 2 (see the encryption example after this list)
- Establish data governance policies
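As a brief sketch of the encryption controls in this area, default server-side encryption can be enforced on an S3 bucket with boto3; the bucket name and KMS key alias below are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Enforce default KMS encryption for every object written to the bucket.
s3.put_bucket_encryption(
    Bucket="example-curated-bucket",  # hypothetical bucket
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/data-platform-key",  # hypothetical KMS key alias
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)
```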
Development, Team Mentorship, and Collaboration (5%)
- Work with data scientists, analysts, and business teams to understand data needs
- Collaborate with backend teams for CI/CD integration
- Mentor team members through coaching and code reviews
- Align technology with B2C division strategy
- Identify growth areas within the team
QMS and Compliance (5%)
- Document data processes and architectural decisions
- Maintain high software quality standards and compliance with QMS, security, and data standards
- Ensure compliance with ISO, CE, FDA, and other relevant standards
- Safeguard confidential information and data