AWS Data Engineer

Huntwave

Pretoria

On-site

ZAR 600,000 - 900,000

Full time

14 days ago

Job summary

A leading company in data solutions seeks a Data Engineer to manage and optimize data infrastructure. The role involves designing scalable data architectures, developing ETL pipelines, and ensuring data security and compliance. Candidates should possess a Bachelor's degree in Computer Science or Engineering and have extensive experience with AWS services and data engineering practices.

Qualifications

  • 5+ years working experience in data engineering development.
  • Experience with AWS services for data warehousing and transformations.
  • Strong skills in Python, especially PySpark for AWS Glue.

Responsibilities

  • Design and maintain scalable data architectures using AWS services.
  • Develop and optimize scalable ETL pipelines with AWS Glue and PySpark.
  • Automate data workflows ensuring fault tolerance and optimization.

Skills

Python
Data modeling
Schema design
Database optimization
AWS services

Education

Bachelor's degree in Computer Science or Engineering
Honors degree in Computer Science or Engineering

Tools

AWS Glue
AWS S3
AWS Lambda
SQL
NoSQL

Job description

Job Purpose:

Responsible for building and managing the technology behind the data infrastructure at every step of the data flow. From configuring data sources to integrating analytical tools, these systems are architected, built, and managed by a general-purpose data engineer.

Minimum Education (Essential):

Bachelor's degree in Computer Science or Engineering (or similar)

Minimum Education (Desirable):
  • Honors degree in Computer Science or Engineering (or similar)
  • AWS Certified Data Engineer
  • AWS Certified Solutions Architect
  • AWS Certified Data Analyst

Minimum Applicable Experience (Years):

5+ years working experience

Required Nature of Experience:
  • Data Engineering development
  • Experience with AWS services used for data warehousing, computing, and transformations (e.g., AWS Glue, S3, Lambda, Step Functions, Athena, CloudWatch)
  • Experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, DynamoDB)
  • Experience with SQL for querying and transforming data
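
The SQL experience called for above can be illustrated with a small, self-contained sketch. SQLite stands in here for the warehouse engines the posting actually names (PostgreSQL, MySQL, Athena), and the `orders` table and its rows are invented for illustration:

```python
import sqlite3

# In-memory SQLite stands in for a real warehouse engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("u1", 10.0), ("u1", 5.0), ("u2", 7.5)],
)

# A typical transformation query: aggregate spend per user.
rows = conn.execute(
    "SELECT user_id, SUM(amount) AS total FROM orders "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
# → [("u1", 15.0), ("u2", 7.5)]
```

The same GROUP BY/aggregate shape carries over directly to Athena or PostgreSQL; only the connection layer changes.
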

Skills and Knowledge (Essential):
  • Strong skills in Python (especially PySpark for AWS Glue)
  • Strong knowledge of data modeling, schema design, and database optimization
  • Proficiency with AWS and infrastructure as code

Skills and Knowledge (Desirable):
  • Knowledge of SQL, Python, AWS serverless microservices
  • Deploying and managing ML models in production
  • Version control (Git), unit testing, and agile methodologies

Data Architecture and Management (20%)
  • Design and maintain scalable data architectures using AWS services like S3, Glue, and Athena
  • Implement data partitioning and cataloging strategies
  • Work with schema evolution and versioning to ensure data consistency
  • Develop and manage metadata repositories and data dictionaries
  • Support data access roles and privileges setup and maintenance
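
The partitioning duties above typically revolve around Hive-style `key=value` prefixes in S3, which Glue crawlers and Athena both understand. A minimal sketch of building such a prefix in plain Python (the bucket and table names are hypothetical):

```python
from datetime import date

def partition_prefix(bucket: str, table: str, d: date) -> str:
    """Build a Hive-style S3 prefix (year=/month=/day=) for a partitioned table."""
    return (
        f"s3://{bucket}/{table}/"
        f"year={d.year}/month={d.month:02d}/day={d.day:02d}/"
    )

# Example: the daily partition for a hypothetical "events" table.
prefix = partition_prefix("my-data-lake", "events", date(2024, 5, 17))
# → "s3://my-data-lake/events/year=2024/month=05/day=17/"
```

Writing data under prefixes like this lets Athena prune partitions on `year`/`month`/`day` predicates instead of scanning the whole table.
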

Pipeline Development and ETL (30%)
  • Design, develop, and optimize scalable ETL pipelines with AWS Glue and PySpark
  • Implement data extraction, transformation, and loading processes
  • Optimize ETL jobs for performance and cost efficiency
  • Develop and integrate APIs for data workflows
  • Integrate data pipelines with ML workflows for scalable deployment
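
In Glue, the pipeline work above is usually expressed as PySpark DynamicFrame operations; the same extract-transform-load shape can be sketched in plain Python (the record fields and cleaning rules here are invented for illustration):

```python
def extract(raw_rows):
    """Extract: parse raw CSV-like lines into dicts."""
    for line in raw_rows:
        user_id, amount = line.split(",")
        yield {"user_id": user_id, "amount": float(amount)}

def transform(records):
    """Transform: drop non-positive amounts and add a derived field."""
    for rec in records:
        if rec["amount"] > 0:
            rec["amount_cents"] = int(round(rec["amount"] * 100))
            yield rec

def load(records, sink):
    """Load: append cleaned records to a sink (a list stands in for S3)."""
    sink.extend(records)
    return sink

sink = load(transform(extract(["u1,9.99", "u2,-1.00", "u3,0.50"])), [])
# u2 is filtered out; u1 and u3 survive with amount_cents added.
```

Because each stage is a generator, records stream through without materializing intermediate lists, which is the same lazy, partition-at-a-time model PySpark applies at scale.
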

Automation, Monitoring, and Optimization (30%)
  • Automate data workflows ensuring fault tolerance and optimization
  • Implement logging, monitoring, and alerting
  • Optimize ETL performance and resource usage
  • Optimize storage solutions for performance, cost, and scalability
  • Deploy ML models into production using Amazon SageMaker
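
Fault tolerance in automated workflows, as required above, often comes down to retrying transient failures with exponential backoff (Step Functions offers this natively; the same idea in plain Python, with a simulated flaky step for illustration):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Run fn, retrying on failure with exponential backoff (delay doubles per attempt)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))

# Simulate a flaky pipeline step that succeeds on its third call.
calls = {"n": 0}
def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = with_retries(flaky_step)
# → "ok" after two retried failures
```

In a Step Functions state machine the equivalent is a `Retry` block with `BackoffRate`; the Python version is useful inside Glue or Lambda code that calls external services directly.
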

Security, Compliance, and Best Practices (10%)
  • Ensure API security, authentication, and access control
  • Implement data encryption and compliance with GDPR, HIPAA, and SOC 2
  • Establish data governance policies

Development, Team Mentorship, and Collaboration (5%)
  • Work with data scientists, analysts, and business teams to understand data needs
  • Collaborate with backend teams for CI/CD integration
  • Mentor team members through coaching and code reviews
  • Align technology with B2C division strategy
  • Identify growth areas within the team

QMS and Compliance (5%)
  • Document data processes and architectural decisions
  • Maintain high software quality standards and compliance with QMS, security, and data standards
  • Ensure compliance with ISO, CE, FDA, and other relevant standards
  • Safeguard confidential information and data