
AWS Data Engineer

wePlace

Pretoria

On-site

ZAR 700,000 - 950,000

Full time

Yesterday
Job summary

A leading company in data solutions is looking for a Data Engineer to manage and optimize their data infrastructure. You will be responsible for designing ETL pipelines, ensuring data quality, and implementing best practices in a collaborative environment. The ideal candidate has a strong background in AWS technologies, data modeling, and Python programming, and is ready to tackle complex data challenges to enhance business intelligence.

Qualifications

  • 5+ years of working experience in data engineering.
  • Strong experience with AWS services for data warehousing.
  • Proficiency in Python, especially PySpark.

Responsibilities

  • Design and maintain scalable data architectures.
  • Develop and optimize ETL pipelines using AWS Glue and PySpark.
  • Implement logging and monitoring for data pipelines.

Skills

Python
Data Modeling
Schema Design
Database Optimization

Education

Bachelor's degree in Computer Science
Honors degree in Computer Science

Tools

AWS Glue
AWS S3
AWS Lambda
AWS Athena

Job description

Responsible for creating and managing the technological side of the data infrastructure at every step of the data flow. From configuring data sources to integrating analytical tools, all of these systems are architected, built and managed by a general-role Data Engineer.

  • Bachelor's degree in Computer Science or Engineering (or similar)
  • Honors degree in Computer Science or Engineering (or similar)
  • AWS Certified Solutions Architect; or
  • AWS Certified Data Analyst

Minimum applicable experience (years):

5+ years of working experience

Required nature of experience:

  • Experience with AWS services used for data warehousing, computing and transformations, e.g. AWS Glue (crawlers, jobs, triggers and catalog), AWS S3, AWS Lambda, AWS Step Functions, AWS Athena and AWS CloudWatch
  • Experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, DynamoDB)
  • Experience with SQL for querying and transformation of data

Skills and Knowledge (essential):

  • Strong skills in Python (especially PySpark for AWS Glue)
  • Strong knowledge of data modelling, schema design and database optimization
  • Proficiency with AWS and infrastructure as code

Skills and Knowledge (desirable):

  • Knowledge of SQL, Python and AWS serverless microservices
  • Deploying and managing ML models in production
  • Version control (Git), unit testing and agile methodologies

Data Architecture and Management 20%

  • Design and maintain scalable data architectures using AWS services such as, but not limited to, AWS S3, AWS Glue and AWS Athena.
  • Implement data partitioning and cataloging strategies to enhance data organization and accessibility.
  • Work with schema evolution and versioning to ensure data consistency.
  • Develop and manage metadata repositories and data dictionaries.
  • Assist and support with defining, setting up and maintaining data access roles and privileges.

Pipeline Development and ETL 30%

  • Design, develop and optimize scalable ETL pipelines using batch and real-time processing frameworks (using AWS Glue and PySpark).
  • Implement data extraction, transformation and loading processes from various structured and unstructured sources.
  • Optimize ETL jobs for performance, cost efficiency and scalability.
  • Develop and integrate APIs to ingest and export data between various source and target systems, ensuring seamless ETL workflows.
  • Enable scalable deployment of ML models by integrating data pipelines with ML workflows.

Automation, Monitoring and Optimization 30%

  • Automate data workflows and ensure they are fault tolerant and optimized.
  • Implement logging, monitoring and alerting for data pipelines.
  • Optimize ETL job performance by tuning configurations and analyzing resource usage.
  • Optimize data storage solutions for performance, cost and scalability.
  • Ensure AWS resources are optimized for scalable data ingestion and output.
  • Deploy machine learning models into production using cloud-based services such as AWS SageMaker.

Security, Compliance and Best Practices 10%

  • Ensure API security, authentication and access control best practices.
  • Implement data encryption, access control and compliance with GDPR, HIPAA, SOC 2, etc.
  • Establish data governance policies, including access control and security best practices.

Development Team Mentorship and Collaboration 5%

  • Work closely with data scientists, analysts and business teams to understand data needs.
  • Collaborate with backend teams to integrate data pipelines into CI/CD.
  • Provide developmental leadership to the team through coaching, code reviews and mentorship.
  • Ensure technological alignment with the B2C division strategy, supporting the overarching strategy and vision.
  • Identify and encourage areas for growth and improvement within the team.

QMS and Compliance 5%

  • Document data processes, transformations and architectural decisions.
  • Maintain high standards of software quality within the team by adhering to good processes, practices and habits, including compliance with the QMS and with data and system security requirements.
  • Ensure compliance with the established processes and standards for the development lifecycle, including but not limited to data archival.
  • Drive compliance with the Quality Management System in line with the Quality Objectives, the Quality Manual, and all processes related to the design, development and implementation of software for medical devices.
  • Comply with ISO, CE, FDA (and other) standards and requirements as applicable to assigned products.
  • Safeguard confidential information and data.

Should you not receive a response from us within one week of applying, please consider your application unsuccessful.
