Key Responsibilities:
- Design, build, and optimize scalable data pipelines for data extraction, transformation, and loading (ETL/ELT).
- Develop and maintain data architectures, databases, and data warehouses.
- Collaborate with data scientists and analysts to understand data needs and deliver clean, structured datasets.
- Monitor and improve the performance of data systems.
- Implement and enforce data quality, security, and governance practices.
- Work with cloud platforms (e.g., AWS, Azure, GCP) for data storage and processing.
- Automate data processes and workflows using orchestration tools such as Apache Airflow (a minimal sketch follows this list).
- Maintain documentation of data models, schemas, and systems.
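For illustration only, here is a minimal sketch of the kind of orchestration work listed above: a two-task Airflow DAG in which an extract step feeds a load step. The DAG id, schedule, and task callables are placeholder assumptions (Airflow 2.4+ syntax), not part of this role description.

```python
# Illustrative sketch only: a tiny daily extract -> load DAG.
# All names, the schedule, and the payload are placeholder assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull raw records from a source system.
    return [{"id": 1, "value": 42}]


def load(**context):
    # Placeholder: fetch the extracted records via XCom and "load" them.
    records = context["ti"].xcom_pull(task_ids="extract")
    print(f"loading {len(records)} records")


with DAG(
    dag_id="example_daily_etl",     # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",              # assumes Airflow 2.4+ `schedule` argument
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Declare the dependency: extract runs before load.
    extract_task >> load_task
```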
Required Skills and Qualifications:
- Bachelor's degree in Computer Science, Engineering, Information Technology, or a related field.
- Proven experience as a Data Engineer or in a similar role.
- Proficiency in programming languages such as Python, Scala, or Java.
- Strong SQL skills and experience with relational and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB).
- Hands-on experience with ETL tools and data pipeline frameworks (e.g., Apache Spark, Kafka, Airflow); see the PySpark sketch after this list.
- Experience with cloud data platforms like AWS (Redshift, Glue, S3), Azure (Data Factory, Synapse), or GCP (BigQuery, Dataflow).
- Familiarity with data modeling, data warehousing concepts, and data lakes.
- Strong problem-solving skills and attention to detail.
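As an illustrative sketch of the pipeline work referenced above, the following PySpark job reads raw JSON from object storage, filters it, and writes partitioned Parquet. The bucket paths, column names, and filter condition are hypothetical assumptions, not details of this role.

```python
# Illustrative sketch only: a minimal extract-transform-load step in PySpark.
# Paths, columns, and the filter condition below are placeholder assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example_etl").getOrCreate()

# Extract: read raw events from object storage (path is a placeholder).
raw = spark.read.json("s3a://example-bucket/raw/events/")

# Transform: keep completed events and stamp them with a processing date.
clean = (
    raw.filter(F.col("status") == "completed")
       .withColumn("processed_date", F.current_date())
)

# Load: write the cleaned dataset as Parquet, partitioned by processing date.
clean.write.mode("overwrite").partitionBy("processed_date").parquet(
    "s3a://example-bucket/curated/events/"
)

spark.stop()
```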
Preferred Qualifications:
- Experience with big data technologies like Hadoop, Hive, or Presto.
- Knowledge of CI/CD and DevOps practices in data engineering.
- Experience with version control tools (e.g., Git).
- Understanding of data privacy and compliance standards (e.g., GDPR, HIPAA).