Big Data & Cloud Data Engineer

Blackfluo.ai

Paris

On-site

EUR 50,000 - 90,000

Full-time

8 days ago

Job Summary

A data engineering firm in Paris is seeking a Big Data & Cloud Data Engineer to design and implement large-scale data processing systems using technologies like Hadoop, Spark, and cloud platforms such as AWS and Azure. The ideal candidate will have over 5 years of experience in big data technologies, strong programming skills, and expertise in cloud services. Competitive salary and opportunities for professional growth are offered.

Qualifications

  • 5+ years of experience with big data technologies (Hadoop, Spark, Kafka, Hive, HBase).
  • Strong programming skills in Python, Scala, Java, and SQL for data processing.
  • Expert knowledge of at least one major cloud platform (Azure, AWS, GCP).

Responsibilities

  • Design and implement Hadoop ecosystems including HDFS, YARN, and distributed computing frameworks.
  • Develop real-time and batch processing applications using Apache Spark.
  • Configure Apache Kafka for event streaming, data ingestion, and real-time data pipelines.

Skills

Big Data technologies
Cloud platforms
Python
Scala
Java
SQL
Docker
Kubernetes
ETL/ELT processes
Data modeling

Education

Bachelor's degree in Computer Science or related field

Tools

Hadoop
Spark
Kafka
AWS
Azure
GCP
Terraform
CI/CD tools

Job Description

About the Job: Big Data & Cloud Data Engineer

Position Overview

We are seeking a Big Data & Cloud Data Engineer to design, implement, and manage large-scale data processing systems using big data technologies (Hadoop, Spark, Kafka) and cloud-based data ecosystems (Azure, GCP, AWS), enabling advanced analytics and real-time data processing capabilities across our enterprise.

Key Responsibilities

  • Design and implement Hadoop ecosystems including HDFS, YARN, and distributed computing frameworks.
  • Develop real-time and batch processing applications using Apache Spark (Scala, Python, Java); illustrative sketches follow this list.
  • Configure Apache Kafka for event streaming, data ingestion, and real-time data pipelines.
  • Implement data processing workflows using Apache Airflow, Oozie, and other workflow orchestration tools.
  • Build NoSQL database solutions using HBase, Cassandra, and MongoDB for high-volume data storage.
  • Design multi-cloud data architectures using Azure Data Factory, AWS Glue, and Google Cloud Dataflow.
  • Implement data lakes and lakehouses using Azure Data Lake, AWS S3, and Google Cloud Storage.
  • Configure cloud-native data warehouses including Snowflake, BigQuery, and Azure Synapse Analytics.
  • Build serverless data processing solutions using AWS Lambda, Azure Functions, and Google Cloud Functions.
  • Implement containerized data applications using Docker, Kubernetes, and cloud container services.
  • Develop ETL/ELT pipelines for structured and unstructured data processing.
  • Create real-time streaming analytics using Kafka Streams, Apache Storm, and cloud streaming services.
  • Implement data quality frameworks, monitoring, and alerting for production data pipelines.
  • Build automated data ingestion from various sources including APIs, databases, and file systems.
  • Design data partitioning, compression, and optimization strategies for performance.
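
The Spark, Kafka, and data lake items above outline the core pipeline work. As a rough, non-authoritative illustration, the following PySpark Structured Streaming sketch reads JSON events from a Kafka topic and lands them as partitioned Parquet files in a cloud object store; the broker address, topic name, event schema, and s3a:// paths are hypothetical placeholders, and the job assumes the Spark/Kafka connector package (spark-sql-kafka-0-10) is available on the cluster.

    # Hypothetical sketch: Kafka -> Spark Structured Streaming -> Parquet on object storage.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("events-kafka-to-lake").getOrCreate()

    # Assumed schema of the JSON payload carried in the Kafka message value.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("user_id", StringType()),
        StructField("event_type", StringType()),
        StructField("occurred_at", TimestampType()),
    ])

    # Read the raw stream; bootstrap servers and topic name are placeholders.
    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker-1.example.internal:9092")
        .option("subscribe", "events")
        .option("startingOffsets", "latest")
        .load()
    )

    # Kafka delivers the message value as bytes: decode it and parse the JSON payload.
    events = (
        raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
           .select("e.*")
    )

    # Append partitioned Parquet to a (placeholder) data-lake path, with a
    # checkpoint location so the stream can recover its offsets after a restart.
    query = (
        events.writeStream
        .format("parquet")
        .option("path", "s3a://example-data-lake/events/")
        .option("checkpointLocation", "s3a://example-data-lake/_checkpoints/events/")
        .partitionBy("event_type")
        .trigger(processingTime="1 minute")
        .outputMode("append")
        .start()
    )

    query.awaitTermination()

For the orchestration responsibility, a minimal Airflow 2.x DAG (2.4+ for the schedule argument) might look like the sketch below; the dag_id and the two task callables are illustrative stand-ins, not a prescribed workflow.

    # Hypothetical sketch: a two-step daily ETL DAG in Apache Airflow.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_batch(**context):
        # Placeholder: pull a daily batch from a source API or database.
        print("extracting batch for", context["ds"])

    def load_batch(**context):
        # Placeholder: load the transformed batch into the warehouse.
        print("loading batch for", context["ds"])

    with DAG(
        dag_id="daily_events_etl",       # illustrative name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_batch)
        load = PythonOperator(task_id="load", python_callable=load_batch)

        extract >> load  # run extract first, then load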

Platform Administration & Optimization

  • Manage cluster provisioning, scaling, and resource optimization across big data platforms.
  • Monitor system performance, troubleshoot issues, and implement capacity planning strategies (see the monitoring sketch after this list).
  • Configure security frameworks including Kerberos, Ranger, and cloud IAM services.
  • Implement backup, disaster recovery, and high availability solutions.
  • Optimize query performance and implement data governance policies.
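
As a sketch of the monitoring responsibility above, assuming a Hadoop cluster with a YARN ResourceManager reachable on its default web port (8088), the snippet below polls the ResourceManager REST metrics endpoint and flags high memory utilisation; the hostname and the alert threshold are hypothetical.

    # Hypothetical sketch: poll YARN ResourceManager cluster metrics and warn on memory pressure.
    import requests

    RM_URL = "http://resourcemanager.example.internal:8088"  # placeholder host, default RM web port

    def cluster_memory_utilisation() -> float:
        """Return allocated/total memory from the ResourceManager cluster metrics endpoint."""
        resp = requests.get(f"{RM_URL}/ws/v1/cluster/metrics", timeout=10)
        resp.raise_for_status()
        metrics = resp.json()["clusterMetrics"]
        return metrics["allocatedMB"] / metrics["totalMB"]

    if __name__ == "__main__":
        utilisation = cluster_memory_utilisation()
        # The 85% threshold is an illustrative value, not a recommendation.
        if utilisation > 0.85:
            print(f"WARN: cluster memory at {utilisation:.0%}; consider scaling out")
        else:
            print(f"cluster memory at {utilisation:.0%}")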

Required Qualifications

Technical Skills

  • 5+ years of experience with big data technologies (Hadoop, Spark, Kafka, Hive, HBase).
  • Strong programming skills in Python, Scala, Java, and SQL for data processing.
  • Expert knowledge of at least one major cloud platform (Azure, AWS, GCP) and its data services.
  • Experience with containerization (Docker, Kubernetes) and infrastructure as code (Terraform, CloudFormation).
  • Proficiency in stream processing frameworks and real-time analytics architectures.
  • Knowledge of data modeling, schema design, and database optimization techniques.
  • Experience with data pipeline orchestration and workflow management tools.
  • Strong understanding of distributed systems, parallel processing, and scalability patterns.
  • Knowledge of data formats (Parquet, Avro, ORC) and serialization frameworks.
  • Experience with version control, CI/CD pipelines, and DevOps practices for data platforms.

Preferred Qualifications

  • Bachelor's degree in Computer Science, Data Engineering, or related field.
  • Cloud certifications (Azure Data Engineer, AWS Data Analytics, Google Cloud Data Engineer).
  • Experience with machine learning platforms and MLOps frameworks.
  • Background in data governance, data cataloging, and metadata management.
  • Knowledge of emerging technologies (Delta Lake, Apache Iceberg, dbt).
