A leading technology company is seeking a Data Engineer to design and implement data solutions using Databricks and cloud platforms. The ideal candidate will have strong skills in building data processing pipelines, optimizing data workloads, and ensuring data governance. You will collaborate with cross-functional teams to deliver advanced analytics and enable machine learning opportunities.
Key Responsibilities:
1. Databricks Development & Implementation:
∙Design and develop scalable data processing pipelines using Databricks, Apache Spark, and Delta Lake (see the sketch after this list).
∙Optimize ETL jobs, batch processing, and real-time streaming workloads.
∙Implement data ingestion strategies using tools such as Kafka and IDMC.
∙Develop SQL-based transformations and data models in Databricks.
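For illustration only, a minimal sketch of the kind of pipeline described above, assuming a Databricks notebook where spark is already defined; the S3 path and table name are hypothetical placeholders:

    from pyspark.sql import functions as F

    # Ingest raw events from cloud storage (path is a hypothetical example)
    raw = (spark.read
           .format("json")
           .load("s3://example-bucket/raw/events/"))

    # Basic cleansing / transformation step
    cleaned = (raw
               .filter(F.col("event_type").isNotNull())
               .withColumn("event_date", F.to_date("event_ts")))

    # Write to a Delta Lake table, partitioned for downstream consumption
    (cleaned.write
     .format("delta")
     .mode("append")
     .partitionBy("event_date")
     .saveAsTable("main.analytics.events_clean"))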
2. Cloud Data Architecture & Integration:
∙Design and implement data lakehouse architectures on AWS.
∙Integrate Databricks with cloud storage (S3) and data warehouses (Redshift, BigQuery).
∙Work with Terraform and CloudFormation to automate Databricks deployments.
∙Implement CI/CD pipelines for Databricks notebooks using GitHub Actions (as sketched below).
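As a rough illustration of the CI/CD item above, a CI step (for example, inside a GitHub Actions job) might trigger an existing Databricks job through the Jobs REST API; the host, token, and job_id below are placeholders supplied by CI secrets:

    import os
    import requests

    # Credentials are expected to come from CI secrets, never hard-coded
    host = os.environ["DATABRICKS_HOST"]    # e.g. the workspace URL
    token = os.environ["DATABRICKS_TOKEN"]

    # Trigger an existing Databricks job (job_id is a placeholder)
    resp = requests.post(
        f"{host}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={"job_id": 123},
        timeout=30,
    )
    resp.raise_for_status()
    print("Triggered run:", resp.json()["run_id"])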
3. Performance Optimization & Troubleshooting:
∙Optimize Spark jobs, cluster configurations, and query performance.
∙Monitor and debug Databricks jobs, workflows, and runtime errors.
∙Tune Delta Lake tables for efficient data processing and storage (as sketched below).
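As a small example of the tuning work listed above, Delta Lake maintenance commands can be issued from a notebook; the table and column names are hypothetical:

    # Compact small files and co-locate a frequently filtered column
    spark.sql("OPTIMIZE main.analytics.events_clean ZORDER BY (event_date)")

    # Remove files no longer referenced by the table
    # (168 hours matches the default 7-day retention)
    spark.sql("VACUUM main.analytics.events_clean RETAIN 168 HOURS")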
4. Security & Data Governance:
∙Implement RBAC (Role-Based Access Control), Unity Catalog, and data masking (see the sketch after this list).
∙Ensure compliance with GDPR, HIPAA, and SOC 2 requirements.
∙Manage IAM roles, permissions, and encryption settings for Databricks environments.
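For the access-control item above, a minimal sketch using Unity Catalog SQL from a notebook; the catalog, group, and column names are hypothetical:

    # Grant read access to an account-level group via Unity Catalog
    spark.sql("GRANT SELECT ON TABLE main.analytics.events_clean TO `data_analysts`")

    # Simple column masking through a dynamic view:
    # only members of the pii_readers group see the raw email address
    spark.sql("""
        CREATE OR REPLACE VIEW main.analytics.events_masked AS
        SELECT
            event_id,
            CASE WHEN is_account_group_member('pii_readers')
                 THEN email ELSE '***MASKED***' END AS email
        FROM main.analytics.events_clean
    """)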
5. Collaboration & Support:
∙Work closely with data scientists, analysts, and DevOps teams to enable advanced analytics and machine learning workloads.
∙Provide technical guidance and best practices on Databricks development.
∙Document technical designs, processes, and troubleshooting guides.
Required Skills & Qualifications:
Technical Skills:
∙Strong experience with Databricks, Apache Spark (PySpark, Scala), and Delta Lake.
∙Proficiency in Python, SQL, and Scala for data processing.
∙Experience with cloud platforms (AWS, Azure, or GCP) and data services.
∙Hands-on knowledge of ETL, data warehousing, and lakehouse architectures.
∙Familiarity with Airflow, dbt, or similar workflow orchestration tools.
∙Knowledge of machine learning frameworks (MLflow, TensorFlow, PyTorch) is a plus.
Soft Skills:
∙Strong problem-solving and analytical skills.
∙Ability to work independently and within cross-functional teams.
∙Excellent communication and documentation skills.
∙Ability to manage multiple projects and prioritize tasks effectively.