Duration
Long-term Contract
Experience Needed
10+ Years
Key Responsibilities
- Build and maintain scalable ETL / ELT pipelines using Databricks.
- Leverage PySpark / Spark and SQL to transform and process large datasets (a brief sketch follows this list).
- Integrate data from multiple sources including Azure Blob Storage, ADLS and other relational / non-relational systems.
- Work closely with multiple teams to prepare data for dashboards and BI tools.
- Collaborate with cross-functional teams to understand business requirements and deliver tailored data solutions.
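As an illustration of the PySpark transformation work referenced above, here is a minimal sketch that reads raw CSV files from a hypothetical ADLS container, applies a basic cleanup, and writes the result as a Delta table. The storage path, column names, and table name are illustrative assumptions, not part of the role.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already provided as `spark` in Databricks notebooks

# Hypothetical ADLS source: raw order exports landed as CSV files.
raw = (
    spark.read.format("csv")
    .option("header", "true")
    .load("abfss://raw@examplestorage.dfs.core.windows.net/orders/")
)

# Basic cleanup: cast types and drop rows with missing amounts.
clean = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount").isNotNull())
)

# Write a managed Delta table that BI tools can query directly.
clean.write.format("delta").mode("overwrite").saveAsTable("analytics.sales.orders_clean")
```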
Performance & Optimization
- Optimize Databricks workloads for cost efficiency and performance (a brief tuning sketch follows this list).
- Monitor and troubleshoot data pipelines to ensure reliability and accuracy.
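As a sketch of the kind of tuning involved, assuming the illustrative table from the previous example, the Delta Lake OPTIMIZE and VACUUM commands below compact small files, co-locate a frequently filtered column, and clean up unreferenced data files:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # predefined as `spark` in Databricks notebooks

# Compact small files and co-locate data by a commonly filtered column
# (customer_id is an assumed example) to cut scan time and cost.
spark.sql("OPTIMIZE analytics.sales.orders_clean ZORDER BY (customer_id)")

# Remove data files no longer referenced by the table (default 7-day retention).
spark.sql("VACUUM analytics.sales.orders_clean")
```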
Governance & Security
- Implement and manage data security, access controls and governance standards using Unity Catalog (a brief example follows this list).
- Ensure compliance with organizational and regulatory data policies.
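A minimal sketch of Unity Catalog access control, assuming an illustrative catalog, schema, table, and group; the GRANT statements themselves are standard Unity Catalog SQL:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # predefined as `spark` in Databricks notebooks

# Grant a (hypothetical) analyst group read-only access down the object hierarchy:
# catalog -> schema -> table.
spark.sql("GRANT USE CATALOG ON CATALOG analytics TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA analytics.sales TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE analytics.sales.orders_clean TO `data_analysts`")
```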
Deployment
- Leverage Databricks Asset Bundles for deployment of Databricks jobs, notebooks and configurations across environments.
- Manage version control for Databricks artifacts and collaborate with the team to maintain development best practices.
Technical Skills
- Strong expertise in Databricks (Delta Lake, Unity Catalog, Lakehouse Architecture, Table Triggers, Delta Live Tables pipelines, Databricks Runtime, etc.); a brief Delta Live Tables sketch follows this list.
- Proficiency in Azure Cloud Services.
- Solid understanding of Spark and PySpark for big data processing.
- Strong programming skills in Python.
- Experience with relational databases.
- Knowledge of Databricks Asset Bundles and GitLab.
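Since Delta Live Tables is called out above, a minimal DLT sketch is included for context; it assumes it runs inside a DLT pipeline, and the source table and expectation are illustrative:

```python
import dlt
from pyspark.sql import functions as F

# Declares a managed table in a Delta Live Tables pipeline; rows failing the
# expectation are dropped and counted in pipeline metrics.
@dlt.table(comment="Orders with valid, typed amounts (illustrative).")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_clean():
    # `spark` is provided by the pipeline runtime; raw.orders is an assumed source.
    return spark.read.table("raw.orders").withColumn("amount", F.col("amount").cast("double"))
```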
Preferred Experience
- Familiarity with Databricks Runtimes and advanced configurations.
- Knowledge of streaming frameworks such as Spark Structured Streaming (a brief sketch follows this list).
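For the streaming item above, here is a minimal Spark Structured Streaming sketch using Databricks Auto Loader; the source path, schema/checkpoint locations, and target table are illustrative assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # predefined as `spark` in Databricks notebooks

# Incrementally ingest newly arrived JSON files with Auto Loader (cloudFiles).
events = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/analytics/raw/events_schema")
    .load("abfss://raw@examplestorage.dfs.core.windows.net/events/")
)

# Write the stream into a Delta table; availableNow processes the current backlog
# and then stops, which suits scheduled (batch-style) jobs.
query = (
    events.writeStream
    .option("checkpointLocation", "/Volumes/analytics/raw/events_checkpoint")
    .trigger(availableNow=True)
    .toTable("analytics.bronze.events")
)
```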
Certifications