We are looking for a Databricks developer to assist in developing ETL processes and cleaning up notebooks. The project involves resolving inconsistencies in notebooks and report design. Solid experience with big data, Databricks, and the review and cleanup of notebook design and code is required.
Experience: 2+ years preferred. This is a work-from-home opportunity, available as either a full-time or a part-time role.
Responsibilities:
- Design and implement high-performance data ingestion pipelines from multiple sources using Apache Spark and/or Azure Databricks.
- Present proofs of concept of key technology components to stakeholders.
- Develop scalable, reusable frameworks for data ingestion.
- Ensure data quality and consistency across end-to-end data pipelines from source to target repositories.
- Work with event-based/streaming technologies for data ingestion and processing.
- Collaborate with the project team to support additional components such as APIs and search functionality.
- Support Azure CI/CD pipelines and maintain source control with Azure DevOps.
- Evaluate tools against customer requirements.
- Participate in Agile delivery and DevOps practices for iterative proof-of-concept and production deployments.
Knowledge and Skills:
- Strong understanding of Data Management principles.
- Experience in building ETL and data warehouse transformation processes.
- Hands-on experience with Azure Data Factory and Apache Spark (preferably Databricks).
- Experience with geospatial frameworks on Apache Spark.
- Microsoft Azure Big Data Architecture certification.
- Experience designing solutions on the Azure data analytics platform, including Azure Storage, Azure SQL Data Warehouse, Azure Data Lake, Azure Cosmos DB, and Azure Stream Analytics.
Minimum Requirements:
- Experience with big data, Databricks, and notebook review and cleanup.
- Proven experience in ETL/data warehouse processes.
- Microsoft Azure Big Data Architecture certification.
- Knowledge of Azure Storage, Azure SQL Data Warehouse, Azure Data Lake, Azure Cosmos DB, and Azure Stream Analytics.