Associate Data Engineer (Databricks) (Ref 26210)
JOBLINE RESOURCES PTE. LTD.
Singapore
On-site
SGD 90,000 - 130,000
Full time
Job summary
A leading data consultancy in Singapore seeks an experienced professional to oversee Databricks platform operations, ensuring data quality, governance, and compliance. The ideal candidate has 8-10 years in system operations, extensive knowledge of Databricks, and AWS certification. This role is crucial for optimizing data processes and mentoring junior team members.
Qualifications
- 8-10 years of experience in system operations, compliance, and management.
- Hands-on experience with the Databricks platform.
- Must be cloud certified (AWS).
- Expert-level proficiency in Databricks.
- Strong knowledge of monitoring and incident management.
Responsibilities
- Build and optimize ETL/ELT processes for large data volumes.
- Implement data quality frameworks to ensure data accuracy.
- Monitor and maintain production data pipelines for uptime.
- Establish best practices for data governance and compliance.
- Provide technical guidance and mentorship to junior members.
Skills
Databricks
AWS cloud services
Apache Spark
Delta Lake
Data warehouse concepts
CI/CD pipelines
Monitoring
Incident management
Education
Degree in Computer Science or Computer Engineering
Tools
Databricks Unity Catalog
Tableau
Oracle Database
Responsibilities
- Build and optimize ETL/ELT processes leveraging Databricks' native capabilities to handle large volumes of structured and unstructured data from various sources
- Implement data quality frameworks and monitoring solutions using Databricks data quality features to ensure data accuracy and reliability across all data products
- Establish best practices for data governance, security, and compliance within the Databricks ecosystem and integrate with enterprise systems
- Monitor and maintain production data pipelines to ensure 99.9% uptime and optimal performance across all Databricks workloads and clusters
- Implement comprehensive logging, alerting, and monitoring systems using Databricks monitoring capabilities and integration with enterprise monitoring tools
- Perform regular health checks on Databricks cluster performance, job execution times, and resource utilization to identify and resolve bottlenecks proactively
- Manage incident response procedures for Databricks pipeline failures, including root cause analysis, resolution, and post-incident reviews
- Establish and maintain disaster recovery procedures and backup strategies for critical data assets within the Databricks environment
- Conduct regular performance tuning of Spark jobs and Databricks cluster configurations to optimize cost and execution efficiency
- Implement automated testing frameworks for Databricks-based data pipelines, including unit tests, integration tests, and data validation checks
- Maintain comprehensive documentation for all Databricks operational procedures, runbooks, and troubleshooting guides
- Coordinate scheduled maintenance windows and Databricks system upgrades with minimal business impact
- Manage user access controls, workspace configurations, and security policies within Databricks environments
- Monitor data lineage using Databricks Unity Catalog and maintain metadata management systems to support operational transparency and compliance requirements
- Establish capacity planning processes to forecast Databricks infrastructure needs and manage cloud costs effectively
- Provide technical guidance and mentorship to junior team members on Databricks best practices and data engineering principles
- Participate in on-call rotation for critical production systems with focus on Databricks platform stability
- Lead operational reviews and contribute to continuous improvement initiatives for Databricks platform reliability and efficiency
- Coordinate with infrastructure teams on Databricks cluster provisioning, network configurations, and security implementations
Requirements
- Degree in Computer Science or Computer Engineering
- 8-10 years of working experience in system operations, compliance, and management
- Hands-on project experience specifically with the Databricks platform (primary requirement)
- Project experience in cloud operations or cloud architecture
- Must be cloud certified (AWS)
- Databricks certification (Associate or Professional level) - highly preferred
- Exposure to hospital information/clinical systems is an added advantage
- Understanding of DevOps practices and CI/CD pipelines for Databricks-based data engineering projects
- Knowledge of ITIL frameworks and operational best practices
- Expert-level proficiency in the Databricks platform, including workspace management, cluster configuration, and job orchestration
- Strong expertise in Apache Spark within the Databricks environment, including Spark SQL, DataFrames, and RDDs
- Extensive experience with Delta Lake, including data versioning, time travel, and ACID transactions
- Proficiency in Databricks Unity Catalog for data governance and metadata management
- In-depth understanding of data warehouse concepts, data profiling, data verification, and advanced analytics techniques
- Strong knowledge of monitoring, incident management, and cloud cost control
- Databricks (primary and most critical skill)
- AWS cloud services and architecture
- IDMC (Informatica Intelligent Data Management Cloud)
- Tableau for data visualization
- Oracle Database management
- MLOps practices within the Databricks environment (Good to have)
- Stata for statistical analysis (Good to have)
- Amazon SageMaker integration with Databricks (Good to have)
- DataRobot platform integration (Good to have)
Licence no: 12C6060