Job Description
This role requires a strong background in data engineering and analytics, with hands-on expertise across data processing and database technologies. The candidate will develop, maintain, and optimize data pipelines and data models that support business intelligence and analytics initiatives.
Required Skills:
- Proficiency in Python, including unit testing and data packages such as Pandas, SQLAlchemy, and Alembic.
- Strong SQL skills, including DDL, DML, window functions, CTEs, subqueries, joins, and performance profiling across platforms such as Spark and PostgreSQL.
- Experience with Spark, including PySpark, Spark SQL, batch and streaming processing, partitioning, Delta tables, and Parquet.
- Knowledge of the Databricks environment, including workflows, clusters, SQL Warehouse, Unity Catalog, and performance profiling.
- Understanding of streaming data solutions such as Azure Event Hubs, with the ability to process and scale streaming data efficiently.
- Experience with PostgreSQL, focusing on query optimization, indexing, JSON columns, and performance tuning.
- Data modeling expertise, particularly dimensional modeling and normalization for BI tools.
- Experience with containerization tools such as Docker.
- Familiarity with Infrastructure as Code and CI/CD tooling such as Kubernetes, Argo, Crossplane, and Terraform.
- Knowledge of database migration tools such as Alembic, Flyway, and Liquibase.
- Understanding of SQL and NoSQL databases, and how to choose the appropriate database for different use cases.
- Ability to query logs using KQL (Azure) or a similar query language.
Desired Skills:
- Experience with SQL Server, including query optimization and maintenance.
- Familiarity with dbt in a Databricks environment.
- Knowledge of Power BI for creating semantic models.
- Experience with MLflow for experiment tracking and model registration.
Qualifications and Experience:
- Degree in Computer Science or equivalent experience.
- Professional experience in data ingestion, ETL, and ELT processes for structured and unstructured data.
- Proficiency in Python and SQL for analytics, database development, and data modeling.
- Experience with DevOps and CI/CD pipelines for data applications.
- Experience working with cloud platforms, preferably Azure.
- Understanding of Agile methodologies and experience working in agile teams.
Responsibilities:
- Support, maintain, optimize, and create ETL/ELT pipelines, both batch and streaming, using Databricks (PySpark, Databricks SQL), Python, SQL, and dbt.
- Design and model data objects, applying dimensional modeling and normalization.
- Write and run tests for data flows.
- Collaborate with cross-functional teams including developers, data scientists, and business analysts to deliver solutions.
- Coordinate with platform teams to use infrastructure efficiently, including CI/CD deployments.
- Work Pacific Time (PST) hours as required.