Job Summary
As a Data Engineer, you will support the Senior Manager of Analytics & Insights in delivering advanced analytics and data-driven initiatives. You will be responsible for designing, building, and maintaining robust data infrastructure and pipelines within a hybrid environment. Your role is critical in keeping data secure, reliable, and optimized so that data scientists and analysts can achieve measurable business impact.
Job Responsibilities
- Design and implement ingestion, transformation, and integration pipelines from multiple sources, including on-prem SQL Server, APIs, and streaming data.
- Build and maintain scalable data lake/lakehouse environments (Azure Data Lake, Delta-style storage) for both structured and unstructured data.
- Set up and manage collaborative workspaces and sandbox environments, such as Databricks and JupyterHub, for analytics and ML development.
- Implement modern ETL/ELT frameworks, orchestration tools, and automation for batch and real-time processing.
- Enforce data governance best practices, including role-based access, encryption, compliance, and metadata management via data catalogs.
- Execute data quality checks, validation, and monitoring across all data flows to ensure high-fidelity outputs.
- Collaborate with data scientists, analysts, and system vendors to translate business requirements into maintainable engineering solutions.
- Write production-quality Python, SQL, and Spark code while performing rigorous design and code reviews.
- Drive DataOps and MLOps improvements through version control, CI/CD automation, and proactive performance tuning.
- Document architectures, runbooks, and operational procedures to ensure long-term scalability and cost-efficiency.
Job Requirements
- Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field.
- Minimum of 6 years of hands‑on experience in data engineering, with a proven track record of owning end‑to‑end builds and performance tuning.
- Advanced proficiency in SQL and Python, with specific expertise in Spark for large-scale distributed data processing.
- Deep knowledge of data lakehouse architectures, dimensional modeling, and semantic models for BI tools (e.g., Power BI, Tableau).
- Experience managing hybrid data environments combining on-premises databases with cloud platforms, specifically Azure.
- Hands‑on experience with relational and NoSQL databases, workflow orchestration tools, and real‑time event streaming.
- Proficient in Git and CI/CD practices (Azure DevOps, GitHub Actions) and familiar with the ML model lifecycle (MLOps).
- Strong understanding of data security, encryption, and automated testing for data pipelines.
- Excellent communication skills with the ability to translate complex technical insights into business‑friendly language.
- Demonstrated ability to work independently, manage multiple priorities, and collaborate effectively with cross‑functional teams and external vendors.