Key Responsibilities
- Manage, monitor, and troubleshoot cloud-based data pipelines.
- Optimize data pipelines to enhance performance, cost-efficiency, and reliability.
- Automate repetitive tasks in data processing and management.
- Perform SLA-oriented monitoring of critical pipelines and ensure adherence.
- Analyze, troubleshoot, and resolve complex pipeline issues and conduct post-incident reviews.
- Maintain infrastructure reliability for data pipelines and MDM jobs.
- Implement and improve monitoring, alerting, and testing mechanisms for pipeline reliability.
- Develop and maintain technical documentation for data pipeline systems and processes.
- Collaborate with stakeholders, ensuring effective communication and reporting.
- Be open to working in a 24x7 shift environment.
Required Skills & Experience
- 5+ years of industry experience in Data Engineering support and enhancement.
- Proficiency in cloud platforms: GCP, Azure, or AWS (experience with BigQuery, Cloud Storage, GKE, Glue, DMS, Athena, Lake Formation preferred).
- Strong understanding of data pipeline architectures and ETL processes.
- Excellent programming skills in Python (Pandas, PyArrow, Ibis) for data processing and automation.
- Proficiency in SQL for data analysis with relational databases.
- Hands-on experience with Apache Spark (PySpark) and/or Apache Flink (or Kafka Streams, Apache Storm).
- Practical knowledge of Docker & Kubernetes (containerization and orchestration).
- Experience with Git, CI/CD pipelines, and automated testing.
- Strong troubleshooting and problem-solving skills.
- Proven ability to analyze, optimize, and maintain high-reliability pipelines.
- Strong documentation and communication skills.
Qualifications
- Bachelor’s degree in Computer Science or a related technical field (or equivalent practical experience).
- A Cloud Professional Data Engineer certification is an advantage.
- Excellent verbal and written communication skills.
Nice to Have (Preferred Skills)
- Knowledge of FHIR R4 standards and healthcare data interoperability.
- Experience with data visualization tools (Google Looker, Tableau, Power BI).
- Hands-on with PyFlink (Python API for Apache Flink) and Flink SQL.
- Familiarity with Pixi environment management and modern Python dependency management.
- Experience with Apache Iceberg or other modern data lake storage formats.
- Understanding of real-time vs. batch processing trade-offs, particularly in healthcare data systems.