We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale, across all devices and digital media, and our people are everywhere in the world (18,000+ experts across 37 countries, to be exact). Our work culture is dynamic and non-hierarchical. We are looking for great new colleagues. That is where you come in!
Job Description
Design, develop, and maintain scalable data pipelines and ETL processes on Google Cloud Platform (GCP).
Implement and optimize data storage solutions using BigQuery, Cloud Storage, and other GCP services.
Collaborate with data scientists, machine learning engineers, data engineers, and other stakeholders to integrate and deploy machine learning models into production environments.
Develop and maintain custom deployment solutions for machine learning models using tools such as Kubeflow, AI Platform, and Docker.
Write clean, efficient, and maintainable code in Python and PySpark for data processing and transformation tasks.
Ensure data quality, integrity, and consistency through data validation and monitoring processes.
Apply a deep understanding of the Medallion architecture (bronze, silver, and gold layers) to the design of data platforms.
Develop metadata-driven pipelines that process data efficiently.
Use Terraform to manage and provision cloud infrastructure resources on GCP.
Troubleshoot and resolve production issues related to data pipelines and machine learning models.
Stay up to date with the latest industry trends and best practices in data engineering, machine learning, and cloud technologies.
Understand data lifecycle management, data pruning, model drift, and model optimization.
Qualifications
Must-have skills: machine learning (general experience), visualization, Google Cloud Platform, and PySpark.
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Proven experience as a Data Engineer with a focus on GCP.
Strong proficiency in Python and PySpark for data processing and transformation.
Hands-on experience with machine learning model deployment and integration on GCP.
Familiarity with GCP services such as BigQuery, Cloud Storage, Dataflow, and AI Platform.
Experience with Terraform for infrastructure as code.
Experience with containerization and orchestration tools like Docker and Kubernetes.
Strong problem-solving skills and the ability to troubleshoot complex issues.
Excellent communication and collaboration skills.
Additional Information
Preferred Qualifications:
Experience with custom deployment solutions and MLOps.
Knowledge of other cloud platforms (AWS, Azure) is a plus.
Familiarity with CI/CD pipelines and tools like Jenkins or GitLab CI.
Strong visualization experience is nice to have but not mandatory.
Certification in GCP data engineering (e.g., Google Cloud Professional Data Engineer) or a related field.