Data Pipeline Development: Design, implement, and maintain scalable and efficient data pipelines for processing large datasets. Work with structured and unstructured data from multiple sources, ensuring that data is clean, reliable, and available for analysis.
Machine Learning Model Integration: Collaborate with data scientists to deploy machine learning models into production environments. Support model training, testing, and inference at scale, ensuring models are integrated seamlessly into the data pipeline.
ETL Process Optimization: Design and optimize ETL (Extract, Transform, Load) processes to ensure fast, reliable, and cost-effective data workflows. Automate data extraction and transformation processes using tools like Apache Airflow, Python, or other scripting frameworks.
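The extract-transform-load pattern above can be sketched in plain Python. This is a minimal, illustrative example, not a production workflow: the CSV string stands in for a real source, the list stands in for a warehouse table, and in practice each function would run as a task in an orchestrator such as Apache Airflow.

```python
import csv
import io


def extract(raw_csv: str) -> list[dict]:
    """Parse raw CSV text into rows (the string is a stand-in for a real source)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> list[dict]:
    """Clean rows: drop records with a missing id, normalise amounts."""
    return [
        {"id": r["id"], "amount": round(float(r["amount"]), 2)}
        for r in rows
        if r.get("id")
    ]


def load(rows: list[dict], sink: list) -> int:
    """Append cleaned rows to a sink (stand-in for a warehouse table)."""
    sink.extend(rows)
    return len(rows)


raw = "id,amount\n1,10.456\n,3.0\n2,7.1\n"
warehouse: list[dict] = []
loaded = load(transform(extract(raw)), warehouse)
```

Keeping each stage a small, pure function makes the steps easy to test in isolation and to map one-to-one onto orchestrator tasks later.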
Data Infrastructure Management: Develop and manage cloud-based data storage solutions (e.g., AWS S3, Azure Blob Storage, Google Cloud Storage) and databases (e.g., AWS Redshift, Google BigQuery). Build and maintain data warehouses or data lakes, ensuring high data availability and scalability.
Collaboration with Cross-functional Teams: Work closely with data scientists, business analysts, and other stakeholders to understand their data requirements and deliver data solutions that support business needs. Translate business requirements into scalable data systems and processes.
Performance Monitoring & Troubleshooting: Monitor the performance of data pipelines and resolve data-related issues quickly to ensure smooth processing and minimal downtime. Troubleshoot issues related to data quality, pipeline failures, and integration problems with AI/ML models.
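A data-quality check of the kind this monitoring work involves might look like the sketch below. The field names and the null-rate threshold are illustrative assumptions, not part of any specific system.

```python
def check_batch(
    rows: list[dict], required: set[str], max_null_rate: float = 0.05
) -> list[str]:
    """Return human-readable quality issues found in a batch of records."""
    if not rows:
        return ["batch is empty"]
    issues = []
    for field in sorted(required):
        # Count values that are missing or empty for this required field.
        nulls = sum(1 for r in rows if r.get(field) in (None, ""))
        rate = nulls / len(rows)
        if rate > max_null_rate:
            issues.append(f"{field}: null rate {rate:.0%} exceeds {max_null_rate:.0%}")
    return issues


batch = [
    {"id": "1", "amount": "9.5"},
    {"id": "", "amount": "3.2"},
    {"id": "3", "amount": None},
]
problems = check_batch(batch, required={"id", "amount"})
```

Returning a list of issues, rather than raising on the first one, lets a monitor log everything wrong with a batch in a single pass.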
Automation and Scaling: Automate repetitive data engineering tasks using scripting languages (Python, Bash) and tools (Apache Airflow). Ensure data pipelines and models are scalable and can handle increasing amounts of data over time.
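One common piece of this automation work is retrying flaky tasks instead of babysitting them. A minimal sketch of a retry decorator in Python follows; the function names are hypothetical, and orchestrators such as Apache Airflow provide their own built-in retry settings per task.

```python
import time
from functools import wraps


def retry(attempts: int = 3, delay: float = 0.0):
    """Re-run a task up to `attempts` times before giving up."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            last_err = None
            for _ in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception as err:  # broad catch is deliberate for a retry wrapper
                    last_err = err
                    time.sleep(delay)
            raise last_err
        return wrapper
    return decorator


calls = {"n": 0}


@retry(attempts=3)
def flaky_task():
    """Hypothetical task that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = flaky_task()
```

In real pipelines the delay would usually grow between attempts (exponential backoff) so transient outages are not hammered.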
Documentation & Knowledge Sharing: Document data engineering processes, workflows, and technical solutions to ensure knowledge sharing across teams. Provide support and mentoring to junior data engineers and other team members.