Job Title: Senior Data Engineer
Experience: 8–12 Years
Location: Remote
Shift Timings: 11:00 AM–8:30 PM IST
Job Summary:
We are seeking an experienced and highly skilled Senior Data Engineer with strong expertise in Databricks, data architecture, and integration of AI/ML models. The ideal candidate will have 8–12 years of experience in data engineering, and a proven track record in building scalable data platforms, architecting complex data pipelines, and supporting machine learning workflows in a cloud-based environment.
Key Responsibilities:
Design, build, and maintain scalable data pipelines using Databricks and Apache Spark (a brief sketch of this kind of pipeline follows this list).
Develop and maintain robust data architecture and ETL/ELT frameworks for structured and unstructured data.
Collaborate with Data Scientists and ML Engineers to enable end-to-end ML model pipelines — from data ingestion and feature engineering to model deployment and monitoring.
Ensure data quality, integrity, and governance across various data sources and systems.
Optimize data processing jobs for performance and cost-effectiveness in cloud environments (preferably Azure or AWS).
Implement best practices for CI/CD, data versioning, and workflow orchestration (e.g., Airflow, MLflow, Delta Lake).
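To give a concrete sense of the pipeline work described above, here is a minimal PySpark/Delta Lake sketch of a bronze-to-silver batch job. The storage paths (/mnt/landing/events, /mnt/bronze/events, /mnt/silver/events) and the columns (event_id, event_ts) are hypothetical placeholders, not details taken from this posting.

```python
# Minimal bronze-to-silver batch pipeline sketch for Databricks.
# All paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-etl").getOrCreate()

# Ingest raw JSON files into a bronze Delta table, stamping ingest time.
raw = spark.read.json("/mnt/landing/events/")
(raw.withColumn("ingest_ts", F.current_timestamp())
    .write.format("delta")
    .mode("append")
    .save("/mnt/bronze/events"))

# Deduplicate and filter into a silver Delta table.
bronze = spark.read.format("delta").load("/mnt/bronze/events")
silver = (bronze
          .dropDuplicates(["event_id"])            # hypothetical key column
          .filter(F.col("event_ts").isNotNull()))  # drop malformed rows
(silver.write.format("delta")
       .mode("overwrite")
       .save("/mnt/silver/events"))
```

The bronze/silver layering follows the medallion convention common on Databricks: raw data lands append-only, and cleaned tables are rebuilt from it, so quality rules can change without re-ingesting sources.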
Required Skills and Qualifications:
8–12 years of experience in data engineering, with a strong background in big data technologies.
Expertise in Databricks, Apache Spark, Delta Lake, and associated cloud data services.
Solid experience in designing and implementing data lakehouses, data warehouses, and real-time/streaming data pipelines (a streaming sketch follows this section).
Proficient in Python, SQL, and at least one cloud platform (Azure preferred; AWS/GCP also acceptable).
Strong understanding of ML lifecycle, including feature stores, model training, and inference pipelines.
Hands-on experience integrating and scaling AI/ML models in production environments.
Experience with CI/CD tools, infrastructure as code, and version control (e.g., Git, Terraform).
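As an illustration of the streaming side of the role, below is a minimal Spark Structured Streaming sketch that reads from Kafka and appends to a Delta table. The broker address, topic name, and paths are hypothetical placeholders.

```python
# Minimal Kafka-to-Delta streaming sketch.
# Broker, topic, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Subscribe to a Kafka topic; Kafka delivers the payload as bytes.
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load())

# Decode the payload and keep the broker timestamp.
parsed = stream.select(F.col("value").cast("string").alias("payload"),
                       F.col("timestamp"))

# Append to a Delta table; the checkpoint makes the query restartable
# with exactly-once sink semantics.
query = (parsed.writeStream
         .format("delta")
         .option("checkpointLocation", "/mnt/checkpoints/events")
         .outputMode("append")
         .start("/mnt/silver/events_stream"))
query.awaitTermination()
```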
Preferred Qualifications:
Experience with Kafka, Event Hubs, or other streaming platforms.
Familiarity with MLOps practices and tools.