Overall 5-7 years of experience in data engineering and data transformation on the cloud
3+ years of strong, hands-on experience in Azure data engineering and Databricks
Expertise in developing and supporting lakehouse workloads at the enterprise level
Experience in PySpark is required: developing and deploying workloads to run on Spark's distributed computing engine (see the sketch after this list)
Candidates must possess at least a bachelor's degree or equivalent graduate qualification in Computer Science, Information Technology, or Engineering (Computer/Telecommunication)
Cloud deployment: preferably Microsoft Azure
Experience implementing platform and application monitoring using cloud-native tools
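The PySpark requirement above can be illustrated with a minimal, hedged sketch: read raw data, clean it, and write a partitioned Delta table. All paths, table names, and columns here are hypothetical, and Delta Lake support is assumed to be available (as it is on Databricks).

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-bronze-to-silver").getOrCreate()

# Read raw (bronze) JSON data; the path is a placeholder.
bronze = spark.read.json("/mnt/raw/orders/")

# Basic cleanup: deduplicate on the key, fix types, derive a partition column.
silver = (
    bronze.dropDuplicates(["order_id"])
          .withColumn("order_ts", F.to_timestamp("order_ts"))
          .withColumn("order_date", F.to_date("order_ts"))
          .filter(F.col("amount") > 0)
)

# Write a Delta table partitioned by date (assumes a `silver` schema exists).
(silver.write.format("delta")
       .mode("overwrite")
       .partitionBy("order_date")
       .saveAsTable("silver.orders"))
```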
Desired Candidate Profile
1. Data Architecture & Design
Strong understanding of database architectures, including relational, NoSQL, and data warehousing.
Ability to design and implement data pipelines, data lakes, and data warehouses.
Knowledge of ETL (Extract, Transform, Load) processes.
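As a hedged illustration of the ETL bullet above, the sketch below uses only the Python standard library: extract rows from a CSV, transform them, and load them into SQLite standing in for a warehouse. File, table, and column names are invented.

```python
import csv
import sqlite3

def extract(path):
    # Extract: stream rows from a source CSV file.
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    # Transform: normalize types and drop malformed or invalid records.
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue
        if amount > 0:
            yield (row["order_id"], row["customer_id"], amount)

def load(records, db_path="warehouse.db"):
    # Load: write the cleaned records into a warehouse table.
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT, customer_id TEXT, amount REAL)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("orders.csv")))  # hypothetical input file
```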
2. Programming & Scripting
Languages: Expertise in Python, Java, Scala, and SQL.
Familiarity with Python data-manipulation libraries such as pandas, Dask, and NumPy.
Ability to write efficient queries and scripts for data processing and analysis.
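A minimal pandas sketch of the kind of data manipulation described above; the data is made up for illustration.

```python
import pandas as pd

# Illustrative order data; contents are invented.
df = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c3"],
    "amount": [120.0, 80.0, 200.0, 50.0],
    "order_date": pd.to_datetime(
        ["2024-01-02", "2024-01-15", "2024-01-10", "2024-02-01"]
    ),
})

# Typical manipulation: derive a month column, group, and aggregate.
monthly = (
    df.assign(month=df["order_date"].dt.to_period("M"))
      .groupby(["customer_id", "month"], as_index=False)["amount"]
      .sum()
      .rename(columns={"amount": "monthly_spend"})
)
print(monthly)
```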
3. Big Data Technologies
Proficiency in big data frameworks like Hadoop, Spark, and Flink.
Experience with distributed computing systems and handling large datasets.
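To illustrate distributed processing of a large dataset, here is a hedged PySpark sketch; the source path and column names are assumptions. Repartitioning on the grouping key before aggregating is one common way to spread the shuffle evenly across executors.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("large-aggregation").getOrCreate()

# Hypothetical large dataset stored as Parquet.
events = spark.read.parquet("/data/events/")

# Repartition on the grouping key so the shuffle is distributed evenly,
# then aggregate per day and event type.
daily_counts = (
    events.repartition("event_date")
          .groupBy("event_date", "event_type")
          .agg(F.count("*").alias("n_events"))
)

daily_counts.write.mode("overwrite").parquet("/data/daily_counts/")
```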
4. Cloud Platforms
Knowledge of cloud platforms such as AWS, Google Cloud Platform (GCP), and Microsoft Azure.
Experience with cloud-based data solutions like Amazon Redshift, Google BigQuery, or Azure Data Lake.
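A hedged sketch of reading from Azure Data Lake Storage Gen2 with Spark. The storage account, container, and key handling are placeholders; in practice credentials come from a secret store such as Azure Key Vault, never inline configuration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-read").getOrCreate()

account = "mystorageaccount"  # hypothetical storage account name
container = "datalake"        # hypothetical container name

# Account-key authentication for ABFS; the key value is a placeholder and
# should be fetched from a secret store, never hard-coded.
spark.conf.set(
    f"fs.azure.account.key.{account}.dfs.core.windows.net",
    "<access-key-from-secret-store>",
)

df = spark.read.parquet(
    f"abfss://{container}@{account}.dfs.core.windows.net/curated/orders/"
)
df.show(5)
```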
5. Data Modeling
Expertise in data modeling techniques, such as dimensional modeling, ER modeling, and star/snowflake schemas.
Ability to design schema structures that support business intelligence and analytical needs.
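A minimal star-schema sketch, expressed as Spark SQL DDL issued from Python: one dimension table and one fact table keyed to it. All table names, columns, and types are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("star-schema").getOrCreate()

# Dimension table: descriptive attributes keyed by a surrogate key.
spark.sql("""
    CREATE TABLE IF NOT EXISTS dim_customer (
        customer_key BIGINT,
        customer_id  STRING,
        segment      STRING,
        country      STRING
    ) USING parquet
""")

# Fact table: measures plus foreign keys into the dimensions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS fact_orders (
        order_key    BIGINT,
        customer_key BIGINT,   -- foreign key to dim_customer
        order_date   DATE,
        amount       DECIMAL(12, 2)
    ) USING parquet
""")
```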
6. Data Pipeline Management
Experience in creating and maintaining data pipelines for large-scale data processing.
Familiarity with tools like Apache Airflow, Luigi, or dbt for orchestrating workflows.
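As an orchestration example, here is a hedged Airflow 2.x DAG sketch wiring three placeholder tasks into a daily extract-transform-load sequence; the DAG id and callables are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task bodies; real tasks would call pipeline code.
def extract():
    print("extract step")

def transform():
    print("transform step")

def load():
    print("load step")

with DAG(
    dag_id="daily_orders_pipeline",   # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # Airflow 2.4+ style schedule argument
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Declare task order: extract, then transform, then load.
    t_extract >> t_transform >> t_load
```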
7. Data Warehousing & SQL Optimization
Expertise in designing and optimizing data warehousing solutions.
Strong skills in SQL tuning, optimization, and performance improvement for handling large datasets.
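One concrete tuning pattern, continuing the hypothetical fact_orders table from the modeling sketch above (assumed partitioned by order_date): a direct range predicate on the partition column allows partition pruning, whereas wrapping the column in a function defeats it.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-tuning").getOrCreate()

# Slower pattern: applying a function to the partition column prevents
# partition pruning, so every partition is scanned.
slow = spark.sql("""
    SELECT customer_key, SUM(amount) AS total
    FROM fact_orders
    WHERE year(order_date) = 2024
    GROUP BY customer_key
""")

# Faster pattern: a direct range predicate on the partition column can be
# pushed down, so only the matching partitions are read.
fast = spark.sql("""
    SELECT customer_key, SUM(amount) AS total
    FROM fact_orders
    WHERE order_date >= DATE'2024-01-01' AND order_date < DATE'2025-01-01'
    GROUP BY customer_key
""")

fast.explain()  # inspect the physical plan for partition filters
```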
8. Data Governance & Security
Understanding of data governance practices such as data privacy, metadata management, and data quality assurance.
Ability to enforce data security standards and practices for data protection.
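A small hedged sketch of one data-protection practice: hashing a direct identifier and masking an email address before data is exposed downstream. Column names and data are invented.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("masking").getOrCreate()

# Invented sample data containing direct identifiers.
customers = spark.createDataFrame(
    [("c1", "alice@example.com"), ("c2", "bob@example.com")],
    ["customer_id", "email"],
)

protected = (
    customers
    # Replace the raw identifier with a one-way hash.
    .withColumn("customer_id_hash", F.sha2(F.col("customer_id"), 256))
    # Mask the local part of the email address.
    .withColumn("email_masked", F.regexp_replace("email", r"^[^@]+", "***"))
    .drop("customer_id", "email")
)
protected.show(truncate=False)
```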
9. Team Leadership & Project Management
Ability to lead a team of data engineers, collaborate with cross-functional teams, and manage data engineering projects.
Familiarity with project management frameworks like Agile and Scrum.
10. Automation & CI/CD
Experience in automating data workflows and deployment pipelines.
Knowledge of CI/CD tools (e.g., Jenkins, GitLab) for version control and automation.
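CI/CD pipelines for data workflows typically run automated checks on every commit; below is a hedged example of the kind of pytest-style data-quality test a Jenkins or GitLab CI job might invoke. The sample frame stands in for real pipeline output.

```python
import pandas as pd

def test_orders_output_is_valid():
    # In a real pipeline this frame would be loaded from the job's output.
    df = pd.DataFrame({
        "order_id": ["o1", "o2"],
        "amount": [10.0, 25.5],
    })
    # Keys must be unique, amounts must be present and positive.
    assert df["order_id"].is_unique
    assert df["amount"].notna().all()
    assert (df["amount"] > 0).all()
```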
11. Machine Learning Integration (Optional)
While not strictly necessary for every data engineering role, experience integrating data pipelines with machine learning models, or supporting model deployment, can be valuable.
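As a hedged sketch of such integration, engineered features produced by a pipeline can feed a model directly; the features, labels, and model choice below are purely illustrative.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Invented feature table standing in for the output of a feature pipeline.
features = pd.DataFrame({
    "monthly_spend": [120.0, 80.0, 200.0, 50.0],
    "n_orders": [3, 2, 5, 1],
})
churned = [0, 0, 1, 1]  # illustrative labels

model = LogisticRegression()
model.fit(features, churned)

# Downstream, the same pipeline could score fresh data on a schedule.
print(model.predict(features.head(2)))
```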