Data Engineer (Control-M, Hadoop, Spark, Hive, Big Data)
NEPTUNEZ SINGAPORE PTE. LTD.
Singapore
On-site
SGD 60,000 - 90,000
Full time
8 days ago
Job summary
A leading data solutions provider is seeking a skilled data engineer to design and optimize data pipelines for efficient data migration to Hadoop-based platforms. The role involves collaborating with cross-functional teams and ensuring data quality and operational stability. Candidates should be proficient in Python, SQL, and Hadoop ecosystem tools, ideally with experience in the financial services sector.
Qualifications
- Proficiency in Python and SQL required.
- Strong experience with Hadoop, Hive, and Spark is essential.
- Background in banking or financial services preferred.
Skills
Python
SQL
Hadoop
Spark
Informatica
Data Management
Control-M
Responsibilities:
- Design, implement, and maintain data pipelines for the migration of data from on-premises systems to Hadoop-based platforms.
- Develop and optimize data processing jobs using Spark, Hive, and Python (a minimal example is sketched after this list).
- Manage job orchestration and scheduling using Control-M, ensuring timely and accurate data delivery.
- Collaborate with cross-functional teams to understand data requirements and deliver efficient solutions.
- Perform code quality checks and peer reviews to ensure best practices, maintainability, and adherence to coding standards.
- Ensure end-to-end operational stability of data pipelines by proactively identifying and resolving bottlenecks, failures, and data quality issues.
- Ensure data quality through rigorous cleaning and validation processes.
- Document data flow processes, transformation logic, and framework usage to support onboarding and troubleshooting.
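To give candidates a concrete sense of the Spark/Hive work described above, here is a minimal sketch of a migration-style job: extract from a relational source over JDBC, apply basic cleaning and validation, and load into a Hive table. All connection details, table names, columns, and thresholds are illustrative placeholders, not specifics of this role.

```python
# Minimal PySpark sketch of a migration-style job: read from an on-premises
# relational source over JDBC, clean and validate, then persist to Hive.
# All connection details, names, and thresholds below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("customer_migration")   # hypothetical job name
    .enableHiveSupport()             # needed to write managed Hive tables
    .getOrCreate()
)

# Extract: pull the source table from an on-premises database (placeholder URL).
src = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  # placeholder
    .option("dbtable", "SRC.CUSTOMERS")                     # placeholder
    .option("user", "etl_user")                             # placeholder
    .option("password", "***")  # inject via a secrets mechanism in practice
    .load()
)

# Transform: trim and normalise strings, drop rows failing basic checks.
clean = (
    src.withColumn("email", F.lower(F.trim(F.col("email"))))
       .filter(F.col("customer_id").isNotNull())
       .dropDuplicates(["customer_id"])
)

# Validate: fail fast if cleaning lost an unexpected share of rows.
in_count, out_count = src.count(), clean.count()
if in_count and out_count / in_count < 0.99:  # threshold is an assumption
    raise ValueError(
        f"Row-count check failed: {out_count}/{in_count} rows survived cleaning"
    )

# Load: write to the target Hive table, partitioned for downstream queries.
(clean.write.mode("overwrite")
      .format("hive")
      .partitionBy("country")  # placeholder partition column
      .saveAsTable("dw.customers_clean"))

spark.stop()
```

In a real pipeline, a scheduler such as Control-M would invoke this job and alert on the row-count failure above.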
Requirements:
- Proficiency in Python and SQL.
- Strong experience with Hadoop ecosystem tools (Hive, Spark).
- Extensive experience with transformation components, mapping development, and workflow orchestration in Informatica/DataStage.
- Experience with job scheduling and monitoring using Control-M.
- Familiarity with pipeline-as-code concepts and Jenkinsfiles for automating build and deployment processes.
- Solid understanding of database systems including Teradata, Oracle, and SQL Server.
- Ability to analyze and troubleshoot large-scale data processing systems.
- Experience in the banking or financial services industry.
- Knowledge of data warehousing concepts and star/snowflake schema design (a star-schema query is sketched below).
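To illustrate the last requirement, a minimal Spark SQL sketch of a star-schema query: a hypothetical fact table joined to two dimension tables and aggregated. All table and column names are assumptions for illustration only.

```python
# Minimal Spark SQL sketch of a star-schema query: one fact table joined
# to two dimension tables, then aggregated. All names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("star_schema_demo")
    .enableHiveSupport()
    .getOrCreate()
)

monthly_sales = spark.sql("""
    SELECT d.calendar_month,
           p.product_category,
           SUM(f.sales_amount) AS total_sales
    FROM   dw.fact_sales  f                                      -- fact: one row per transaction
    JOIN   dw.dim_date    d ON f.date_key    = d.date_key        -- date dimension
    JOIN   dw.dim_product p ON f.product_key = p.product_key     -- product dimension
    GROUP  BY d.calendar_month, p.product_category
""")

monthly_sales.show()
```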