Overview
Collaborate with and lead part of a cross-functional Agile team that creates and enhances software for a data ingestion and entity resolution platform.
Responsibilities
- Collaborating with and leading part of a cross-functional Agile team to create and enhance software for a data ingestion and entity resolution platform.
- Applying expertise in application, data, and infrastructure architecture disciplines.
- Working with large, complex data sets from a variety of sources.
- Participating in the rapid development of user-driven prototypes to identify technical options and inform multiple architectural approaches.
- Building efficient storage and search functions over structured and unstructured data.
- Using Python, Java, and Scala, together with relational and NoSQL databases.
- Learning newer entity resolution technologies, such as the Quantexa platform.
Basic Qualifications
- Proven track record of at least 4 years in management, in a field with a strong focus on large-scale data processing and instrumentation.
- Strong coding background, ideally in Java / Python / Scala.
- Strong working knowledge of engineering best practices and the big data ecosystem.
- Experience with at least one big data product: Databricks, Elasticsearch, or Snowflake.
- Experience building batch / real-time data pipelines for production systems.
- Experience with relational and non-relational databases, such as DB2 and MongoDB.
- Experience with various data formats: Parquet, CSV, JSON, XML, and relational data.
- Strong familiarity with Kafka, Spark, Hadoop, Iceberg, and Airflow, as well as data modeling, relational databases, and columnar databases.
- Previous experience working with large-scale distributed systems.
- Strong familiarity with software engineering principles, including object-oriented and functional programming paradigms, design patterns, and code quality practices.
- Excellent communication skills, with the ability to effectively collaborate with cross-functional teams and explain technical concepts to non-technical stakeholders.
Desired Qualifications
- Experience with REST-based applications.
- Experience with Databricks / Delta Lake.
- Experience sourcing client reference data from vendors.