Job Description: Data Engineer (Python | SQL | Semi‑Structured Data | ES APIs)
Experience Required: 6–8 years
Skills: Python, SQL, Data Modeling, ETL/ELT, DBeaver, SQLite/Postgres/Dremio, API Integration
Role Summary
We are seeking a highly skilled Data Engineer with strong Python and SQL expertise to build reliable, scalable data pipelines that transform semi‑structured data from ES URLs/APIs into clean, analytics‑ready datasets. You will work primarily in a local environment (Python, DBeaver, SQLite/Postgres/Dremio), establish database connections, flatten and normalize nested JSON from Elasticsearch indices and API responses, and prepare datasets for downstream Power BI reporting. This role requires deep hands‑on engineering ability, strong data modeling skills, and clear communication with business stakeholders.
Key Responsibilities
- Data Ingestion & Transformation
- Extract semi‑structured JSON data from ES URLs, REST API endpoints, and Elasticsearch indices.
- Flatten, normalize, and structure nested JSON into relational tables suitable for analytics.
- Build reproducible ETL/ELT workflows using Python (pandas, NumPy, SQLAlchemy, requests); a minimal sketch appears after this group.
- Implement transformation logic, incremental loads, and schema alignment for downstream use.
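For illustration, a minimal sketch of the ingest, flatten, and load pattern described above. The endpoint URL and table name are hypothetical; real sources, authentication, and schemas would come from the project:

```python
import requests
import pandas as pd
from sqlalchemy import create_engine

API_URL = "https://example.internal/api/orders"  # hypothetical endpoint

# Pull semi-structured records from the API.
resp = requests.get(API_URL, timeout=30)
resp.raise_for_status()
records = resp.json()  # assumed shape: a list of nested JSON objects

# Flatten nested objects into relational columns.
flat = pd.json_normalize(records, sep="_")

# Load into a local SQLite database, queryable from DBeaver.
engine = create_engine("sqlite:///local_warehouse.db")
flat.to_sql("orders_flat", engine, if_exists="replace", index=False)
```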
- Database Engineering
- Design, create, and maintain database schemas in SQLite, Postgres, and Dremio.
- Configure and manage local DB connections through DBeaver.
- Optimize queries using indexing strategies, caching, and partitioning; a tuning sketch appears after this group.
- Implement performance tuning for Python data jobs and SQL queries.
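A minimal tuning sketch against Postgres, reusing the hypothetical orders_flat table from the ingestion example; the connection string, indexed column, and filter are illustrative:

```python
from sqlalchemy import create_engine, text

# Illustrative connection string; real credentials would come from config.
engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/analytics")

with engine.begin() as conn:
    # Index the column used in frequent range filters.
    conn.execute(text(
        "CREATE INDEX IF NOT EXISTS ix_orders_flat_created_at "
        "ON orders_flat (created_at)"
    ))
    # Read the query plan to confirm the index is actually used.
    for row in conn.execute(text(
        "EXPLAIN ANALYZE SELECT * FROM orders_flat "
        "WHERE created_at >= '2024-01-01'"
    )):
        print(row[0])
```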
- Data Quality & Governance
- Build and maintain validation rules, deduplication logic, and anomaly detection; a sketch appears after this group.
- Establish dataset versioning, lineage tracking, and data contract/documentation.
- Ensure secure handling of API credentials, tokens, and data source endpoints.
- Use Git for version control, perform code reviews, write unit tests, and support CI checks.
- Produce clear documentation, runbooks, and support materials for ad‑hoc data requests.
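A minimal sketch of validation and deduplication rules in pandas; the column names (order_id, amount, updated_at) and the rejected-rows file are hypothetical:

```python
import pandas as pd

def validate_and_dedupe(df: pd.DataFrame) -> pd.DataFrame:
    # Rule 1: required business keys must be present.
    df = df.dropna(subset=["order_id"])

    # Rule 2: flag anomalous amounts for review, then exclude them.
    rejected = df[df["amount"] < 0]
    if not rejected.empty:
        rejected.to_csv("rejected_rows.csv", index=False)
    df = df[df["amount"] >= 0]

    # Rule 3: keep only the latest record per business key.
    return (df.sort_values("updated_at")
              .drop_duplicates(subset=["order_id"], keep="last"))
```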
- Reporting & Downstream Enablement
- Prepare clean, analytics‑ready datasets for use in Power BI dashboards and business reporting; a sketch appears after this group.
- Collaborate with stakeholders to translate business requirements into technical data solutions.
- Ensure accurate, complete, and timely delivery of data to reporting teams.
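A minimal sketch of shaping a reporting table that Power BI can then read through its standard Postgres connector; table and column names are hypothetical:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/analytics")

# Read the flattened data and enforce reporting-friendly types and names.
clean = pd.read_sql("SELECT * FROM orders_flat", engine)
clean["created_at"] = pd.to_datetime(clean["created_at"])

fact = clean[["order_id", "customer_id", "created_at", "amount"]]
fact.to_sql("fact_orders", engine, if_exists="replace", index=False)
```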
Required Skills & Experience
- Programming & Data Engineering
- Strong hands‑on experience with Python (pandas, NumPy, SQLAlchemy, requests).
- Ability to work with and transform semi‑structured JSON/ES data.
- Experience integrating with REST APIs, ES endpoints, or similar data sources.
- SQL & Databases
- Advanced SQL proficiency across SQLite, Postgres, and Dremio.
- Understanding of dimensional modeling, normalization, and modeling nested/semi‑structured data.
- Experience with query tuning, indexing, and performance optimization.
- Tools & Pipelines
- Proficient in DBeaver (database connections, schema management).
- Experience building ETL/ELT pipelines with error handling, logging, and recoverability; a sketch appears after this group.
- Familiarity with dataset preparation for Power BI.
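A minimal sketch of the error handling, logging, and recoverability patterns mentioned above; the retry policy, backoff, and checkpoint file are illustrative choices, not a prescribed design:

```python
import json
import logging
import time
import requests

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

STATE_FILE = "pipeline_state.json"  # hypothetical checkpoint location

def load_checkpoint():
    """Return the last processed cursor so a rerun resumes, not restarts."""
    try:
        with open(STATE_FILE) as f:
            return json.load(f).get("last_cursor")
    except FileNotFoundError:
        return None

def save_checkpoint(cursor):
    with open(STATE_FILE, "w") as f:
        json.dump({"last_cursor": cursor}, f)

def fetch_page(url, retries=3):
    """Fetch one page with exponential backoff on transient failures."""
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as exc:
            log.warning("attempt %d/%d failed: %s", attempt, retries, exc)
            time.sleep(2 ** attempt)
    raise RuntimeError(f"giving up on {url}")
```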
- Collaboration & Delivery
- Strong communication skills; ability to work closely with business stakeholders.
- Experience translating requirements into technical specifications and deliverables.
Preferred / Bonus Skills
- Experience with Elasticsearch, ES endpoints, scroll APIs, or schema‑on‑read engines (e.g., Dremio); a scroll‑API sketch follows this list.
- Familiarity with Docker for reproducing local environments.
- Experience with schedulers such as Airflow, Prefect, or similar orchestration tools.
- Knowledge of performance profiling techniques (EXPLAIN plans, indexing strategies, caching).
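A minimal sketch of paging through an Elasticsearch index with the scroll API over plain HTTP; the host, index name, and page size are hypothetical:

```python
import requests

ES_HOST = "http://localhost:9200"  # hypothetical cluster address
INDEX = "orders"                   # hypothetical index name

# Open a scroll context and fetch the first batch.
resp = requests.post(
    f"{ES_HOST}/{INDEX}/_search?scroll=2m",
    json={"size": 1000, "query": {"match_all": {}}},
    timeout=30,
)
resp.raise_for_status()
body = resp.json()
scroll_id, hits = body["_scroll_id"], body["hits"]["hits"]

docs = []
while hits:
    docs.extend(h["_source"] for h in hits)
    # Keep the scroll context alive and fetch the next batch.
    resp = requests.post(
        f"{ES_HOST}/_search/scroll",
        json={"scroll": "2m", "scroll_id": scroll_id},
        timeout=30,
    )
    resp.raise_for_status()
    body = resp.json()
    scroll_id, hits = body["_scroll_id"], body["hits"]["hits"]
```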