Senior Data Engineer – Knowledge Graph & AI Platform

New Era Solutions · Hyderabad · Hybrid
INR 15,00,000 – 25,00,000 · Full time

Job summary

A leading technology firm in India is looking for a Senior Data Engineer to build and maintain core data infrastructure for its enterprise AI platform. This role involves designing scalable data pipelines, developing knowledge graphs, and preparing both structured and unstructured data for AI applications. Applicants should have a strong background in Python, MongoDB, and data engineering practices. The position offers competitive compensation and flexible remote/hybrid work arrangements.

Job description

Senior Data Engineer – Knowledge Graph & AI Platform

Location: Remote / Hybrid (India)

Employment Type: Full-Time

Reporting To: Platform Architect

Role Overview

The Senior Data Engineer will build and maintain the core data infrastructure for an enterprise AI platform. This role focuses on designing scalable data pipelines, developing knowledge graphs, and preparing structured and unstructured data for AI and LLM-based applications.

Roles & Responsibilities

Data Pipeline Development
  • Design and build scalable data ingestion pipelines from enterprise systems (ERP, documentation tools, version control, and project management tools)
  • Develop connectors for structured, semi-structured, and unstructured data
  • Implement incremental data loads, change data capture (CDC), and real-time sync (a minimal ingestion sketch follows this list)
  • Ensure data quality through validation, deduplication, and lineage tracking
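
For illustration only, a minimal sketch of the kind of idempotent incremental load this involves, written in Python with pymongo. The collection names, the sync-state document, and the fetch_changed_records() source function are all hypothetical, not part of the actual platform.

```python
# Idempotent incremental load sketch. Re-running the job never duplicates
# rows: records are upserted on their natural key, and the watermark only
# advances after the batch succeeds. Assumes a local MongoDB instance and a
# hypothetical fetch_changed_records(since=...) source returning dicts with
# timezone-aware "updated_at" timestamps.
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017", tz_aware=True)
db = client["platform"]

def load_incremental(fetch_changed_records):
    # Last high-water mark; fall back to the epoch on the first run.
    state = db.sync_state.find_one({"_id": "erp_orders"}) or {}
    watermark = state.get("watermark", datetime(1970, 1, 1, tzinfo=timezone.utc))

    latest = watermark
    for rec in fetch_changed_records(since=watermark):
        # Upsert keyed on the source's natural key: safe to replay.
        db.orders.update_one({"source_id": rec["source_id"]},
                             {"$set": rec}, upsert=True)
        latest = max(latest, rec["updated_at"])

    # Persist the new watermark only after the whole batch succeeded.
    db.sync_state.update_one({"_id": "erp_orders"},
                             {"$set": {"watermark": latest}}, upsert=True)
```
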
Knowledge Graph Engineering
  • Design ontologies and graph schemas for complex enterprise relationships
  • Implement entity resolution and relationship inference across data sources (a matching sketch follows this list)
  • Build APIs and query interfaces for graph traversal
  • Optimize graph storage and query performance for large-scale usage
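
A minimal entity-resolution sketch in plain Python, assuming records that carry only an id and a name. A production system would block on a cheap key instead of comparing every pair and would use richer features, but the shape of the matching step looks like this.

```python
# Entity-resolution sketch: two sources' records match when their normalized
# names are near-identical. A real pipeline would block on a cheap key first
# (e.g. the first name token) rather than compare all pairs.
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())

def resolve(source_a, source_b, threshold=0.9):
    matches = []
    for a in source_a:
        for b in source_b:
            score = SequenceMatcher(
                None, normalize(a["name"]), normalize(b["name"])).ratio()
            if score >= threshold:
                matches.append((a["id"], b["id"], round(score, 3)))
    return matches

crm = [{"id": "c1", "name": "Acme Corp."}]
erp = [{"id": "e7", "name": "ACME Corp"}]
print(resolve(crm, erp))  # [('c1', 'e7', 1.0)]
```
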
Enterprise Data Integration
  • Extract and model enterprise metadata such as business rules and data dictionaries
  • Parse and semantically index documents and code artifacts (a parsing sketch follows this list)
  • Build integrations with enterprise APIs and internal platforms
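
As a sketch of what semantic indexing of code artifacts can start from, the standard-library ast module can pull out symbols and docstrings for indexing alongside documents; the record layout here is illustrative.

```python
# Sketch: extract symbols and docstrings from a Python source file so they
# can be indexed next to documents. The record layout is illustrative.
import ast

def index_code_artifact(path: str) -> list[dict]:
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read())
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            records.append({
                "artifact": path,
                "symbol": node.name,
                "kind": type(node).__name__,
                "line": node.lineno,
                "doc": ast.get_docstring(node) or "",
            })
    return records
```
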
AI & LLM Data Infrastructure
  • Prepare structured and contextual data for LLM consumption
  • Design embedding strategies and manage vector databases for semantic search (a search sketch follows this list)
  • Build memory and context management systems for stateful AI applications
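
A minimal semantic-search sketch over in-memory vectors. Here embed() is a hypothetical stand-in for whatever embedding model is used, and a real deployment would delegate storage and ranking to a vector database such as Qdrant or Pinecone.

```python
# Semantic-search sketch over an in-memory store of (doc, vector) pairs.
# embed() is a hypothetical embedding function; any model that maps text to
# a fixed-length vector fits the signature.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search(query: str, store, embed, top_k: int = 3):
    q = embed(query)
    scored = sorted(((cosine(q, vec), doc) for doc, vec in store),
                    key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```
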
Required Skills

Core Requirements
  • 5+ years of Data Engineering experience with production-grade pipelines
  • Strong Python skills (clean, testable, maintainable code)
  • MongoDB expertise (schema design, aggregation pipelines, indexing, performance tuning)
  • Vector databases experience (Qdrant, Pinecone, Weaviate, pgvector)
  • Document processing experience (chunking, metadata extraction, PDFs/Word/HTML; LangChain or similar); a chunking sketch follows this list
  • Strong SQL skills (complex queries, joins, window functions, optimization)
  • ETL/ELT at scale (incremental loads, CDC, idempotent pipelines)
  • Pipeline orchestration tools (Airflow, Dagster, Prefect, or similar)
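
A minimal chunking sketch in plain Python, using fixed-size character windows with overlap; the sizes and metadata fields are illustrative defaults, not the platform's actual scheme.

```python
# Chunking sketch: fixed-size character windows with overlap, each chunk
# carrying metadata for later retrieval. 800/200 are illustrative defaults.
def chunk(text: str, source: str, size: int = 800, overlap: int = 200):
    step = size - overlap
    chunks = []
    for i, start in enumerate(range(0, max(len(text) - overlap, 1), step)):
        chunks.append({
            "chunk_id": f"{source}-{i}",
            "source": source,
            "offset": start,
            "text": text[start:start + size],
        })
    return chunks
```
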
Good to Have / Strong Plus
  • Experience building production RAG pipelines
  • Deep understanding of embedding models and dimensionality
  • Graph databases (Neo4j) and Cypher query expertise
  • LLM application development using LangChain or LangGraph
  • Streaming systems (Kafka, Flink) for real-time pipelines
  • Hybrid search (vector + keyword/metadata filtering); a small sketch follows this list
  • Apache Spark for large-scale transformations
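
A small hybrid-search sketch: a cheap keyword pre-filter narrows the candidate set, then dense vector similarity ranks what survives. The field names and the inline cosine are illustrative.

```python
# Hybrid-search sketch: keyword pre-filter first, dense ranking second.
import math

def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_search(query_vec, keyword, docs, top_k=5):
    # The metadata/keyword filter is cheap and cuts the candidate set...
    candidates = [d for d in docs if keyword.lower() in d["text"].lower()]
    # ...then vector similarity orders what survives.
    return sorted(candidates,
                  key=lambda d: _cosine(query_vec, d["vector"]),
                  reverse=True)[:top_k]
```
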
What We Offer
  • Work on cutting-edge AI and knowledge graph technologies
  • Build foundational infrastructure for an enterprise AI platform
  • Competitive compensation with equity options
  • Flexible remote/hybrid work setup
  • Learning budget and conference support