A fast-growing LegalTech company is seeking a mid-level Data Engineer to enhance its GenAI capabilities. The role involves building scalable data pipelines, managing vector databases, and collaborating with cross-functional teams to support legal industry applications. Candidates with 3-6 years of experience in data engineering and strong proficiency in SQL and Python are preferred. The position is fully remote with flexible hours.
Location: Fully Remote (U.S.-based, New England preferred)
Industry: LegalTech / SaaS
Level: Mid-Level (3–6 years experience)
About Us
We’re a fast-growing LegalTech and analytics SaaS company used by the nation’s top law firms, law schools, and legal recruiters to track and analyze legal talent flows. Our platform helps drive smarter hiring, retention, and market insights in the legal industry using modern data infrastructure and emerging AI capabilities.
As we expand our product offering with more GenAI features, we’re looking for a mid-level Data Engineer with hands-on experience in LLMs, RAG architecture, and vector databases to help us scale.
What You’ll Do
Build, optimize, and maintain scalable data pipelines and ETL processes
Implement RAG-based GenAI workflows using tools like LangChain and OpenAI
Integrate and manage vector databases (e.g., Pinecone, Weaviate, FAISS)
Work with structured and unstructured data to support analytics and AI-driven search
Collaborate cross-functionally with backend engineers, product, and data science
Support legal industry-specific use cases like entity resolution, summarization, and document classification
Requirements
3–6 years of experience in data engineering or data infrastructure roles
Proficiency in SQL and Python (especially for AI/data pipelines)
Hands-on experience with AWS services (e.g., S3, Lambda, RDS, ECS)
Experience with LLM tools and frameworks (e.g., LangChain, LlamaIndex, OpenAI APIs)
Comfortable working with vector databases and retrieval-based AI
Strong understanding of scalable data architecture and data normalization
Nice to Have
Experience with RAG pipelines in production
Familiarity with legal data or professional services industries
Exposure to orchestration tools like Airflow, Prefect, or similar
Experience with embeddings, chunking strategies, and prompt engineering
Why Join Us
Make an impact on the future of LegalTech and AI-powered analytics
Collaborate with a small, passionate, high-performance team
Own meaningful projects that ship fast and evolve quickly
100% remote with flexible hours (New England-based candidates preferred)