The AI revolution is here, transforming the workplace at an unprecedented pace. Routine tasks are being automated, driving efficiency but also reshaping the job market: new opportunities are emerging and traditional roles are evolving. Ignis AI is building the next generation of Talent Acquisition and Talent Management systems that empower individuals and businesses to thrive in this changing landscape. Our goal is to enable individuals to advance their careers and to allow organizations to adapt seamlessly to evolving workforce demands. By leveling the playing field, we help everyone unlock their potential and achieve success. At Ignis AI, we embrace skills-based hiring, including skills such as creativity, communication, and collaboration.
We’re building a real-time, intelligent platform powered by machine learning and large language models (LLMs). Our foundation is a robust data architecture that supports everything from analytics to LLM-driven applications, and we’re looking for a Principal Data Architect to lead that foundation. This role combines classic data engineering excellence with next-generation challenges around LLM readiness, data pipelines for embeddings, and retrieval-augmented generation (RAG) systems. As our Principal Data Architect, you’ll play a foundational role in designing the infrastructure, workflows, and data culture that powers our entire product ecosystem.
This is a remote role based in the United States; ideal candidates are located on the East Coast. Occasional travel is anticipated.
Job Responsibilities:
- Architect and evolve our data platform to support structured, semi-structured, and unstructured data pipelines across real-time and batch workloads.
- Build and optimize pipelines that serve LLM fine-tuning, inference, and retrieval workflows, including preprocessing text, generating embeddings, and chunking documents for context injection.
- Collaborate with ML engineers to operationalize RAG pipelines, feature stores, and model inputs from production data streams.
- Own and define data contracts, schemas, lineage, and quality enforcement across the platform.
- Own data infrastructure end-to-end, from ingestion and transformation through cataloging and versioning.
- Design and implement streaming ingestion pipelines with Kafka or Redis for low-latency use cases.
- Implement and manage vector search infrastructure (e.g., Weaviate, Pinecone, FAISS) to support LLM-enhanced retrieval systems.
- Work cross-functionally to productionize data-driven features, signals, and metrics that power both analytics and intelligent experiences.
- Contribute to data governance, cataloging, access control, and observability across the ecosystem.
- Evaluate and integrate best-in-class tools for embedding generation, document store maintenance, and metadata tracking.
- Define and evolve our data lakehouse architecture, balancing batch, real-time, and streaming needs.
- Collaborate with the DevOps/MLOps engineer to build reliable, production-ready ML data pipelines that integrate into our broader platform.
- Define modeling standards and collaborate closely with Data Science and Product to ensure quality, performance, and usability.
- Evaluate and introduce technologies and frameworks that improve scale, efficiency, and maintainability.
Tech Stack:
- Languages: Python, SQL, Bash
- Orchestration: Airflow, Dagster, Prefect
- Data Governance & Quality: Great Expectations, OpenMetadata, DataHub
- LLM Tools: LangChain, Haystack, Hugging Face, OpenAI, Cohere
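To give candidates a concrete picture of the retrieval workflow this role supports (chunking documents, generating embeddings, and similarity-based retrieval for context injection), here is a minimal, self-contained Python sketch. It is purely illustrative: the hashed bag-of-words embedding stands in for a real embedding model, and none of the function names reflect our actual codebase.

```python
import hashlib
import math

def chunk(text, size=8):
    # Split text into fixed-size word chunks (a toy stand-in for a
    # tokenizer-aware document chunker).
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text, dim=64):
    # Hashed bag-of-words vector, L2-normalized (a toy stand-in for a
    # real embedding model such as those listed in the tech stack).
    vec = [0.0] * dim
    for w in text.lower().split():
        idx = int(hashlib.md5(w.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def retrieve(query, chunks, k=2):
    # Return the top-k chunks by cosine similarity to the query; in
    # production this lookup would hit a vector index instead.
    qv = embed(query)
    scored = [(sum(a * b for a, b in zip(qv, embed(c))), c) for c in chunks]
    return [c for _, c in sorted(scored, key=lambda t: t[0], reverse=True)[:k]]

doc = ("Kafka handles streaming ingestion for low latency workloads. "
       "Embeddings are generated per chunk and stored in a vector index. "
       "Retrieved chunks are injected into the LLM prompt as context.")
chunks = chunk(doc)
top = retrieve("vector index for embeddings", chunks, k=1)
```

In production, the chunker, embedding model, and similarity search would each be replaced by the tools above (e.g., LangChain splitters, a hosted embedding API, and a vector store such as Weaviate, Pinecone, or FAISS), but the overall chunk, embed, retrieve, inject flow is the same.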
Must-Have Experience
- 7+ years of experience in data engineering, platform engineering, or data architecture.
- Proven experience designing and implementing enterprise-grade or high-scale data platforms.
- Deep fluency in data modeling, data warehousing, and data pipeline orchestration.
- Strong command of streaming systems and event-driven data architectures.
- A demonstrated ability to scale systems, debug complex data issues, and enforce best practices across a team.
- Experience designing scalable data pipelines using orchestration tools and cloud-native data platforms.
- Proven ability to build low-latency, real-time and batch ETL/ELT workflows.
- Comfort working with unstructured data, including text corpora and document metadata.
- Exposure to LLM-adjacent workflows, including fine-tuning, embedding generation, vector similarity search, or context-based retrieval.
- Understanding of how to prepare and optimize data for tokenization, chunking, semantic search, and contextual augmentation.
Preferred Experience
- Experience operationalizing machine learning pipelines and managing feature engineering workflows.
- Familiarity with data privacy, regulatory compliance, or PII governance frameworks.
- Exposure to domain-driven design, data mesh, or data product thinking.
- Familiarity with data-for-AI patterns, including training set curation, labeling workflows, and long-document management.
- Experience with prompt engineering, RAG architectures, or semantic indexing.
- Prior experience building data products for ML- and LLM-enabled applications in fast-moving startup environments.
How You Work:
- You think strategically and architect for the future — but you can deliver incrementally.
- You value pragmatism: you choose the right level of abstraction, not the most complex.
- You love enabling other teams — data as a product is how you think.
- You’re a strong communicator and collaborator across engineering, product, and data science.
- You take pride in building high-trust systems that are observable, resilient, and explainable.
- You think in systems and understand how data flows power downstream AI systems, not just dashboards.
- You are excited about LLMs, but more excited about making them usable, reliable, and cost-effective in production.
- You thrive in fast-paced, collaborative environments and are not afraid to define architecture from the ground up.
As part of our skills-based selection process, candidates may be asked to complete online assessments to help us better understand their fit for the role.
What We Offer:
- Opportunity to lead and shape the product strategy of a forward-thinking company.
- Collaborative and inclusive work environment.
- Competitive compensation package and benefits.
- Compensation: $170,000 - $185,000. Negotiable based on education, experience, and skills.
- Benefits include paid time off and 401(k) plans; medical, dental, and vision insurance are available after an introductory period.
If you have a passion for helping to bring new products to market and a desire to make a real impact on the future of work, we encourage you to apply!
Seniority level: Mid-Senior level
Job function: Engineering and Information Technology
Industries: Software Development