LLM Data Engineer | United States | Fully Remote

Halo Media

Orlando (FL)

Remote

USD 90,000 - 130,000

Full time

Yesterday

Job summary

Join a leading company as an AI/LLM Data Engineer to develop and maintain data pipelines for their Generative AI platform. You will work on innovative AI solutions, collaborating with cross-functional teams to optimize data storage and processing. This role requires expertise in LLM technologies, data engineering, and a passion for ethical AI development.

Benefits

US employee benefits package

Qualifications

3-5 years of experience in data engineering, preferably in AI/ML.
Strong understanding of LLM architectures and data needs.

Responsibilities

Design and maintain multi-stage data pipelines for LLMs.
Collaborate with teams to ensure data quality and relevance.

Skills

Python

Problem Solving

Communication

Education

Master's degree in Computer Science

Tools

Snowflake

LangChain

LlamaIndex

Semantic Kernel

OpenAI functions

Join us at Halo Media as an AI/LLM Data Engineer to develop and maintain data pipelines for our Generative AI platform. The ideal candidate has experience with Large Language Model (LLM) technologies, data engineering, and techniques such as Retrieval-Augmented Generation (RAG) and knowledge-base integration. This role reports to the Director of AI Solutions & Development within the AI Center of Excellence (CoE) and works with cross-functional teams on strategic projects to deliver innovative AI solutions.

Responsibilities

Design, implement, and maintain multi-stage data pipelines for LLMs, including data processes for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF)
Identify, evaluate, and integrate diverse data sources for the Generative AI platform
Develop workflows for chunking, indexing, ingestion, and vectorization of data
Benchmark and implement vector stores, embedding techniques, and retrieval methods
Create flexible pipelines supporting multiple embedding algorithms and search types
Implement auto-tagging systems and data preparation for LLMs
Develop tools for crawling, cleaning, and refining text and image data
Collaborate with teams to ensure data quality and relevance
Optimize data storage and processing using data lakehouse architectures
Integrate workflows with Snowflake and vector store technologies

Requirements

Master's degree in Computer Science, Data Science, or related field
3-5 years of experience in data engineering, preferably in AI/ML
Proficiency in Python, JSON, HTTP, and related tools
Strong understanding of LLM architectures and data needs
Experience with RAG systems, knowledge bases, and vector databases
Knowledge of embedding techniques, similarity search, and information retrieval
Experience with data cleaning, tagging, and annotation processes
Familiarity with data crawling and ethical considerations
Strong problem-solving skills and ability to work in fast-paced environments
Experience with Snowflake, vector store technologies, and data lakehouse architectures
Excellent communication and collaboration skills
Passion for innovative and ethical AI development
Experience with frameworks such as LangChain, LlamaIndex, Semantic Kernel, and OpenAI functions
Knowledge of LLM parameters and outcome evaluation metrics

Preferred Skills

Experience with LLM/RAG frameworks
Knowledge of distributed computing platforms (e.g., Spark, Dask)
Experience with data versioning and experiment tracking tools
Cloud platform experience (AWS, GCP, Azure)
Understanding of data privacy and security
Hands-on experience with data lakehouse solutions
Proficiency in query optimization in Snowflake or Databricks
Experience with vector store technologies

Benefits

US employee benefits package

Seniority level

Mid-Senior level

Employment type

Full-time

Industries

IT Services and IT Consulting
