
LLM Data Engineer | United States | Fully Remote

Halo Media

Orlando (FL)

Remote

USD 90,000 - 130,000

Full time


Job summary

Join a leading company as an AI/LLM Data Engineer to develop and maintain data pipelines for its Generative AI platform. You will work on innovative AI solutions, collaborating with cross-functional teams to optimize data storage and processing. This role requires expertise in LLM technologies and data engineering, as well as a passion for ethical AI development.

Benefits

Benefits package for US-based employees

Qualifications

  • 3-5 years of experience in data engineering, preferably in AI/ML.
  • Strong understanding of LLM architectures and data needs.

Responsibilities

  • Design and maintain multi-stage data pipelines for LLMs.
  • Collaborate with teams to ensure data quality and relevance.

Skills

Python
Problem Solving
Communication

Education

Master's degree in Computer Science, Data Science, or a related field

Tools

Snowflake
LangChain
LlamaIndex
Semantic Kernel
OpenAI functions

Job description


Join us at Halo Media as an AI/LLM Data Engineer to develop and maintain data pipelines for our Generative AI platform. The ideal candidate is experienced in Large Language Model (LLM) technologies, data engineering, and techniques such as Retrieval-Augmented Generation (RAG) and knowledge-base integration. This role reports to the Director of AI Solutions & Development within the AI Center of Excellence (COE) and works on strategic projects with cross-functional teams to deliver innovative AI solutions.

Responsibilities
  • Design, implement, and maintain multi-stage data pipelines for LLMs, including supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) data processes
  • Identify, evaluate, and integrate diverse data sources for the Generative AI platform
  • Develop workflows for chunking, indexing, ingestion, and vectorization of data
  • Benchmark and implement vector stores, embedding techniques, and retrieval methods
  • Create flexible pipelines supporting multiple embedding algorithms and search types
  • Implement auto-tagging systems and data preparation for LLMs
  • Develop tools for crawling, cleaning, and refining text and image data
  • Collaborate with teams to ensure data quality and relevance
  • Optimize data storage and processing using data lakehouse architectures
  • Integrate workflows with Snowflake and vector store technologies

Requirements
  • Master's degree in Computer Science, Data Science, or related field
  • 3-5 years of experience in data engineering, preferably in AI/ML
  • Proficiency in Python, JSON, HTTP, and related tools
  • Strong understanding of LLM architectures and data needs
  • Experience with RAG systems, knowledge bases, and vector databases
  • Knowledge of embedding techniques, similarity search, and information retrieval
  • Experience with data cleaning, tagging, and annotation processes
  • Familiarity with data crawling and its ethical considerations
  • Strong problem-solving skills and ability to work in fast-paced environments
  • Experience with Snowflake, vector store technologies, and data lakehouse architectures
  • Excellent communication and collaboration skills
  • Passion for innovative and ethical AI development
  • Experience with frameworks like LangChain, LlamaIndex, Semantic Kernel, OpenAI functions
  • Knowledge of LLM parameters and outcome evaluation metrics

Preferred Skills
  • Experience with LLM/RAG frameworks
  • Knowledge of distributed computing platforms (e.g., Spark, Dask)
  • Experience with data versioning and experiment tracking tools
  • Cloud platform experience (AWS, GCP, Azure)
  • Understanding of data privacy and security
  • Hands-on experience with data lakehouse solutions
  • Proficiency in query optimization in Snowflake or Databricks
  • Experience with vector store technologies

Benefits
  • Benefits package for US-based employees

Seniority level
  • Mid-Senior level

Employment type
  • Full-time

Industries
  • IT Services and IT Consulting