LLM Data Engineer | United States | Fully Remote

Halo Media

Tallahassee (FL)

Remote

USD 90,000 - 150,000

Full time

7 days ago

Job summary

An established industry player is on the lookout for a skilled AI/LLM Data Engineer to enhance their Generative AI platform. This role involves designing and maintaining complex data pipelines, optimizing workflows, and ensuring data quality. With a focus on Retrieval-Augmented Generation and knowledge-base techniques, you'll collaborate with various teams to integrate diverse data sources and implement advanced data processing methods. This position offers the chance to work in a fully remote environment, contributing to cutting-edge AI solutions while enjoying a comprehensive benefits package. If you're passionate about data engineering and AI, this is your opportunity to shine.

Benefits

US employee benefits package

Qualifications

  • 3-5 years of experience in data engineering, preferably in AI/ML.
  • Strong understanding of LLM architectures and data needs.

Responsibilities

  • Design and maintain end-to-end data pipelines for LLMs.
  • Integrate diverse data sources to support the Generative AI platform.

Skills

Python
Data Engineering
LLM Technologies
JSON
HTTP
Data Cleaning
Problem-Solving

Education

Master's degree in Computer Science
Master's degree in Data Science

Tools

Snowflake
Spark
Dask
AWS
GCP
Azure

Job description

Job Overview

We are seeking an experienced AI/LLM Data Engineer to build and maintain the data pipeline for our Generative AI platform. The ideal candidate will be well-versed in Large Language Model (LLM) technologies and have a strong background in data engineering, focusing on Retrieval-Augmented Generation (RAG) and knowledge-base techniques. This role is part of the AI COE within DX Tech & Digital and reports to the Director, AI Solutions & Development.

Responsibilities
  • Design, implement, and maintain end-to-end, multi-stage data pipelines for LLMs, including data processes for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).
  • Identify, evaluate, and integrate diverse data sources to support the Generative AI platform.
  • Develop and optimize workflows for chunking, indexing, ingestion, and vectorization of data.
  • Benchmark and implement vector stores, embedding techniques, and retrieval methods.
  • Create flexible pipelines supporting multiple embedding algorithms and search types.
  • Implement auto-tagging systems and data preparation processes for LLMs.
  • Develop tools for crawling, cleaning, and refining text and image data.
  • Collaborate with teams to ensure data quality and relevance.
  • Work with data lake architectures to optimize storage and processing.
  • Integrate and optimize workflows using Snowflake and vector store technologies.

Minimum Requirements
  • Master's degree in Computer Science, Data Science, or related field.
  • 3-5 years of experience in data engineering, preferably in AI/ML.
  • Proficiency in Python, JSON, HTTP, and related tools.
  • Strong understanding of LLM architectures and data needs.
  • Experience with RAG systems, knowledge bases, and vector databases.
  • Familiarity with embedding techniques and information retrieval.
  • Experience with data cleaning, tagging, and annotation.
  • Knowledge of data crawling and its ethical considerations.
  • Strong problem-solving skills and ability to work in fast-paced environments.
  • Experience with Snowflake integration in AI/ML pipelines.
  • Experience with vector store technologies and data lakehouse architectures.

Preferred Skills
  • Experience with LLM/RAG frameworks.
  • Knowledge of distributed computing platforms (e.g., Spark, Dask).
  • Familiarity with data versioning and experiment tracking tools.
  • Experience with cloud platforms (AWS, GCP, Azure).
  • Understanding of data privacy and security.
  • Hands-on experience with lakehouse solutions.
  • Proficiency in query optimization in Snowflake or Databricks.
  • Experience with vector store technologies.

Benefits
  • US employee benefits package.

Additional Details
  • Seniority level: Mid-Senior level.
  • Employment type: Full-time.
  • Industry: IT Services and IT Consulting.