LLM Data Engineer | United States | Fully Remote
Company: Halo Media
Job Overview
We are seeking an experienced AI/LLM Data Engineer to build and maintain the data pipeline for our Generative AI platform. The ideal candidate will be well-versed in Large Language Model (LLM) technologies and have a strong background in data engineering, with a focus on Retrieval-Augmented Generation (RAG) and knowledge-base techniques. This role is part of the AI Center of Excellence (COE) within DX Tech & Digital and reports to the Director of AI Solutions & Development.
Responsibilities
- Design, implement, and maintain end-to-end multi-stage data pipelines for LLMs, including supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) data processes.
- Identify, evaluate, and integrate diverse data sources to support the Generative AI platform.
- Develop and optimize workflows for chunking, indexing, ingestion, and vectorization of data.
- Benchmark and implement vector stores, embedding techniques, and retrieval methods.
- Create flexible pipelines supporting multiple embedding algorithms and search types.
- Implement auto-tagging systems and data preparation processes for LLMs.
- Develop tools for crawling, cleaning, and refining text and image data.
- Collaborate with teams to ensure data quality and relevance.
- Work with data lake architectures to optimize storage and processing.
- Integrate and optimize workflows using Snowflake and vector store technologies.
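To illustrate the kind of work these responsibilities describe, the chunking, embedding, and retrieval steps can be sketched in miniature. This is a toy, self-contained example using only the Python standard library; the function names are hypothetical, and a production pipeline would replace the hash-based `embed` with a real embedding model and the in-memory list with a vector store.

```python
# Illustrative sketch of a chunk -> embed -> retrieve flow.
# All names here are hypothetical; a real pipeline would use an
# embedding model and a vector database rather than these stand-ins.
import hashlib
import math

def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(chunk: str, dims: int = 16) -> list[float]:
    """Toy deterministic embedding: hash character trigrams into a
    fixed-size vector, then L2-normalize. Stands in for a model call."""
    vec = [0.0] * dims
    for i in range(len(chunk) - 2):
        h = int(hashlib.md5(chunk[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

# "Index" every chunk, then retrieve the closest chunk for a query.
corpus = "Retrieval-Augmented Generation grounds LLM answers in indexed documents."
index = [(chunk, embed(chunk)) for chunk in chunk_text(corpus)]
query_vec = embed("Retrieval-Augmented Generation")
best_chunk, _ = max(index, key=lambda pair: cosine(query_vec, pair[1]))
```

The same shape scales up directly: `chunk_text` becomes a document-aware splitter, `embed` a batched model call, and `index` a vector store with an approximate-nearest-neighbor search in place of the `max` scan.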
Minimum Requirements
- Master's degree in Computer Science, Data Science, or a related field.
- 3-5 years of experience in data engineering, preferably in AI/ML.
- Proficiency in Python and experience working with JSON data, HTTP-based APIs, and related tooling.
- Strong understanding of LLM architectures and data needs.
- Experience with RAG systems, knowledge bases, and vector databases.
- Familiarity with embedding techniques and information retrieval.
- Experience with data cleaning, tagging, and annotation.
- Knowledge of data crawling techniques and their ethical and legal considerations.
- Strong problem-solving skills and ability to work in fast-paced environments.
- Experience with Snowflake integration in AI/ML pipelines.
- Experience with vector store technologies and data lakehouse architectures.
Preferred Skills
- Experience with LLM/RAG frameworks.
- Knowledge of distributed computing platforms (e.g., Spark, Dask).
- Familiarity with data versioning and experiment tracking tools.
- Experience with cloud platforms (AWS, GCP, Azure).
- Understanding of data privacy and security.
- Hands-on experience with lakehouse solutions.
- Proficiency in query optimization in Snowflake or Databricks.
Benefits
- US employee benefits package.
Additional Details
- Seniority level: Mid-Senior level.
- Employment type: Full-time.
- Industry: IT Services and IT Consulting.