AI Integration Engineer (Talent Pool) at Halo Media
We are seeking an exceptional AI Integration Engineer who operates at the intersection of development, operations, data, and systems engineering to build solutions for large-scale continuous data transformation and delivery.
This role focuses specifically on building and maintaining data pipelines for both structured and unstructured data, enabling the development and deployment of AI/ML models that power our RAG-based document processing and insight generation systems.
Key Responsibilities
- Design and implement data integrations and ingestion processes for internal and external data sources.
- Build and maintain scalable data pipelines for ingesting, processing, and transforming unstructured data sources (customer feedback, documents, multimedia content).
- Develop data models and mapping rules to transform raw data into actionable insights and structured outputs.
- Architect and implement semantic layers that integrate analytics data from multiple sources efficiently.
- Develop and maintain robust backend APIs and services supporting the entire prompt-to-answer workflow.
- Implement and optimize retrieval logic including vector search, hybrid search, and advanced information retrieval techniques.
- Manage document ingestion pipelines including parsing, OCR, chunking, and embedding generation.
- Support integration of various LLM providers (OpenAI, Azure AI, Anthropic) with internal business data sources.
- Ensure reliability, scalability, and low latency of AI response generation systems.
- Implement data governance policies and procedures for responsible and ethical use of data in AI applications.
- Develop data quality monitoring and validation processes specifically for AI/ML datasets, including bias identification and mitigation.
- Build and maintain monitoring, alerting, and observability systems for AI infrastructure.
- Collaborate with analytics and data science teams to understand requirements and deliver solutions.
- Work with data scientists to ensure data is available in appropriate format and quality for model training and deployment.
- Maintain comprehensive documentation including data models, mapping rules, and data dictionaries.
- Partner with internal business stakeholders, technology resources, and external vendors.
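For illustration only (this sketch is not part of the role description), the prompt-to-answer workflow the responsibilities above describe — chunking documents, generating embeddings, and retrieving by vector similarity — can be reduced to a minimal toy in Python. The `embed` function here is a hashed bag-of-words stand-in for a real embedding model (e.g. from OpenAI or Cohere), and the fixed-size `chunk` splitter stands in for token-aware chunkers used in production:

```python
import math
import zlib
from collections import Counter

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy stand-in for a real embedding model: hashed bag-of-words,
    # L2-normalized so dot products behave like cosine similarity.
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(token.encode()) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(document: str, size: int = 20) -> list[str]:
    # Split a document into fixed-size word windows. Real pipelines
    # use token-aware, overlap-aware chunkers instead.
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity of their embedding to the query embedding;
    # the top-k results would be passed to an LLM as grounding context.
    q = embed(query)
    scored = sorted(
        chunks,
        key=lambda c: sum(a * b for a, b in zip(q, embed(c))),
        reverse=True,
    )
    return scored[:k]
```

In a production system each piece is replaced by the tools named in this posting: a vector database (Pinecone, Milvus, Weaviate, Chroma) for storage and search, a hosted embedding model for `embed`, and an orchestration framework such as LangChain or LlamaIndex for the end-to-end flow.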
Qualifications
- Bachelor's degree in Computer Science, Engineering, or equivalent work experience.
- 5+ years of experience in designing, building, and maintaining scalable data solutions for large-scale analytics.
- Proven ability to lead development projects from start to finish with demonstrated results.
- Proficiency in Python, Java, or R and open-source frameworks for distributed processing (Hadoop, Spark).
- Expert-level SQL and development experience with cloud database environments (Snowflake, Redshift, Databricks).
- Hands-on experience with modern cloud data stack tools for code management, versioning (Git), CI/CD, and automation.
- Experience with orchestration tools (Apache Airflow) and monitoring & alerting systems.
- Strong understanding of data modeling, data warehousing, and ETL concepts.
- Experience with vector databases (Pinecone, Milvus, Weaviate, Chroma).
- Proficiency in handling unstructured data formats (JSON, Parquet, text, images, audio, video).
- Familiarity with AI/ML model development lifecycle and data requirements for training and deployment.
- Experience with cloud-based AI/ML platforms and services.
- Knowledge of data augmentation techniques for improving AI/ML model performance.
- Experience with data labeling platforms (Amazon SageMaker Ground Truth, Labelbox).
- Understanding of responsible AI principles and data privacy regulations (GDPR, CCPA).
- Experience with data governance and observability tools (Datahub, Collibra).
- Basic frontend development experience (HTML, CSS, JavaScript).
Tools & Technologies
- Programming & Frameworks: Python, Java, R; Apache Spark, Apache Hadoop; FastAPI, Django, Flask
- Data & AI Platforms: Snowflake, Redshift, Databricks; Pinecone, Milvus, Weaviate, Chroma; LangChain, LlamaIndex; OpenAI, Azure AI, Anthropic, Cohere
- Cloud & Infrastructure: AWS, Azure, Google Cloud Platform; Docker, Kubernetes; Apache Airflow, Apache Kafka
- Development Tools: Git, GitHub, GitLab; Jenkins, GitHub Actions; Jupyter Notebooks, Dataiku