You will work closely with ML engineers, data scientists, and cloud architects to keep data reliable, scalable, and production-ready, enabling rapid experimentation and deployment of AI models across airport systems.
This is a hands-on, delivery-focused role suited for engineers who enjoy building robust data systems in a fast-moving environment.
Key Responsibilities
Data Architecture & Pipeline Development
- Design, develop, and maintain end-to-end data pipelines to support ML and AI workloads.
- Build real-time and batch data processing systems using Kafka, Kinesis, Spark, or similar technologies.
- Implement data quality, validation, and transformation workflows to ensure trustworthy and high-quality data for model training and analytics.
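To make the data-quality bullet above concrete, here is a minimal sketch of the kind of batch validation step this role covers, assuming PySpark. The bucket paths, column names, and delay threshold are all hypothetical, chosen only to illustrate the shape of the work.

```python
# Hypothetical batch quality gate for flight-event data (PySpark sketch).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flight-events-quality").getOrCreate()

# Hypothetical raw-zone path.
raw = spark.read.parquet("s3://example-bucket/raw/flight_events/")

# Drop rows missing key fields; flag delays outside a plausible range.
clean = (
    raw.dropna(subset=["flight_id", "event_ts"])
       .withColumn("delay_ok", F.col("delay_minutes").between(-60, 1440))
)

invalid_count = clean.filter(~F.col("delay_ok")).count()
if invalid_count > 0:
    print(f"Quarantining {invalid_count} out-of-range rows")

# Only validated rows reach the curated zone used for training/analytics.
clean.filter(F.col("delay_ok")) \
     .write.mode("overwrite").parquet("s3://example-bucket/curated/flight_events/")
```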
Infrastructure & Platform Engineering
- Develop and operate cloud-based data architectures on AWS and in hybrid on-prem environments.
- Build and manage data warehouses and data lakes for efficient storage, retrieval, and sharing.
- Use Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or CDK to automate data environment provisioning (see the sketch after this list).
- Containerize and orchestrate workloads using Docker and Kubernetes.
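As a taste of the IaC work referenced above, here is a minimal sketch using the AWS CDK's Python bindings to provision a data-lake bucket. The stack and construct names are hypothetical, and a real stack would also configure encryption, lifecycle rules, and access policies.

```python
# Hypothetical CDK stack provisioning a versioned raw-zone bucket.
from aws_cdk import App, Stack, RemovalPolicy
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DataLakeStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Versioned bucket for the data lake's raw zone; retained on stack delete.
        s3.Bucket(
            self, "RawZoneBucket",
            versioned=True,
            removal_policy=RemovalPolicy.RETAIN,
        )

app = App()
DataLakeStack(app, "ExampleDataLakeStack")
app.synth()
```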
MLOps & Integration
- Partner with ML engineers to streamline feature pipelines, model training, and inference data flows.
- Support retrieval-augmented generation (RAG) and GenAI agent systems through optimized data access and embeddings infrastructure (illustrated in the sketch after this list).
- Ensure seamless integration of data systems into CI/CD pipelines for continuous delivery and monitoring.
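The RAG bullet above boils down to serving embeddings efficiently. Below is a minimal sketch of that retrieval path assuming FAISS; the vector dimension and the random vectors stand in for output from a real embedding model.

```python
# Hypothetical similarity-search index backing a RAG pipeline (FAISS sketch).
import faiss
import numpy as np

dim = 384                       # hypothetical embedding dimension
index = faiss.IndexFlatL2(dim)  # exact L2 search; swap for IVF/HNSW at scale

# Stand-ins for embeddings produced from document chunks.
doc_vectors = np.random.rand(10_000, dim).astype("float32")
index.add(doc_vectors)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)  # top-5 nearest chunks
print(ids[0])  # row ids joined back to chunk text in the document store
```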
Security & Operations
- Implement and uphold data security, access control, and privacy compliance standards.
- Monitor and optimize pipelines for cost efficiency, latency, and reliability.
- Collaborate with DevOps and IT teams to troubleshoot, deploy, and scale data services in production environments.
Innovation & Collaboration
- Work closely with cross-functional teams to translate business and research data needs into scalable technical solutions.
- Evaluate emerging data engineering technologies and architectures relevant to AI workloads.
- Contribute to internal documentation and knowledge sharing to support continuous improvement.
Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
- 3–5 years of experience in data engineering roles.
- Proficiency in Python, SQL, and shell scripting.
- Proficiency with Infrastructure as Code tools (Terraform, CloudFormation, or CDK).
- Strong experience with cloud data platforms (AWS, GCP) and hybrid/on-prem environments.
- Solid understanding of big data ecosystems (e.g., Spark, Hadoop) and streaming technologies (Kafka, Kinesis).
- Familiarity with containerization (Docker) and orchestration (Kubernetes).
- Proven ability to design and deliver scalable, production-grade data systems.
Bonus Skills
- Exposure to MLOps, feature stores, or vector databases (e.g., FAISS, Pinecone, Weaviate).
- Experience supporting AI/GenAI model pipelines.
- Interest in data observability, metadata management, and responsible data practices.