Responsibilities:
- Conduct exploratory data analysis (EDA) to uncover patterns, anomalies, and trends in aircraft operational and maintenance data.
- Design and maintain robust ETL pipelines that ensure data freshness, completeness, and integrity across multiple aviation data sources.
- Build end-to-end data products from development through deployment, including monitoring and ensuring system reliability in production.
- Develop and implement machine learning models (regression, classification, clustering) to predict maintenance needs and optimize operations.
- Leverage Large Language Models (LLMs) to enhance data products through automated feature extraction, data enrichment, intelligent information retrieval, and decision-making.
- Create scalable data solutions that can handle real-time aircraft sensor data and maintenance logs.
- Deploy models and data pipelines to cloud infrastructure (GCP), implementing proper monitoring, alerting, and retraining workflows.
- Build intelligent automation systems that can process unstructured maintenance reports and technical documentation.
- Ensure data product reliability through robust error handling, logging, and performance optimization.
- Create clear data visualizations and communicate insights to technical teams, engineers, and management stakeholders.
- Collaborate with aviation engineers and operations teams to understand requirements and translate them into scalable analytical solutions.
- Maintain production systems, including troubleshooting issues, optimizing performance, and ensuring uptime.
- Participate in code reviews and maintain comprehensive documentation for reproducibility and compliance.
Experience and Qualifications:
Required:
- BS/MS in Computer Science, Statistics, Mathematics, Engineering, or related field.
- 2-4 years of hands-on experience in data science and/or data engineering.
- Strong proficiency in Python and SQL, with experience building production-grade code. (We embrace AI-assisted coding tools to enhance productivity, but believe they work best in the hands of engineers who deeply understand code structure, debugging, and system design.)
- Solid foundation in statistics and machine learning concepts with practical implementation experience.
- Experience building and deploying data pipelines and ML models to production environments.
- Hands-on experience with cloud platforms (preferably GCP) including compute, storage, and ML services.
- Experience with software engineering practices: version control (Git), CI/CD, testing, and monitoring.
- Strong problem-solving skills with the ability to work independently on end-to-end solutions.
- Excellent communication skills to collaborate with technical and non-technical stakeholders.
Preferred:
- Experience with distributed computing and handling large-scale datasets.
- Familiarity with NoSQL or graph databases.
- Experience in aviation, manufacturing, or other industrial domains.
- Proficiency in Mandarin is an added advantage.
Areas of Specialization:
We work across these domains and welcome candidates with experience or strong interest in one or more of these areas:
Data Engineering for ML
- Building scalable data pipelines (Airflow).
- Working with streaming data and real-time processing (Dataflow, Pub/Sub).
- Model serving and API development (FastAPI).
- Data quality and monitoring frameworks.
Machine Learning
- Classical ML algorithms and frameworks (scikit-learn, TensorFlow, PyTorch).
- Model evaluation, feature engineering, and deployment.
- Cloud-based ML services, especially on GCP.
Modern AI & LLM Applications
- Building applications with Large Language Models.
- RAG systems and vector databases.
- Prompt engineering, automatic prompt optimization (e.g., DSPy), and LLM evaluation.
- Experience with agent frameworks such as Agents SDK.
Operations Research
- Optimization problems in scheduling and resource allocation.
- Experience with optimization tools (OR-Tools, Gurobi).
- Applied problem-solving in operational contexts.