Tiger Analytics is looking for a skilled and innovative Machine Learning Engineer with hands-on experience in Google Cloud Platform (GCP) and Vertex AI to design, build, and deploy scalable ML solutions. You will play a key role in operationalizing machine learning models and driving the end-to-end ML lifecycle, from data ingestion to model serving and monitoring.
Key Responsibilities:
- Develop, train, and optimize ML models using Vertex AI, including Vertex Pipelines, AutoML, and custom model training.
- Design and build scalable ML pipelines for feature engineering, training, evaluation, and deployment.
- Deploy models to production using Vertex AI endpoints and integrate with downstream applications or APIs.
- Collaborate with data scientists, data engineers, and MLOps teams to enable reproducible and reliable ML workflows.
- Monitor model performance and set up alerting, retraining triggers, and drift detection mechanisms.
- Utilize GCP services such as BigQuery, Dataflow, Cloud Functions, Pub/Sub, and GCS in ML workflows.
- Apply CI/CD principles to ML models using Vertex AI Pipelines, Cloud Build, and GitOps practices.
- Implement model governance, versioning, explainability, and security best practices within Vertex AI.
- Document architecture decisions, workflows, and model lifecycle clearly for internal stakeholders.
Additional expertise required includes:
- Advanced Generative AI, including RAG with graph-based hybrid retrieval and multimodal agents.
- Deep knowledge of agentic frameworks such as ADK and LangChain, as well as fine-tuning and distillation techniques.
Python expertise is essential, including:
- Strong OOP and functional programming skills.
- Proficiency with ML/DL libraries such as TensorFlow, PyTorch, scikit-learn, pandas, NumPy, and PySpark.
- Experience with production-grade code, testing, and performance optimization.
GCP Cloud Architecture & Services proficiency includes:
- Vertex AI, BigQuery, Cloud Storage, Cloud Run, Cloud Functions, Pub/Sub, Dataproc, Dataflow.
- Understanding of IAM and VPC.
API Development & Integration skills include:
- Designing and building RESTful APIs using FastAPI or Flask.
- Integrating ML models into APIs for real-time inference.
- Implementing authentication, logging, and performance optimization.
System Design & Scalability experience involves:
- Designing end-to-end AI systems with scalability and fault tolerance.
- Developing distributed systems, microservices, and asynchronous processing.
This position offers an excellent opportunity for significant career development in a fast-growing and challenging entrepreneurial environment with a high degree of individual responsibility.