Location: Jersey City / Remote
Experience: 6+ years
Employment Type: Contract
Position Overview
We are seeking an experienced AI/ML Data Scientist to architect, deploy, and maintain advanced machine learning systems at scale. The ideal candidate is highly proficient in distributed training, MLOps, and ML model serving infrastructure, and will play a key role in shaping and scaling enterprise AI platforms.
This is a hands-on engineering role focused on deep learning, containerized ML deployment, and real-time inference systems in a cloud-based environment.
Key Responsibilities
- Architect and implement distributed training strategies using frameworks such as Horovod and DeepSpeed (see the sketch following this list).
- Design and deploy containerized ML models via Docker, Kubernetes, and modern serving platforms such as TensorFlow Serving, TorchServe, and Seldon Core.
- Develop robust model monitoring, drift detection, and logging systems to ensure reliability in production.
- Lead CI/CD pipeline development and enforce MLOps best practices for rapid and safe model iteration.
- Optimize inference pipelines for low-latency model serving across real-time and batch workloads.
- Integrate models with distributed data storage platforms and vector databases to support scalable feature retrieval.
- Troubleshoot and debug complex distributed AI/ML systems for performance and scalability.
- Collaborate across data engineering, DevOps, and product teams to deliver business-ready AI solutions.
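For illustration only, the following is a minimal sketch of the kind of data-parallel training setup the first responsibility describes, assuming PyTorch with Horovod; the model, synthetic dataset, and hyperparameters are placeholders rather than anything specific to this role.

```python
# Minimal sketch: data-parallel training with Horovod and PyTorch.
# The model, synthetic dataset, and hyperparameters are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler
import horovod.torch as hvd

hvd.init()  # start Horovod and discover this worker's rank
# In a GPU setup, each worker would also pin a device here, e.g.
# torch.cuda.set_device(hvd.local_rank()); omitted for brevity.

# Placeholder model and synthetic data, for illustration only.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))

# Shard the data so each worker trains on a distinct partition.
sampler = DistributedSampler(dataset, num_replicas=hvd.size(), rank=hvd.rank())
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

# Scale the learning rate by world size and wrap the optimizer so
# gradients are averaged across workers via allreduce.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Ensure every worker starts from identical weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(2):
    sampler.set_epoch(epoch)  # reshuffle shards each epoch
    for features, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()
        optimizer.step()
    if hvd.rank() == 0:  # only one worker reports
        print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```

A script like this would typically be launched across workers with Horovod's runner, e.g. `horovodrun -np 4 python train.py`.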
Must-Have Qualifications
- 6–10 years of experience in AI/ML, with deep expertise in training and deploying ML models in production.
- Strong Python skills with experience in NumPy, SciPy, and machine learning libraries.
- Proficiency in deep learning frameworks: TensorFlow, PyTorch, and their associated ecosystems.
- Experience with data orchestration tools such as Airflow, Kubeflow, or similar.
- Hands-on knowledge of feature engineering platforms (e.g., Feast, Tecton).
- Solid background in distributed computing using Apache Spark, Dask, or similar platforms.
- Strong experience with containerization and orchestration via Docker and Kubernetes.
- Familiarity with model serving frameworks: TF Serving, TorchServe, or Seldon Core.
- Proven ability to implement model monitoring and concept drift detection pipelines (a minimal example follows this list).
- In-depth understanding of data formats and serialization: Parquet, Avro, Protocol Buffers.
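Purely as an illustrative sketch of the drift-detection qualification above, and not a specification of this team's pipeline, here is a simple per-feature concept drift check using a two-sample Kolmogorov–Smirnov test from SciPy; the feature names and p-value threshold are hypothetical.

```python
# Minimal sketch: per-feature drift check comparing a reference (training)
# sample against a window of recent production data. Threshold and feature
# names are illustrative placeholders.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray,
                 current: np.ndarray,
                 feature_names: list[str],
                 p_threshold: float = 0.01) -> dict[str, bool]:
    """Return {feature_name: drifted?} using a two-sample KS test per column."""
    results = {}
    for i, name in enumerate(feature_names):
        statistic, p_value = ks_2samp(reference[:, i], current[:, i])
        # A small p-value suggests the two samples come from different
        # distributions, which we treat as drift for that feature.
        results[name] = p_value < p_threshold
    return results

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(0.0, 1.0, size=(5000, 2))   # training-time sample
    current = np.column_stack([
        rng.normal(0.0, 1.0, 5000),                    # stable feature
        rng.normal(0.8, 1.0, 5000),                    # shifted feature
    ])
    print(detect_drift(reference, current, ["feature_a", "feature_b"]))
```

In production, a check like this would typically run on a schedule against a rolling window of serving traffic and feed alerts into the monitoring and logging systems described under Key Responsibilities.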
Preferred Qualifications
- Prior experience deploying AI/ML platforms in insurance, finance, or healthcare environments.
- Knowledge of vector search and real-time feature stores for high-performance retrieval (see the sketch after this list).
- Exposure to multi-cloud or hybrid cloud ML infrastructure.
- Contributions to open-source AI/ML tooling are a plus.
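As a rough sketch of what the vector-search item above can involve, the example below builds an exact nearest-neighbor index with FAISS; FAISS, the embedding dimension, and the random data are assumptions for illustration, not technologies or parameters named in this posting.

```python
# Minimal sketch: exact nearest-neighbor retrieval with FAISS.
# FAISS is used only as an example library; dimensions and data are
# arbitrary placeholders.
import numpy as np
import faiss

dim = 128                                   # embedding dimensionality (placeholder)
rng = np.random.default_rng(42)

# Index a corpus of 10k float32 embeddings with an exact L2 index.
corpus = rng.random((10_000, dim), dtype=np.float32)
index = faiss.IndexFlatL2(dim)
index.add(corpus)

# Retrieve the 5 closest corpus vectors for each of 3 query embeddings.
queries = rng.random((3, dim), dtype=np.float32)
distances, ids = index.search(queries, 5)
print(ids)        # row i holds the corpus ids nearest to query i
print(distances)  # squared L2 distances for those neighbors
```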
Application Process:
Submit your updated resume and a brief portfolio or summary highlighting your AI/ML deployments, infrastructure contributions, and specific experience with distributed model training and serving.