Base pay range
$100,000/yr - $160,000/yr
This range is provided by Koda Staff. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
On a mission to make AI smarter, faster, and everywhere. If you live and breathe machine learning pipelines, cloud platforms, and deployment automation, our client wants you on their team of tech rebels.
They're not looking for mere mortals: they want an MLOps guru ready to optimize, automate, and scale machine learning to the moon.
What you’ll be doing:
- Design, develop, and optimize ML pipelines like a wizard—ensuring training, validation, and inference run smoothly at scale.
- Automate deployment of deep learning and generative AI models for real-time applications.
- Implement model versioning and rollbacks, ensuring seamless updates (see the registry sketch after this list).
- Deploy and manage ML models on cloud platforms (AWS, GCP, Azure) using your containerized magic.
- Optimize real-time inference performance—bring TensorRT, ONNX, and PyTorch to their full glory.
- Work with GPU acceleration, distributed computing, and parallel processing to make AI workloads faster.
- Fine-tune models to slice latency and boost scalability.
- Build and maintain CI/CD pipelines for ML models (GitHub Actions, Jenkins, ArgoCD).
- Automate retraining and deployment to ensure the AI is always learning.
- Develop monitoring solutions to track model drift, data integrity, and performance.
- Stay on top of security, data privacy, and AI ethics standards.
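To make the versioning and rollback bullet concrete, here is a minimal sketch using MLflow's model registry (one of the pipeline tools named in the requirements below). The toy model, the registry name "demand-forecaster", and the sqlite store are illustrative assumptions, not the client's actual setup.

```python
# Minimal sketch: model versioning and rollback via the MLflow model registry.
# The toy model and the name "demand-forecaster" are hypothetical placeholders.
import numpy as np
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

# The registry needs a database-backed store; a local sqlite file works for demos.
mlflow.set_tracking_uri("sqlite:///mlflow.db")

# Train and log a trivial model so there is something to register.
X, y = np.arange(10.0).reshape(-1, 1), np.arange(10.0)
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(LinearRegression().fit(X, y), artifact_path="model")

# Each registration creates a new immutable version.
v = mlflow.register_model(f"runs:/{run.info.run_id}/model", "demand-forecaster")

# Serving code pins an explicit version, so a rollback is just loading the
# previous version instead of the newest one.
model = mlflow.pyfunc.load_model(f"models:/demand-forecaster/{v.version}")
print(model.predict(X[:3]))
```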
What we need from you:
- 5+ years of experience in MLOps, DevOps, or AI model deployment.
- Mastery of Python and ML frameworks like TensorFlow, PyTorch, and ONNX.
- You’ve deployed models using Docker, Kubernetes, and serverless architectures (see the serving sketch after this list).
- Hands-on experience with ML pipeline tools (Argo Workflows, Kubeflow, MLflow, Airflow).
- Expertise in cloud platforms (AWS, GCP, Azure) and hosting AI/ML models.
- GPU-based inference acceleration experience (CUDA, TensorRT, NVIDIA DeepStream).
- Solid background in CI/CD workflows, automated testing, and deploying ML models.
- Real-time inference optimization and scalable ML infrastructure experience.
- Excellent technical judgment.
- You’re all about automation.
- You’ve got a deep understanding of distributed systems and computing architectures.
- Self-driven with the ability to work independently.
- Experience with Kubernetes, Docker, or microservices.
- BS or MS in Computer Science or equivalent.
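Since the must-haves include containerized and serverless model deployment, here is a minimal sketch of the kind of Python serving endpoint that typically gets packaged into a Docker image and run on Kubernetes. FastAPI and the stub model are illustrative choices, not something the client specifies.

```python
# Minimal sketch of a model-serving endpoint, the kind of service that gets
# containerized and deployed behind Kubernetes. The "model" is a stub standing
# in for one loaded from a registry or artifact store.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    features: list[float]

def stub_model(features: list[float]) -> float:
    # Placeholder for real inference (e.g. an ONNX or TorchScript model).
    return sum(features) / max(len(features), 1)

@app.post("/predict")
def predict(req: PredictRequest) -> dict[str, float]:
    return {"prediction": stub_model(req.features)}

# Local run: uvicorn serve:app --port 8080
# In production, this file plus the model artifact become the image's payload.
```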
Nice to have:
- Some CUDA programming skills.
- Experience with LLMs and generative AI models in production.
- A bit of networking knowledge.
- Familiarity with distributed computing frameworks (Ray, Horovod, Spark); see the Ray sketch after this list.
- Edge AI deployment experience (Triton Inference Server, TFLite, CoreML).
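To ground the distributed-frameworks bullet, here is a minimal sketch of fanning batch scoring out across workers with Ray. The `score()` body is a stub, not the client's actual workload.

```python
# Minimal sketch: parallel batch scoring with Ray (one of the frameworks above).
# The score() body is a stub standing in for real model inference.
import ray

ray.init()  # starts (or connects to) a local Ray cluster

@ray.remote
def score(batch: list[float]) -> float:
    return sum(batch)  # placeholder for per-batch inference

batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
futures = [score.remote(b) for b in batches]  # scheduled in parallel on workers
print(ray.get(futures))  # -> [3.0, 7.0, 11.0]

ray.shutdown()
```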
Why them?
- Work with top-tier AI experts in a fast-growing startup.
- Flexibility—work from anywhere, anytime, as long as you get stuff done.
- Competitive salary plus benefits.
- Learning culture: opportunities to grow and expand your skills.
- A work hard, play hard culture.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Software Development