DevOps Engineer - Remote / Telecommute
Cynet Systems Inc
Atlanta (GA)
Remote
USD 100,000 - 130,000
Full time
Job summary
A cutting-edge technology company in Atlanta is seeking an experienced cloud engineer to develop and maintain scalable inference platforms for large language models. Responsibilities include managing cloud engineering projects and implementing distributed inference optimization techniques. Candidates should have experience with modern distributed environments and proficiency in relevant programming languages. A Bachelor’s or Master’s degree in a related field is preferred.
Qualifications
- Deep experience building services in cloud and distributed environments.
- Experience with Large Language Models (LLMs).
- Strong communication skills for technical documentation.
- Hands-on experience with benchmarking tools.
- Familiarity with LLM performance metrics.
- Experience with inference engines like vLLM or NVIDIA Dynamo.
- Knowledge of distributed inference techniques.
Responsibilities
- Develop and maintain inference platforms for LLMs.
- Manage end-to-end cloud engineering projects.
- Improve tools and systems for performance monitoring.
- Design frameworks for benchmarking model performance.
- Implement optimization techniques for distributed inference.
Skills
Cloud infrastructure expertise
AI model inference understanding
Proficiency in Python
C++ programming ability
Problem-solving skills
Analytical skills
Debugging skills
Collaboration ability
Education
Bachelor’s or Master’s degree in Computer Science
Experience with AI infrastructure
Tools
Kubernetes
Docker
CI/CD
APIs
CUDA
ROCm
AITER
NCCL
Job Description
- Develop and maintain scalable inference platforms for serving LLMs optimized for NVIDIA and client GPUs.
- Manage end-to-end cloud engineering projects from ideation and prototyping to deployment and operations.
- Build and improve tooling and observability systems to monitor performance and system health.
- Design benchmarking frameworks to test and evaluate model serving performance across models, engines, and GPU configurations (a rough measurement sketch follows this list).
- Implement distributed inference optimization techniques, including tensor/data parallelism, KV cache optimizations, and intelligent routing.
- Build cross-platform inference support for diverse model architectures.
- Contribute to open-source inference engines to enhance performance and efficiency.
- Collaborate closely with cloud infrastructure, AI, and DevOps teams to ensure efficient deployment and scaling.
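
As a rough illustration of the benchmarking work described above, the sketch below measures time-to-first-token (TTFT) and time-per-output-token (TPOT) against an OpenAI-compatible streaming completions endpoint, such as the one vLLM exposes by default. The endpoint URL, model name, and prompt are placeholder assumptions, not details from this posting, and each streamed chunk is counted as one token as an approximation.

```python
import json
import time

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # placeholder; vLLM's default OpenAI-compatible port
MODEL = "my-model"                                  # placeholder model name
PROMPT = "Explain KV caching in one paragraph."


def measure_latency(prompt: str, max_tokens: int = 128):
    """Return (ttft_seconds, tpot_seconds, n_tokens) for one streamed request."""
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0

    resp = requests.post(
        ENDPOINT,
        json={"model": MODEL, "prompt": prompt,
              "max_tokens": max_tokens, "stream": True},
        stream=True,
        timeout=120,
    )
    resp.raise_for_status()

    # The server streams server-sent events: lines of the form "data: {...}".
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("choices"):
            n_tokens += 1
            if first_token_at is None:
                first_token_at = time.perf_counter()  # first token arrived

    end = time.perf_counter()
    ttft = (first_token_at or end) - start
    # TPOT: decode time spread over the remaining output tokens.
    tpot = (end - (first_token_at or end)) / max(n_tokens - 1, 1)
    return ttft, tpot, n_tokens


if __name__ == "__main__":
    ttft, tpot, n = measure_latency(PROMPT)
    print(f"TTFT: {ttft * 1000:.1f} ms  TPOT: {tpot * 1000:.1f} ms  tokens: {n}")
```

Prefill and decode throughput can be derived from the same measurements by dividing prompt tokens by TTFT and output tokens by the decode interval.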
- Requirements / Must Have:
- Deep experience building services in modern cloud and distributed environments (Kubernetes, Docker, CI/CD, APIs, data storage, monitoring, logging, and alerting).
- Experience hosting and running inference on Large Language Models (LLMs).
- Strong communication skills with the ability to write detailed technical documentation.
- Hands-on experience building or using benchmarking tools for evaluating LLM inference.
- Familiarity with LLM performance metrics (prefill throughput, decode throughput, TPOT, TTFT).
- Experience with inference engines such as vLLM, SGLang, or Modular Max (see the serving sketch after this list).
- Familiarity with distributed inference serving frameworks (llm-d, NVIDIA Dynamo, Ray Serve, etc.).
- Proficiency with client and NVIDIA GPU software such as CUDA, ROCm, AITER, NCCL, or Client.
- Knowledge of distributed inference optimization techniques and GPU tuning strategies.
- Expertise in cloud infrastructure, containerization, and microservices.
- Strong understanding of AI model inference and GPU acceleration.
- Proficiency in Python, C++, or related programming languages.
- Excellent problem-solving, analytical, and debugging skills.
- Ability to collaborate in a dynamic and fast-paced environment.
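
As a minimal sketch of the distributed inference techniques referenced above, the example below uses vLLM's offline Python API with tensor parallelism across two GPUs on one node. The model name and sampling settings are placeholders, and exact constructor arguments may vary between vLLM versions.

```python
from vllm import LLM, SamplingParams

# Placeholder model id; any model the cluster can host would work here.
MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"

# tensor_parallel_size=2 shards each layer's weights across two GPUs,
# one of the distributed inference optimizations named in this posting.
llm = LLM(model=MODEL, tensor_parallel_size=2)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize KV cache reuse in LLM serving."], params)

for out in outputs:
    print(out.outputs[0].text)
```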
- Qualifications and Education:
- Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, Electrical Engineering, or a related field.
- Experience with AI infrastructure or LLM deployment platforms is highly preferred.