GPU Engineer - Contract - Remote

Skillsearch Enterprise Technology

Rome

Remote

EUR 50,000 - 70,000

Full-time

Posted today

Job description

A technology consulting firm in Italy is seeking a GPU Engineer to optimize GPU workload performance in enterprise settings. Responsibilities include implementing CI/CD pipelines for GPU applications, benchmarking performance, and automating resource management. Ideal candidates have at least 2 years of hands-on experience in GPU engineering or workload optimization, along with strong skills in tools such as CUDA, Kubernetes, and Infrastructure as Code. The role offers a collaborative environment with opportunities for professional growth.

Skills

  • Minimum 2 years of hands-on experience in GPU engineering or optimization.
  • NVIDIA certification preferred.
  • Proven experience in analyzing deployment approaches for GPU frameworks.

Duties

  • Conduct performance benchmarking and tuning of GPU workloads.
  • Optimize applications leveraging GPU parallelization.
  • Document all deployment procedures for transparency.
  • Integrate GPU provisioning into CI/CD pipelines.

Knowledge

GPU services provisioning
CUDA
OpenCL
TensorRT
Performance benchmarking
Infrastructure as Code
CI/CD pipelines
Kubernetes
Scripting in Python
Troubleshooting

Tools

Nsight
Prometheus
Grafana
NVIDIA Data Center GPU Manager

About the Role

You will conduct comprehensive performance benchmarking, profiling, and tuning of GPU workloads to provide evidence-based recommendations on suitable GPU sharing techniques.

Responsibilities

  • Optimize the performance of existing and new applications by leveraging GPU parallelization, identifying bottlenecks, and deploying code-level and framework-level improvements.
  • Perform a thorough analysis of the deployment methods for GPU-accelerated serving frameworks on the market (e.g., NVIDIA Triton Inference Server, TensorRT, ONNX Runtime), with reference implementations and best-practice recommendations for large-scale serving solutions.
  • Develop repeatable, automated configuration templates for GPU resources.
  • Implement active GPU monitoring, including review and analysis of all relevant metrics (utilization, memory bandwidth, power, temperature, etc.), and establish dashboards and alerts for proactive performance and health management.
  • Integrate GPU resource provisioning and configuration into CI/CD pipelines using Infrastructure as Code (IaC) tools (e.g., Terraform, Helm charts), and document workflows for seamless deployment and rollback.
  • Document all configurations, testing results, benchmarking analyses, and deployment procedures to ensure transparency and reproducibility.
  • Establish active GPU monitoring protocols, including the identification and evaluation of available metrics, to select the most relevant indicators for ongoing performance management.
  • Support self-service deployment of Large Language Models (LLMs) on GPU resources, enabling application owners with varying technical expertise to access and use GPU capabilities seamlessly.

Qualifications

  • Minimum 2 years of hands-on experience in GPU engineering or cloud-based GPU workload optimization, ideally within enterprise or large-scale environments.
  • NVIDIA certification preferred.

Required Skills

  • Direct experience with GPU services, including resource provisioning, scaling, and optimization.
  • Demonstrable expertise in GPU-accelerated software development (CUDA, OpenCL, TensorRT, PyTorch, TensorFlow, ONNX, etc.).
  • Strong background in performance benchmarking, profiling (Nsight, nvprof, or similar tools), and workload tuning.
  • Experience with Infrastructure as Code (Terraform, Helm charts, or equivalent) for automated cloud resource management.
  • Proven experience designing and implementing CI/CD pipelines for GPU-enabled applications, using GitHub Actions (preferred) or similar tools.
  • Working knowledge of Kubernetes and GPU scheduling, including setup of GPU-enabled clusters and deployment of GPU workloads in Kubernetes.
  • Familiarity with GPU monitoring and observability, using tools such as Prometheus, Grafana, NVIDIA Data Center GPU Manager (DCGM), or custom scripts.
  • Proven ability to analyze deployment approaches for GPU-accelerated serving frameworks and deliver reference implementations.
  • Experience implementing software quality engineering practices (unit testing, code review, test automation, reproducibility).
  • Strong scripting skills in Python, Bash, or PowerShell for automation and monitoring.
  • Excellent analytical, problem-solving, and troubleshooting abilities.
  • Quick learner, adaptable to evolving requirements and emerging GPU and cloud technologies.
  • Positive and collaborative attitude in Agile environments.