Senior Engineer – AI Model Compression Research

Axelera AI

Milan

On-site

EUR 50,000 - 80,000

Full-time

Posted yesterday

Job description

Axelera AI is seeking a Senior Engineer for AI Model Compression Research to develop cutting-edge techniques enhancing the efficiency of Generative AI models. This is an opportunity to influence real-time inference for high-performance AI applications across various environments, emphasizing collaboration and innovation.

Benefits

Impact on groundbreaking technology

Culture of innovation and continuous learning

Significant growth opportunities

Competitive salary and equity options

Skills

Proven experience in model compression techniques like pruning and quantization.
Expertise in deep learning frameworks such as TensorFlow, PyTorch, or JAX.
Ability to work in a collaborative, fast-paced startup environment.

Responsibilities

Design and implement advanced model compression techniques.
Optimize compressed models for high-throughput and low-latency inference.
Collaborate with AI researchers and engineers to integrate optimizations.

Knowledge

Model Compression Techniques

Deep Learning Algorithms

Collaboration & Communication

Education

PhD or advanced degree in Computer Science, Machine Learning, AI, or related fields

Tools

TensorFlow

PyTorch

JAX

TensorRT

ONNX

Company Overview

Axelera is a European, high-growth Series B startup revolutionizing the AI landscape with our in-memory computing platform. We specialize in creating AI hardware and software optimized for high-performance inference, catering to cutting-edge use cases across high-end edge computing, embodied AI, and server-side AI deployments. We are looking for passionate, innovative research engineers to join our team and help drive the future of AI.

Role Overview

We are looking for an AI Research Engineer with a strong focus on model compression to join our team. In this role, you will develop compression techniques that make Generative AI models more efficient for real-time inference across a variety of environments, from high-end edge systems to large-scale server-side deployments. You will be key to ensuring that our models are optimized for memory usage, computational efficiency, and performance while maintaining or improving model accuracy.

This is an exciting opportunity to work at the intersection of advanced machine learning, in-memory computing, and high-performance AI inference on cutting-edge hardware architectures.

Responsibilities:

Model Compression: Design and implement advanced model compression techniques such as pruning, quantization, weight sharing, and knowledge distillation to make models more memory-efficient and computationally optimized.
Performance Tuning: Optimize compressed models to achieve high-throughput and low-latency inference, specifically tailored to our in-memory computing platform.
Collaboration: Work closely with AI researchers, software engineers, and hardware engineers to integrate your model optimizations into our AI platform, ensuring that models work effectively across edge and server-side deployments.
Innovation: Stay on top of the latest developments in the AI and model compression research space, pushing the envelope on novel techniques for reducing model size without sacrificing performance.
Deployment & Testing: Implement best practices for model testing, deployment, and continuous improvement to ensure models scale effectively in production environments.

Requirements:

Experience: Proven experience (for all levels) working on model compression, including techniques like pruning, quantization, low-rank factorization, and knowledge distillation.

Technical Skills:

Expertise in deep learning frameworks such as TensorFlow, PyTorch, or JAX.

Experience optimizing models for resource-constrained environments, such as edge devices or embedded systems.

Familiarity with distributed systems, in-memory computing, or high-performance computing environments.

A strong understanding of deep learning algorithms, neural networks, and the trade-offs involved in model compression.

Knowledge: A strong understanding of the latest advancements in AI/ML research, particularly in compression and distillation of generative models (e.g. transformers and diffusion models).

Collaboration & Communication: Ability to work in a highly collaborative, fast-paced startup environment and communicate complex technical concepts clearly.

Preferred Qualifications:

PhD or advanced degree in Computer Science, Machine Learning, AI, or related fields.

5+ years of post-graduation relevant work experience.

Research experience in model compression, efficient inference, or deploying AI models to resource-constrained devices.

Familiarity with model deployment frameworks like TensorRT, ONNX, or similar.

A passion for solving real-world challenges with AI in dynamic, high-performance environments.

Location

This position is based in Italy. For candidates based abroad, we support relocation to Bologna, Florence, or Milan.

Why Join Us?

Impact: Work on groundbreaking technology that will power the next wave of AI applications, from edge computing to embodied AI systems.

Culture: Join a diverse, driven team that values innovation, collaboration, and continuous learning.

Growth: As a Series B startup, you’ll have significant growth opportunities, including the chance to shape the direction of the product and AI strategy.

Compensation: Competitive salary, equity options, and benefits package.

How to Apply?

Please submit your resume and a brief cover letter explaining why you're excited about this opportunity and how your experience aligns with our model compression goals.

At Axelera AI, we wholeheartedly embrace equal opportunity and hold diversity in the highest regard. Our steadfast commitment is to cultivate a warm and inclusive environment that empowers and celebrates every member of our team. We welcome applicants from all backgrounds to join us in shaping the future of AI.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Engineering and Information Technology

Industries

Semiconductor Manufacturing
