Work somewhere with the creativity of a scaleup and the expertise of an enterprise.
We are looking for a savvy AI Software Engineer to join our growing AI Research Team in Milan.
In this role, you will be responsible for designing, building, and maintaining robust data pipelines to support complex model training operations. You will work with diverse data sources spanning text, image, and audio modalities. Additionally, you'll collaborate with cross-functional teams—including researchers, engineers, and legal advisors—to ensure all data is collected, transformed, and used in a compliant and efficient manner.
If successful, you will:
- Design, develop, and optimize large-scale data pipeline architectures for AI model training
- Manage data ingestion processes (e.g., web crawling, data extraction, search) to gather high-quality, diverse, and multimodal training datasets
- Oversee end-to-end data flow and transformation, ensuring consistent data delivery for multiple ongoing projects
- Collaborate with various research teams to incorporate the latest methods for enhancing pre-training datasets
- Implement Infrastructure-as-Code (IaC) practices to streamline data processing and system deployment
- Ensure all data collection and processing initiatives comply with legal and data privacy regulations
- Build and maintain distributed systems capable of handling terabytes of data efficiently and securely
- Continuously research and evaluate new techniques to improve dataset quality and pipeline performance
Requirements
What You Have
- At least 5 years of proven experience in an AI Software Engineer role or a similar position
- A degree in Computer Science, Informatics, Information Systems, or a related field
- Strong programming skills (e.g., Python, Java, or similar languages) and familiarity with distributed systems frameworks
- Advanced working knowledge and experience with a variety of databases, such as SQL and NoSQL
- Experience building and optimizing data pipelines and architectures with data processing technologies (e.g., Apache Spark, Hadoop, or similar) and cloud-based platforms
- Experience with CI/CD tools and Infrastructure-as-Code solutions (e.g., Jenkins, Travis CI, Argo CD, Terraform, CloudFormation)
- Experience with MLOps is a plus
- Experience with HPC is a plus
Who You Are
- An enthusiastic individual, excited by the prospect of optimizing, and even helping to design, our company's data architecture
- An analytical thinker with a passion for data
- Proactive, comfortable working alone or as part of a team
- Super organized
- Accurate, with strong attention to detail
- Familiar with Agile development
- Fluent in English
Benefits
Perks
- Learning Friday. If our team members know more, so do we. That's why we give everyone a training budget that they can spend on books, online courses or other training materials.
- Smart Working. Trains can be a drag, so save yourself some commuting time by working from home.
- Salary is based on experience and topped up with other bonuses.
We offer a competitive salary, as well as an opportunity to receive company equity. The typical salary for this role ranges between €50,000 and €80,000. As you gain experience and make more significant contributions to the business, your compensation will be reviewed to match your impact.
Additionally, depending on your seniority and performance, you'll have the opportunity to receive stock options, with a variable value calculated from your base salary, giving you the chance to participate directly in the company's success.
About Domyn
Domyn is a company specializing in the research and development of Responsible AI for regulated industries, including financial services, government, and heavy industry. It supports enterprises with proprietary, fully governable solutions based on a composable AI architecture — including LLMs, AI agents, and one of the world's largest supercomputers.
At the core of Domyn's product offer is a chip-to-frontend architecture that allows organizations to control the entire AI stack — from hardware to application — ensuring isolation, security, and governance throughout the AI lifecycle.
Its foundational LLMs, Domyn Large and Domyn Small, are designed for advanced reasoning and optimized to understand each business's specific language, logic, and context. Provided under an open-enterprise license, these models can be fully transferred and owned by clients.
Once deployed, they enable customizable agents that operate on proprietary data to solve complex, domain-specific problems. All solutions are managed via a unified platform with native tools for access management, traceability, and security.
Powering it all, Colosseum — a supercomputer in development using NVIDIA Grace Blackwell Superchips — will train next-gen models exceeding 1T parameters.
Domyn partners with Microsoft, NVIDIA, and G42. Clients include Allianz, Intesa Sanpaolo, and Fincantieri.
Please review our Privacy Policy here: https://bit.ly/2XAy1gj