A cutting-edge technology startup in London is seeking a Software Engineer with a focus on data to build core infrastructure for managing large-scale ML training datasets. The ideal candidate will have 3+ years of professional experience and proficiency in large-scale data processing and cloud platforms. Join a bold, innovative team dedicated to redefining AI in 3D technology.
SpAItial is pioneering the development of a frontier 3D foundation model, pushing the boundaries of AI, computer vision, and spatial computing. Our mission is to redefine how industries, from robotics and AR/VR to gaming and movies, generate and interact with 3D content.
We’re looking for individuals who are bold, innovative, and driven by a passion for pushing the boundaries of what’s possible. You should thrive in an environment where creativity meets challenge and be fearless in tackling complex problems. Our team is built on a foundation of dedication and a shared commitment to excellence, so we value people who take immense pride in their work and place the collective goals of the team above personal ambition. As a part of our startup, you’ll be at the forefront of the AI revolution in 3D technology, and we want you to be excited about shaping the future of this dynamic field. If you’re ready to make an impact, embrace the unknown, and collaborate with a talented group of visionaries, we want to hear from you.
Responsibilities
Be the first data-focused Software Engineer in a dynamic deep-tech ML startup, enabling a high standard of execution across the company.
Architect and build our core data infrastructure for managing large-scale ML training datasets (e.g., Apache Iceberg, Parquet).
Develop cloud-based data processing pipelines that ingest and compute auxiliary metadata signals on image, video, and 3D data (e.g., PySpark, Airflow).
Develop a data serving strategy for training ML models, including data loaders, caching, etc.
Build tooling to support the ML lifecycle, including evaluation, training-data inspection, model versioning, experiment tracking, etc.
Ensure code quality and maintainability by conducting code reviews and promoting best coding practices.
Collaborate with team members to uphold best practices and improve the long-term health of the codebase.
Key Qualifications
3+ years of full-time professional experience committing code to a production environment.
Proficiency in large-scale data processing (e.g., Spark, Cloud SQL) and large-scale data formats (e.g., Iceberg, Parquet).
Proficiency in cloud platforms (e.g. AWS, GCP, Azure).
Proficiency in the Python ecosystem and its best practices.
Experience in CI/CD (e.g. CircleCI).
Preferred Qualifications
Familiarity with and enthusiasm for AI-based coding tools (e.g., Cursor, Windsurf).
Familiarity with ML concepts and frameworks (PyTorch).
Experience in large-scale processing of multimodal computer vision data (images, videos, captions, etc.) for ML purposes.
Experience in Structure-from-Motion for large-scale 3D reconstruction of image data.
At SpAItial, we are committed to creating a diverse and inclusive workplace. We welcome applications from people of all backgrounds, experiences, and perspectives. We are an equal opportunity employer and ensure all candidates are treated fairly throughout the recruitment process.