About PredictX
Make a real difference at one of London’s foremost SaaS scale-ups and be ready to pioneer the future of AI, data analytics, and technology. Step into PredictX, where we don't treat AI as a fashionable bandwagon to jump on: we have lived and breathed AI & ML in every aspect of our product for the past decade.
As an Enterprise SaaS provider, we're revolutionising critical decision-making for many of the world’s largest businesses, including three FAANG companies, empowering them through our integrative AI technology and Predictive Analytics.
We pride ourselves on our commitment to staying at the forefront of technological advancements. You'll be joining a team that actively explores and integrates the latest innovations to maintain our competitive edge.
The Role
As a Senior/Expert ML/Data Engineer, you will be at the forefront of our data science and machine learning initiatives, actively contributing to the evolution of our AI-powered solutions. You will be instrumental in designing, building, and maintaining our cutting-edge data infrastructure and machine learning pipelines, with a growing focus on leveraging the power of Large Language Models (LLMs) and other emerging AI technologies.
This role demands a strong blend of data engineering prowess, machine learning understanding (including LLMs), and the ability to translate complex business needs into robust technical solutions. You will be expected to lead projects, mentor junior team members, and drive innovation within our rapidly evolving data and AI landscape.
Key Responsibilities
- Design, develop, and maintain scalable and efficient data pipelines using technologies such as Spark, Python, and relevant ETL tools to support our machine learning models, including those leveraging LLMs, and analytical needs.
- Architect and implement robust data warehousing solutions and data models that ensure data quality, integrity, and performance, catering to the specific data requirements of advanced AI models.
- Lead the development, testing, and deployment of machine learning models, including exploration and integration of Large Language Models (LLMs) and other novel AI architectures, collaborating closely with Data Scientists to productionize innovative solutions.
- Engineer approaches for storing, transforming, transporting, synchronising, archiving, and securing large and complex datasets, including unstructured and semi-structured data crucial for training and deploying advanced AI models.
- Participate in the evaluation and testing of new machine learning models and frameworks, including LLMs, to assess their potential and applicability to our products.
- Identify and resolve performance bottlenecks, data quality issues, and other pain points within our data and ML infrastructure. Proactively recommend and implement solutions for optimization and improvement, especially in the context of deploying large-scale AI models.
- Define and govern data modelling and design standards, best practices, and development methodologies within the team, considering the unique challenges and opportunities presented by LLMs and other advanced AI.
- Create and maintain comprehensive technical documentation for data pipelines, data models, and machine learning workflows, including details specific to LLM integration and testing.
- Collaborate effectively with Business Analysts, Data Scientists, and other engineering teams to understand data requirements and deliver impactful data and AI solutions.
- Stay abreast of the latest advancements in data engineering, machine learning (including LLMs and generative AI), and big data technologies, and actively participate in the evaluation and integration of promising new technologies.
- Mentor and guide junior Data Engineers and Data Scientists within the team, sharing knowledge about new AI developments and best practices.
Experience/Skills
- Extensive (5+ years) proven experience in building and maintaining complex data pipelines and data warehousing solutions in a production environment.
- Expert proficiency in data engineering tools and technologies, including Spark, Python, SQL, and various ETL/ELT tools.
- Deep understanding of data modelling techniques and data warehousing concepts.
- Strong knowledge of data governance, data quality principles, and data security best practices, with an awareness of the specific security and ethical considerations related to AI models.
- Significant experience with data integration, data cleansing, and data transformation processes on large datasets, including data preparation for machine learning and LLMs.
- Familiarity with data profiling and data lineage tools.
- Excellent ability to identify, diagnose, and resolve data issues, performance bottlenecks, and data quality problems, including those encountered when working with large AI models.
- Strong analytical and problem-solving skills to interpret complex data sets and translate findings into actionable technical solutions, with an aptitude for understanding the nuances of AI model performance.
- Excellent written and verbal communication skills to effectively convey technical concepts to both technical and non-technical audiences, including discussions around AI model capabilities and limitations.
- Strong teamwork and collaboration skills with the ability to build effective working relationships across teams, especially when integrating new AI technologies.
- A proactive and solution-oriented approach with a strong drive to learn and implement new technologies, particularly within the rapidly evolving fields of AI and LLMs.
- Meticulous attention to detail to ensure data accuracy and integrity, which is critical for the reliability of AI models.
- Proven ability to write clear and concise technical documentation, including documentation for AI model development and deployment.
Desired Skills
- Solid understanding of machine learning fundamentals, algorithms, and libraries, with specific interest or experience in Natural Language Processing (NLP) and Large Language Models (LLMs).
- Experience in deploying and monitoring machine learning models, including LLMs, in a production environment.
- Experience with building and maintaining data pipelines using orchestration tools like Apache Airflow.
- Familiarity with big data technologies beyond Spark, such as Hadoop, Kafka, NoSQL databases, and data streaming platforms, and their application in AI workflows.
- Experience with cloud platforms like AWS, Azure, or GCP, and their data engineering and machine learning services, including those specific to LLMs.
- Understanding of CI/CD pipelines for data and machine learning deployments, including considerations for deploying and updating AI models.
- Exposure to statistical analysis and data visualization tools, and their use in understanding AI model performance and data insights.
- Experience with prompt engineering and fine-tuning of Large Language Models.
What we offer
- Innovative Projects: Work on cutting-edge projects that push the boundaries of AI, machine learning, and data engineering, including the exciting application of Large Language Models.
- Dynamic Technology Environment: Be part of a team that actively explores, tests, and integrates the latest technological advancements, including the newest developments in AI and LLMs.
- Innovation Hub: Collaborate with a team of experts and leverage the latest technologies to drive innovation in the realm of AI and data.
- Collaborative Culture: A supportive and team-focused environment where your expertise and contributions, especially in the area of new AI technologies, are highly valued.
- Growth Opportunities: Opportunities for professional development and growth within a rapidly expanding company at the forefront of AI innovation.
- Impactful Work: Play a key role in shaping the future of our product and influencing critical business decisions for world-leading companies through the application of advanced AI technologies.