Overview
About the Role
We are seeking a highly experienced and visionary Sr. Staff or Principal Engineer to join our Customers organization. This role is crucial to the evolution and scalability of our core Catalog and data-intensive systems, and plays a pivotal part in advancing our Machine Learning serving capabilities and serving infrastructure. This position will not only impact core business functions and drive significant revenue but also shape the future of our personalized, real-time ML-driven experiences. The ideal candidate will possess deep expertise in distributed systems, stream processing, data-intensive applications, and, in particular, the deployment, scaling, and optimization of Machine Learning models in production.
This is a unique opportunity to join a dynamic and innovative team and to make a significant impact on the future of our platform by advancing both our core data infrastructure and our Machine Learning capabilities. If you are a highly motivated and experienced engineer with a passion for solving complex technical challenges across distributed systems, data engineering, and ML serving, we encourage you to apply.
About the Team
Join a dynamic team at the heart of Instacart's success, leading the core shopping experience that millions of users rely on. We are obsessive about perfecting every aspect of the customer shopping journey on the app, encompassing UX formats, feeds, algorithms, personalization, recommender engines, and ranking systems to deliver an exceptional experience. Our team thrives on collaborative problem-solving in a fast-paced environment.
About the Job
- Provide architectural leadership for Catalog, streaming, and data-intensive systems, emphasizing ML serving infrastructure and best practices, and drive the technical roadmap.
- Design, build, and scale reliable, efficient, and adaptable solutions to address changing business and ML needs.
- Lead the development and optimization of ML serving endpoints, ensuring high availability, low latency, and robust performance; implement fail-fast input validation and track metrics using Datadog.
- Centralize ML serving logic and decouple it from product applications to improve debugging, manageability, and system performance.
- Drive and contribute to company-wide transformational initiatives that impact key business metrics such as revenue, personalization, and operational efficiency, and influence the direction of ML infrastructure, including real-time inference.
- Serve as a subject matter expert for Catalog, streaming, data-intensive, and ML serving technologies, providing guidance and mentorship to engineering and data science teams.
- Identify and implement innovative solutions to optimize system performance, reduce costs, and improve data processing and ML serving latency.
- Collaborate with cross-functional teams, including Product, Retailer, IC App, Ads, ML Infrastructure, and Data Science, to deliver integrated ML-driven solutions, and lead incident response and resolution for high-severity issues.
About You
Minimum Qualifications
- Extensive experience in software engineering, with a focus on distributed systems, stream processing (e.g., Flink), data-intensive applications, and, in particular, Machine Learning serving and deployment.
- Proven track record of designing, implementing, and scaling large-scale, high-performance systems, including ML serving infrastructure.
- Deep understanding of database technologies, data modeling, data pipelines, and ML model deployment patterns.
- Strong architectural skills and the ability to design and evaluate complex technical solutions across diverse technology domains, including Catalog, streaming, and Machine Learning.
- Excellent problem-solving and debugging skills, with specific experience in addressing issues related to ML model serving, data quality, and infrastructure stability.
- Strong communication and collaboration skills, with the ability to effectively work across teams, influence stakeholders, and mentor junior engineers.
- Experience with cloud platforms and related technologies, including ML serving platforms (e.g., SageMaker).
- Ability to quantify and demonstrate the impact of technical contributions on business results (e.g., revenue, efficiency, cost savings, and ML model performance).
- Familiarity with challenges related to the ML lifecycle, data flow, and best practices.
Preferred Qualifications
- Experience working with large-scale catalog systems or similar data-intensive platforms.
- Significant experience in designing and implementing high-throughput, low-latency ML serving systems.
- Contributions to open-source projects or technical publications related to distributed systems, data engineering, or Machine Learning serving.
- Experience in a high-growth, fast-paced environment, particularly in the context of scaling ML initiatives.