Staff AI Research Engineer, Video Understanding and Vision Language Models
About Us
Rivian and Volkswagen Group Technologies is a joint venture between two industry leaders with a clear vision for automotive’s next chapter. From operating systems to zonal controllers to cloud and connectivity solutions, we’re addressing the challenges of electric vehicles through technology that will set the standards for software-defined vehicles around the world.
The road to the future is uncharted. By combining our expertise across connectivity, AI, security and more, we’ll map a new way forward. Working together, we’ll create a future that’s more connected, more intelligent, more sustainable for everyone.
Role Summary
Join our dynamic team in the automotive industry as a Staff AI Research Engineer specializing in Video Understanding and Vision Language Models (VLMs). You will help develop cutting-edge AI solutions for complex video analysis and multimodal interaction, with a focus on building and optimizing advanced algorithms for video understanding, VLM development, scene analysis, and real-time processing. Your work will enhance vehicle safety, improve user experiences through rich multimodal interactions, and enable innovative in-vehicle applications that draw on both visual and language data.
Responsibilities
- Develop and implement advanced video understanding algorithms, including action recognition, video captioning, and video question answering, integrating VLM approaches.
- Design, train, and fine-tune Vision Language Models (VLMs) using large-scale multimodal datasets, particularly automotive and video-centric data.
- Deploy and optimize VLM and video processing models for edge devices and real-time automotive applications, ensuring efficient performance and low latency.
- Collaborate with cross-functional teams to integrate video understanding and VLM solutions into in-vehicle systems and infotainment platforms for enhanced user interaction.
- Conduct cutting-edge research and stay up to date with the latest advancements in video understanding, VLM technologies, and multimodal learning.
- Create scalable, maintainable pipelines for multimodal data processing, VLM training, and deployment.
- Document workflows, algorithms, and results for internal and external stakeholders, with a focus on VLM and video analysis techniques.
Qualifications
- MS or PhD in Computer Science, Electrical Engineering, or a related field with a focus on Computer Vision, Natural Language Processing, or Multimodal Learning.
- Strong programming skills in Python, with extensive experience in deep learning frameworks such as TensorFlow or PyTorch, including VLM-specific libraries and frameworks.
- Proficiency in computer vision techniques, including CNNs, Transformers, video modeling, and multimodal architectures.
- Hands-on experience with large-scale video data and VLM tasks such as video captioning, video question answering, and multimodal retrieval.
- Familiarity with image and video processing libraries (e.g., OpenCV, scikit-image) and natural language processing libraries (e.g., Hugging Face Transformers).
- Experience with hardware accelerators (e.g., GPUs, NPUs) for training and deploying complex VLM and video models.
Preferred Qualifications
- Authored or co-authored publications in top-tier computer vision, natural language processing, or AI conferences/journals (e.g., CVPR, ICCV, NeurIPS, ICML, ACL).
- Deep knowledge of spatiotemporal modeling, video understanding techniques, and advanced VLM architectures.
- Familiarity with MLOps practices, including model deployment pipelines, CI/CD for VLMs, and multimodal data management.
- Background in optimizing large-scale models for resource-constrained and edge environments, specifically focusing on VLM and video processing.
- Extensive experience with annotation tools and synthetic data generation for multimodal training datasets, including video and text data.
Pay Disclosure
Salary Range/Hourly Rate for California-Based Applicants: 206,000 USD - 258,000 USD (actual compensation will be determined based on experience, location, and other factors permitted by law).
Benefits Summary: Rivian and Volkswagen Group Technologies provides robust medical/Rx, dental, and vision insurance packages for full-time and part-time employees, their spouse or domestic partner, and children up to age 26. Full-time employee coverage is effective on the first day of employment. Part-time employee coverage is effective on the first of the month following 90 days of employment.
Equal Opportunity
Rivian and Volkswagen Group Technologies is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law. We are also committed to ensuring compliance with all applicable fair employment practice laws regarding citizenship and immigration status.
Rivian and Volkswagen Group Technologies is committed to ensuring that our hiring process is accessible for persons with disabilities. If you have a disability or limitation, such as those covered by the Americans with Disabilities Act, that requires accommodations to assist you in the search and application process, please email us at candidateaccommodations@rivian.com.
Candidate Data Privacy
Rivian and VW Group Technologies (“Rivian and Volkswagen Group Technologies”) may collect, use and disclose your personal information or personal data (within the meaning of the applicable data protection laws) when you apply for employment and/or participate in our recruitment processes (“Candidate Personal Data”). This data includes contact, demographic, communications, educational, professional, employment, social media/website, network/device, recruiting system usage/interaction, security and preference information. Rivian and Volkswagen Group Technologies may use your Candidate Personal Data for the purposes of (i) tracking interactions with our recruiting system; (ii) carrying out, analyzing and improving our application and recruitment process, including assessing you and your application and conducting employment, background and reference checks; (iii) establishing an employment relationship or entering into an employment contract with you; (iv) complying with our legal, regulatory and corporate governance obligations; (v) recordkeeping; (vi) ensuring network and information security and preventing fraud; and (vii) as otherwise required or permitted by applicable law.
Rivian and Volkswagen Group Technologies may share your Candidate Personal Data with (i) internal personnel who have a need to know such information in order to perform their duties, including individuals on our People Team, Finance, Legal, and the team(s) with the position(s) for which you are applying; (ii) Rivian and Volkswagen Group Technologies affiliates; and (iii) Rivian and Volkswagen Group Technologies’ service providers, including providers of background checks, staffing services, and cloud services.
Rivian and Volkswagen Group Technologies may transfer or store your Candidate Personal Data internationally, including to or in the United States, Canada, and the European Union and in the cloud, and this data may be subject to the laws of, and accessible to the courts, law enforcement, and national security authorities of, such jurisdictions.
Please see our Candidate Data Privacy Notice (English) and Candidate Data Privacy Notice (Serbian) for more information.
Please note that we are currently not accepting applications from third-party application services.
Seniority level
Not Applicable
Job function
Engineering and Information Technology
Industries
Software Development