Staff AI Research Engineer, Video Understanding and Vision Language Models

Rivian and Volkswagen Group Technologies

Palo Alto (CA)

On-site

USD 206,000 - 258,000

Full time

3 days ago

Job summary

Rivian and Volkswagen Group Technologies is seeking a Staff AI Research Engineer specializing in Video Understanding and Vision Language Models. This role focuses on developing innovative AI solutions that enhance vehicle safety and user interaction through advanced algorithms in video analysis and multimodal learning.

Benefits

Robust medical/Rx, dental, and vision insurance
Full-time employee coverage effective day one

Qualifications

  • Experience in Video Captioning and Question Answering.
  • Hands-on experience with VLM tasks.
  • Proficiency with large-scale multimodal datasets.

Responsibilities

  • Develop advanced video understanding algorithms.
  • Design and fine-tune Vision Language Models.
  • Collaborate on in-vehicle systems for enhanced interaction.

Skills

Python
Deep Learning
Computer Vision
Multimodal Learning
Video Analysis

Education

MS or PhD in Computer Science, Electrical Engineering, or a related field

Tools

TensorFlow
PyTorch
OpenCV

Job description

Staff AI Research Engineer, Video Understanding and Vision Language Models

About Us

Rivian and Volkswagen Group Technologies is a joint venture between two industry leaders with a clear vision for automotive’s next chapter. From operating systems to zonal controllers to cloud and connectivity solutions, we’re addressing the challenges of electric vehicles through technology that will set the standards for software-defined vehicles around the world.

The road to the future is uncharted. By combining our expertise across connectivity, AI, security and more, we’ll map a new way forward. Working together, we’ll create a future that’s more connected, more intelligent, more sustainable for everyone.

Role Summary

Join our dynamic team in the automotive industry as an AI Research Engineer specializing in Video Understanding and Vision Language Models (VLM). Contribute to developing cutting-edge AI solutions for complex video analysis and multimodal interactions. In this role, you will focus on building and optimizing advanced algorithms for video understanding, VLM development, scene analysis, and real-time processing. Your work will enhance vehicle safety, improve user experiences through rich multimodal interactions, and enable innovative in-vehicle applications that leverage both visual and language data.

Responsibilities

  • Develop and implement advanced video understanding algorithms, including action recognition, video captioning, and video question answering, integrating VLM approaches.
  • Design, train, and fine-tune Vision Language Models (VLMs) using large-scale multimodal datasets, particularly automotive and video-centric data.
  • Deploy and optimize VLM and video processing models for edge devices and real-time automotive applications, ensuring efficient performance and low latency.
  • Collaborate with cross-functional teams to integrate video understanding and VLM solutions into in-vehicle systems and infotainment platforms for enhanced user interaction.
  • Conduct cutting-edge research to stay up-to-date with the latest advancements in video understanding, VLM technologies, and multimodal learning.
  • Create scalable, maintainable pipelines for multimodal data processing, VLM training, and deployment.
  • Document workflows, algorithms, and results for internal and external stakeholders, with a focus on VLM and video analysis techniques.

Qualifications

  • MS or PhD in Computer Science, Electrical Engineering, or a related field with a focus on Computer Vision, Natural Language Processing, or Multimodal Learning.
  • Strong programming skills in Python, with extensive experience in deep learning frameworks like TensorFlow or PyTorch, and specifically with VLM libraries and frameworks.
  • Proficiency in computer vision techniques, including CNNs, Transformers, video modeling, and multimodal architectures.
  • Hands-on experience with large-scale video data and VLM tasks such as video captioning, video question answering, and multimodal retrieval.
  • Familiarity with image and video processing libraries (e.g., OpenCV, scikit-image) and natural language processing libraries (e.g., Hugging Face Transformers).
  • Experience with hardware accelerators (e.g., GPUs, NPUs) for training and deploying complex VLM and video models.

Preferred Qualifications

  • Authored or co-authored publications in top-tier computer vision, natural language processing, or AI conferences/journals (e.g., CVPR, ICCV, NeurIPS, ICML, ACL).
  • Deep knowledge of spatiotemporal modeling, video understanding techniques, and advanced VLM architectures.
  • Familiarity with MLOps practices, including model deployment pipelines, CI/CD for VLM, and multimodal data management.
  • Background in optimizing large-scale models for resource-constrained and edge environments, specifically focusing on VLM and video processing.
  • Extensive experience with annotation tools and synthetic data generation for multimodal training datasets, including video and text data.

Pay Disclosure

Salary Range/Hourly Rate for California-Based Applicants: 206,000 USD - 258,000 USD (actual compensation will be determined based on experience, location, and other factors permitted by law).

Benefits Summary: Rivian and Volkswagen Group Technologies provides robust medical/Rx, dental, and vision insurance packages for full-time and part-time employees, their spouses or domestic partners, and children up to age 26. Full-time employee coverage is effective on the first day of employment. Part-time employee coverage is effective on the first of the month following 90 days of employment.

Equal Opportunity

Rivian and Volkswagen Group Technologies is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law. We are also committed to ensuring compliance with all applicable fair employment practice laws regarding citizenship and immigration status.

Rivian and Volkswagen Group Technologies is committed to ensuring that our hiring process is accessible for persons with disabilities. If you have a disability or limitation, such as those covered by the Americans with Disabilities Act, that requires accommodations to assist you in the search and application process, please email us at candidateaccommodations@rivian.com.

Candidate Data Privacy

Rivian and VW Group Technologies (“Rivian and Volkswagen Group Technologies”) may collect, use and disclose your personal information or personal data (within the meaning of the applicable data protection laws) when you apply for employment and/or participate in our recruitment processes (“Candidate Personal Data”). This data includes contact, demographic, communications, educational, professional, employment, social media/website, network/device, recruiting system usage/interaction, security and preference information. Rivian and Volkswagen Group Technologies may use your Candidate Personal Data for the purposes of (i) tracking interactions with our recruiting system; (ii) carrying out, analyzing and improving our application and recruitment process, including assessing you and your application and conducting employment, background and reference checks; (iii) establishing an employment relationship or entering into an employment contract with you; (iv) complying with our legal, regulatory and corporate governance obligations; (v) recordkeeping; (vi) ensuring network and information security and preventing fraud; and (vii) as otherwise required or permitted by applicable law.

Rivian and Volkswagen Group Technologies may share your Candidate Personal Data with (i) internal personnel who have a need to know such information in order to perform their duties, including individuals on our People Team, Finance, Legal, and the team(s) with the position(s) for which you are applying; (ii) Rivian and Volkswagen Group Technologies affiliates; and (iii) Rivian and Volkswagen Group Technologies’ service providers, including providers of background checks, staffing services, and cloud services.

Rivian and Volkswagen Group Technologies may transfer or store internationally your Candidate Personal Data, including to or in the United States, Canada, and the European Union and in the cloud, and this data may be subject to the laws and accessible to the courts, law enforcement and national security authorities of such jurisdictions.

Please see our Candidate Data Privacy Notice (English) and Candidate Data Privacy Notice (Serbian) for more information.

Please note that we are currently not accepting applications from third party application services.

  • Seniority level: Not Applicable
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: Software Development
