TechJobBoard


Senior ML Research Scientist, VLM/VLA

at Nuro

Mountain View, California (HQ), United States



Who We Are

Nuro is a self-driving technology company on a mission to make autonomy accessible to all. Founded in 2016, Nuro is building the world's most scalable driver, combining cutting-edge AI with automotive-grade hardware. Nuro licenses its core technology, the Nuro Driver™, to support a wide range of applications, from robotaxis and commercial fleets to personally owned vehicles. With technology proven over years of self-driving deployments, Nuro gives automakers and mobility platforms a clear path to AVs at commercial scale—empowering a safer, richer, and more connected future.

About the Role

In this role, you will develop vision-language-action models for our onboard Behavior & Planning stack, with the goal of improving safe and robust decision-making in complex, long-tail driving scenarios. You will work on multimodal models that connect scene understanding, contextual reasoning, and planning-relevant representations for real-world autonomous driving.

This role is focused on advancing state-of-the-art VLAs for autonomy, including model development, large-scale training, fine-tuning, evaluation, and onboard optimization. You will work closely with partners across behavior, planning, perception, systems, and infrastructure to translate research advances into practical capabilities deployed on our vehicles.

If you are excited about building and deploying cutting-edge VLA systems for real-world robotics, we'd love to hear from you.

About the Work

  • Develop and advance VLA models for onboard Behavior & Planning in autonomous driving systems.
  • Build multimodal models that improve safe decision-making in complex, ambiguous, and long-tail driving scenarios.
  • Research and apply state-of-the-art approaches in vision-language-action modeling, multimodal representation learning, and foundation models for autonomy.
  • Train, fine-tune, and evaluate large-scale VLAs using diverse real-world driving data.
  • Improve model quality, robustness, and generalization across challenging edge cases and dynamic real-world environments.
  • Optimize models for onboard deployment, including inference efficiency, latency, memory usage, and runtime performance.
  • Collaborate with autonomy, data, and infrastructure teams to define training, evaluation, and deployment requirements.
  • Design effective evaluation methodologies for multimodal models in safety-critical applications.
  • Contribute to scalable model and data pipelines that support rapid experimentation and production deployment.

About You

You have deep expertise and prior experience in some or many of the following areas:

  • You have an M.S. or Ph.D. in Computer Science, Machine Learning, Robotics, Artificial Intelligence, or a closely related field.
  • You have 5+ years of industry and/or research experience in machine learning, with a focus on large-scale model development.
  • You have strong experience with vision-language models (VLMs), multimodal foundation models, VLAs, or related architectures.
  • You have hands-on experience with large-scale model training, fine-tuning, and evaluation.
  • You have familiarity with core model components and techniques relevant to modern multimodal systems, such as Vision Transformers (ViTs) and large language models (LLMs).
  • You have experience optimizing models for deployment, including inference speed, memory efficiency, and performance under onboard compute constraints.
  • You have strong programming skills in Python and experience with modern deep learning frameworks such as PyTorch, JAX, and/or TensorFlow.
  • You are an independent researcher and strong collaborator who can move fluidly from early-stage ideas to practical implementation.
  • You thrive in a fast-paced research and development environment and are excited to deploy advanced ML systems in the real world.

Nice to Have

  • Experience applying VLMs/VLAs or other foundation models to autonomous driving, robotics, or embodied AI.
  • Familiarity with behavior, planning, scene understanding, or decision-making systems in AVs.
  • Experience with multimodal data curation, dataset development, or data quality systems at scale.
  • Experience with onboard ML deployment in real-time or resource-constrained environments.
  • Publications in top-tier machine learning, robotics, or computer vision conferences such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, CoRL, RSS, or ICRA.
  • Strong C++ skills or experience integrating ML models into production systems.

At Nuro, we celebrate differences and are committed to a diverse workplace that fosters inclusion and psychological safety for all employees. Nuro is proud to be an equal opportunity employer and expressly prohibits any form of workplace discrimination based on race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, veteran status, or any other legally protected characteristics.

At Nuro, your base pay is one part of your total compensation package. For this position, the reasonably expected pay range is between $183,825.00 and $275,975.00/year for the level at which this job has been scoped. Your base pay will depend on several factors, including your experience, qualifications, education, location, and skills. In the event that you are considered for a different level, a higher or lower pay range would apply. This position is also eligible for an annual performance bonus, equity, and a competitive benefits package.


© 2023 TechJobBoard. All rights reserved.