
We’re introducing an efficient, on-device robotics model with general-purpose dexterity and fast task adaptation.

In March, we introduced Gemini Robotics, our most advanced VLA (vision-language-action) model, bringing Gemini 2.0’s multimodal reasoning and real-world understanding into the physical world.

Today, we’re introducing Gemini Robotics On-Device, our most powerful VLA model optimized to run locally on robotic devices. It shows strong general-purpose dexterity and task generalization while running efficiently on the robot itself.

Since the model operates independently of a data network, it’s helpful for latency-sensitive applications and remains robust in environments with intermittent or zero connectivity.

We’re also sharing a Gemini Robotics SDK to help developers easily evaluate Gemini Robotics On-Device on their tasks and environments, test our model in our MuJoCo physics simulator, and quickly adapt it to new domains with as few as 50 to 100 demonstrations. Developers can access the SDK by signing up for our trusted tester program.

Model capabilities and performance

Gemini Robotics On-Device is a robotics foundation model for bi-arm robots, engineered to require minimal computational resources. It builds on the task generalization and dexterity capabilities of Gemini Robotics and is:

  • Designed for rapid experimentation with dexterous manipulation.
  • Adaptable to new tasks through fine-tuning to improve performance.
  • Optimized to run locally with low-latency inference.

Gemini Robotics On-Device achieves strong visual, semantic and behavioral generalization across a wide range of testing scenarios, follows natural language instructions, and completes highly dexterous tasks like unzipping bags or folding clothes — all while operating directly on the robot.

In our evaluations, our On-Device model exhibits strong generalization performance while running entirely locally.

Chart evaluating Gemini Robotics On-Device’s generalization performance, compared to our flagship Gemini Robotics model and the previous best on-device model.

Gemini Robotics On-Device also outperforms other on-device alternatives on more challenging out-of-distribution tasks and complex multi-step instructions. For developers seeking state-of-the-art results in these settings, without on-device limitations, we also offer the Gemini Robotics model.

Chart evaluating Gemini Robotics On-Device’s instruction following performance, compared to our flagship Gemini Robotics model and the previous best on-device model.

To learn more about our evaluations, read our Gemini Robotics tech report.

Adaptable to new tasks, generalizable across embodiments

Gemini Robotics On-Device is the first VLA model we’re making available for fine-tuning. While many tasks will work out of the box, developers can also choose to adapt the model to achieve better performance for their applications. Our model quickly adapts to new tasks, with as few as 50 to 100 demonstrations — indicating how well this on-device model can generalize its foundational knowledge to new tasks.
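Concretely, a fine-tuning set of this size is a collection of teleoperated episodes, each pairing a language instruction with observation–action trajectories. The sketch below is purely illustrative: the `Demonstration` and `Step` structures and their field names are assumptions for exposition, not the Gemini Robotics SDK’s actual data format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    """One timestep: camera frame, proprioceptive state, commanded action."""
    image: bytes                  # encoded RGB frame from the robot's camera
    joint_positions: List[float]  # current joint state (14-DoF for a bi-arm setup)
    action: List[float]           # target joint positions to execute

@dataclass
class Demonstration:
    """A single teleoperated episode for one natural-language task."""
    instruction: str
    steps: List[Step] = field(default_factory=list)

def make_dummy_demo(instruction: str, horizon: int = 3) -> Demonstration:
    """Build a placeholder episode with zeroed observations and actions."""
    steps = [Step(image=b"", joint_positions=[0.0] * 14, action=[0.0] * 14)
             for _ in range(horizon)]
    return Demonstration(instruction=instruction, steps=steps)

# 50-100 episodes per task is the range the post cites for adaptation.
dataset = [make_dummy_demo("fold the dress") for _ in range(50)]
print(len(dataset), len(dataset[0].steps))
```

Fine-tuning then amounts to supervised learning on these pairs: the model is trained to predict each `action` from the instruction and the observations up to that step.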

Here, we show how Gemini Robotics On-Device outperforms the current best on-device VLA when fine-tuned to new tasks. We tested the model on seven dexterous manipulation tasks of varying difficulty, including zipping a lunch box, drawing a card and pouring salad dressing.

Chart showing Gemini Robotics On-Device’s task adaptation performance, with fewer than 100 examples.

We further adapted the Gemini Robotics On-Device model to different robot embodiments. While we trained our model only for ALOHA robots, we were able to further adapt it to a bi-arm Franka FR3 robot and the Apollo humanoid robot by Apptronik.

On the bi-arm Franka, the model performs general-purpose instruction following, including handling previously unseen objects and scenes, completing dexterous tasks like folding a dress, or executing industrial belt assembly tasks that require precision and dexterity.

On the Apollo humanoid, we adapt the model to a significantly different embodiment. The same generalist model can follow natural language instructions and manipulate different objects, including previously unseen objects, in a general manner.

Responsible development and safety

We’re developing all Gemini Robotics models in alignment with our AI Principles and applying a holistic safety approach spanning semantic and physical safety.

In practice, we capture semantic and content safety using the Live API, and interface our models with low-level safety critical controllers to execute the actions. We recommend evaluating the end-to-end system on our recently developed semantic safety benchmark and performing red-teaming exercises at all levels to expose the model’s safety vulnerabilities.

Our Responsible Development & Innovation (ReDI) team continues to analyze and advise on the real-world impact of all Gemini Robotics models, finding ways to maximize their societal benefits and minimize risk. Our Responsibility & Safety Council (RSC) then reviews these assessments, providing feedback that we integrate into model development.

To gain a deeper understanding of Gemini Robotics On-Device’s usage and safety profile and to gather feedback, we’re initially releasing it to a select group of trusted testers.

Accelerating innovation in robotics

Gemini Robotics On-Device marks a step forward in making powerful robotics models more accessible and adaptable — and our on-device solution will help the robotics community tackle important latency and connectivity challenges.

The Gemini Robotics SDK will further accelerate innovation by allowing developers to adapt the model to their specific needs. Sign up for model and SDK access via our trusted tester program.

We’re excited to see what the robotics community will build with these new tools as we continue to explore the future of bringing AI into the physical world.
