Technical comparison of Deepseek, Novasky, S1, Helix, P0
post by Juliezhanggg · 2025-02-25T04:20:40.413Z · LW · GW
Comparing Novasky with S1:
NovaSky (the Berkeley team behind Sky-T1) and s1 (from a Stanford group including Fei-Fei Li, arXiv:2501.19393) are the players without large capital or compute; they mainly focus on methods that fine-tune a large language model with small, carefully curated reasoning datasets. Sky-T1 trained an entire model on 17K training examples drawn from diverse domains. s1 focuses more on test-time scaling: it extends reasoning time when needed by appending a "Wait" token whenever the model tries to terminate its reasoning. Its fine-tuning dataset is much smaller, only 1,000 selected questions with detailed reasoning traces. One method is more versatile; the other concentrates on math problems, trading extra inference time for better performance.
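As a rough illustration, here is a minimal sketch of that "Wait"-style budget forcing, assuming a Hugging Face-style generate API; the model name and the `</think>` delimiter are placeholders, not the exact s1 setup:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("my-reasoning-model")        # placeholder
model = AutoModelForCausalLM.from_pretrained("my-reasoning-model")

END_THINK = "</think>"  # assumed end-of-thinking delimiter

def generate_with_budget(prompt, min_think_tokens=1000, max_retries=4):
    """Keep the model reasoning until a minimum token budget is spent:
    if it emits the end-of-thinking delimiter too early, strip it and
    append 'Wait,' so generation continues."""
    text = prompt + "<think>"
    spent = 0
    for _ in range(max_retries):
        ids = tok(text, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=min_think_tokens - spent)
        chunk = tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
        spent += out.shape[1] - ids.shape[1]
        if END_THINK in chunk and spent < min_think_tokens:
            # Cut the premature terminator and nudge the model to continue.
            text += chunk.split(END_THINK)[0] + " Wait,"
        else:
            return text + chunk  # budget met or model finished naturally
    return text
```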
- s1: Inference-Time Enhancement. Concentrates on leveraging additional computation during inference (test-time scaling) to boost performance without necessarily changing the model's underlying architecture.
- Sky-T1: End-to-End Training. Focuses on training a model that already incorporates strong reasoning capabilities across domains. While it might use some techniques to boost inference, the primary innovation lies in a cost-effective and efficient training process that achieves robust performance out of the box.
Then there are the players with the strongest technical teams, operating in stealth mode and doing only frontier experimentation. These are the most terrifying players, such as SSI and Keen Technologies, hiding everything from view.
Looking into Deepseek:
Deepseek: A true technical breakthrough has emerged in which reasoning ability is achieved entirely through reinforcement learning (RL), without any reliance on supervised fine-tuning. The approach is grounded in a novel mathematical framework, Group Relative Policy Optimization (GRPO), whose objective function keeps the learning process stable and gradual rather than chaotic, preventing sudden changes in behavior at each step. For every question, the old policy generates a batch of answers, and rewards are assigned based on each answer's standing relative to the rest of the group rather than an absolute right-or-wrong measure; the policy is then updated in favor of answers with positive advantage. Training follows a simple template, typically a "<think>" section followed by an "<answer>" section, and an "aha moment" emerges as the system develops self-reflection, re-evaluating its own reasoning and consciously monitoring its thought process. Innovations such as the R1 fine-tuning dataset provide concrete examples of reasoning, while guided reflection and verification through RL optimize reasoning for specific tasks using reward signals that combine language consistency with general accuracy. The method extends reasoning to diverse tasks, such as writing, and ultimately employs RL with a neural reward model to favor helpfulness and harmlessness.
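To make the GRPO idea concrete, here is a minimal sketch of the group-relative advantage and the clipped, KL-regularized objective it feeds; the function names and reward inputs are illustrative, not DeepSeek's actual code:

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: each answer in a batch sampled for the
    same question is scored against the group mean, not an absolute label."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def grpo_loss(logp_new, logp_old, logp_ref, rewards, clip_eps=0.2, kl_coef=0.04):
    """Clipped policy objective over one group of sampled answers.
    logp_new / logp_old / logp_ref: summed token log-probs of each answer
    under the current, old, and frozen reference policies."""
    adv = grpo_advantages(rewards)
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    loss = -torch.min(unclipped, clipped).mean()
    # Unbiased KL estimator against the reference policy: this is the term
    # that keeps each update gradual rather than chaotic.
    kl = torch.exp(logp_ref - logp_new) - (logp_ref - logp_new) - 1
    return loss + kl_coef * kl.mean()
```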
Moreover, the approach enables the transfer of reasoning ability from large models to smaller ones: using LLaMA- and Qwen-based student models, reasoning is distilled from the large model into small ones. Future application models are expected to be smaller and more ubiquitous, and breakthroughs in intelligence will likely require more computational power. DeepSeek's RL approach has demonstrated that reasoning ability can be directly learned, rather than emerging solely as a consequence of scaling laws.
Physical intelligence:
The training method: first, they didn't start from scratch but from an existing vision-language model, PaliGemma. This lets the system understand both visual cues and language commands without learning them from the ground up. For data, they collected a very large and varied dataset (about 10,000 hours) across seven different robot configurations performing 68 tasks. This data collection effort is what makes the model stand out, and it is where the major cost went.
The team adapted ideas from diffusion models to continuous action generation for robotics. Action generation is treated not as direct regression or a discrete prediction problem, but as a continuous denoising process: controlled noise is added to target actions, and the model is trained to find the right direction to denoise back to the targets. This technique was originally used for image generation in diffusion models and is creatively applied here to robotics.
Another innovation: to achieve continuous control of robots, the model is trained with a gradual refinement process, introducing a time-dependent parameter that controls the amount of noise added. Training examples range from very noisy to nearly clean, teaching the model to correct gradually from randomness toward the desired control command.
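A minimal sketch of what such a time-conditioned denoising training step could look like, with a toy network standing in for the real action expert; all module names and dimensions are assumptions, not Physical Intelligence's actual implementation:

```python
import torch
import torch.nn as nn

class ActionDenoiser(nn.Module):
    """Toy stand-in for the action expert: predicts the denoising direction
    given a noisy action, an observation embedding, and the noise level t."""
    def __init__(self, action_dim=32, obs_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim + obs_dim + 1, 512), nn.SiLU(),
            nn.Linear(512, action_dim),
        )

    def forward(self, noisy_action, obs, t):
        return self.net(torch.cat([noisy_action, obs, t], dim=-1))

def training_step(model, obs, target_action):
    # Time-dependent noise level: t=1 is pure noise, t=0 is the clean action.
    t = torch.rand(target_action.shape[0], 1)
    noise = torch.randn_like(target_action)
    noisy = (1 - t) * target_action + t * noise   # interpolate toward noise
    predicted = model(noisy, obs, t)
    # Learn the direction that carries the noisy sample away from the target;
    # integrating it backward at inference recovers a clean action.
    return ((predicted - (noise - target_action)) ** 2).mean()
```

At inference time, one would start from pure noise and integrate the predicted direction backward step by step to recover a clean action.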
Helix Vision-Language-Action (VLA) model:
One neural network with no task-specific fine-tuning, multi-robot collaboration, and execution on a pair of low-power embedded GPUs: together these make Helix a shocking breakthrough in robotic foundation models. The dataset they used is a 500-hour, high-quality, multi-robot, multi-operator collection of diverse teleoperated behaviors. 500 hours is an extremely small dataset compared to Physical Intelligence and the other robotics players, which makes me guess they used very little compute to train the Helix model.
In their blog, they explain that in the current state of the art, teaching robots a single new behavior requires a huge number of demonstrations or expert manual programming. Then they drop the answer I have been searching for for months: a new scaling law that doesn't demand more data. With Helix, new skills can be specified with language. To me this feels like a major breakthrough in robotics, because researchers can simply prompt the robot AI with the experiment they want to conduct, and the robot can consult the manual and learn to do it without prior training on the procedures.
At a hackathon at AGI House, someone posed the challenge of using unsupervised learning to learn movement trajectories from video data. With Helix's innovation of using language directly, that need to learn from videos is skipped. I believe the key innovation of Helix isn't just running on edge devices, but its solution to the "System 1/System 2" problem: a unified architecture that combines slow, deliberate reasoning with fast, reactive control. This decoupled architecture may be why they can achieve strong results with much less data.
The fact that they created a system that runs entirely on embedded GPUs while maintaining sophisticated capabilities suggests they've made significant optimizations in model efficiency.
Usually, when we merge models from distinct architectures or domains, performance drops. I wonder how Helix delivers by having two architectures work together under one system.
They mention matching prompts to movements from videos. In their words: "To generate natural language-conditioned training pairs, we use an auto-labeling VLM to generate hindsight instructions. The VLM processes segmented video clips from the onboard robot cameras, prompted with: 'What instruction would you have given the robot to get the action seen in this video?'"
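A hedged sketch of what this auto-labeling loop might look like; `vlm.describe` and the clip objects are hypothetical stand-ins, while the prompt string is quoted from the blog:

```python
HINDSIGHT_PROMPT = (
    "What instruction would you have given the robot "
    "to get the action seen in this video?"
)

def build_training_pairs(clips, vlm):
    """Turn segmented onboard-camera clips into language-conditioned
    (instruction, actions) training pairs via hindsight labeling."""
    pairs = []
    for clip in clips:
        # Ask the VLM to describe, in hindsight, what instruction would
        # have produced the behavior seen in this clip.
        instruction = vlm.describe(frames=clip.frames, prompt=HINDSIGHT_PROMPT)
        # Pair that instruction with the actions the robot actually executed.
        pairs.append({"instruction": instruction, "actions": clip.actions})
    return pairs
```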
Comparing the two models
Physical Intelligence adapts diffusion models (typically used for image generation) to robot control, treating action generation as gradually removing noise to reach target behaviors.
Helix uses a two-part system (a rough sketch of the control loop follows the list) where:
- System 2 (based on a VLM) handles high-level understanding at 7-9Hz
- System 1 translates this into precise motor control at 200Hz
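Here is a minimal sketch of that decoupled two-rate loop, with `system2` and `system1` as hypothetical callables rather than Figure's actual interfaces: the slow loop refreshes a shared latent command while the fast loop always consumes the freshest one:

```python
import threading
import time

latent = None            # latest latent command produced by System 2
lock = threading.Lock()

def system2_loop(system2, get_observation, instruction, hz=8):
    """Slow, deliberate reasoning: refresh the shared latent at ~7-9 Hz."""
    global latent
    while True:
        obs = get_observation()
        new_latent = system2(obs, instruction)
        with lock:
            latent = new_latent
        time.sleep(1 / hz)

def system1_loop(system1, get_proprioception, send_motor_command, hz=200):
    """Fast, reactive control: read the most recent latent at 200 Hz and
    translate it into motor commands without waiting for System 2."""
    while True:
        with lock:
            current = latent
        if current is not None:
            action = system1(get_proprioception(), current)
            send_motor_command(action)
        time.sleep(1 / hz)
```

The design point this illustrates is that the two loops never block each other: System 1 keeps reacting at motor rate even while System 2 is still thinking.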
Conclusion: The Evolving Landscape of AI and Robotics
The recent advances in AI reasoning and robotic control systems reveal distinct approaches to solving complex challenges in artificial intelligence. From specialized reasoning models like NovaSky and S1 to breakthrough robotic systems like Physical Intelligence and Helix, we're witnessing a paradigm shift in how AI systems learn and operate in the physical world.
Key Insights
- Efficiency vs. Scale: While traditional approaches relied heavily on massive datasets and computational resources, newer models like Helix demonstrate that architectural innovation can dramatically reduce data requirements. This shift from "more data" to "smarter architecture" represents a fundamental change in AI development strategy.
- Specialized Architecture: The System 1/System 2 approach employed by Helix elegantly solves the dual challenges of high-level reasoning and real-time control by separating concerns while maintaining a unified training process. This decoupling enables both deep understanding and millisecond-level responsiveness without sacrificing either.
- Novel Training Methodologies: The adaptation of techniques from other domains—like using diffusion models for robot control in Physical Intelligence or reinforcement learning for reasoning in Deepseek—shows that cross-pollination of ideas continues to drive the field forward.
- Edge Computing: Running sophisticated AI directly on robots with embedded GPUs, as demonstrated by Helix, marks a crucial step toward autonomous systems that don't rely on cloud connectivity, potentially democratizing access to advanced robotics.
Future Implications
These developments suggest we're entering a new era where AI systems can:
- Reason through complex problems with minimal training examples
- Adapt to new tasks through natural language instructions rather than exhaustive demonstrations
- Collaborate with other AI systems to solve problems beyond the capability of any single system
- Operate independently in environments without reliable network connectivity
The convergence of advanced reasoning capabilities with dexterous physical control systems brings us closer to general-purpose robots that can understand, learn, and adapt to the world in ways previously limited to science fiction. Rather than following pre-programmed routines, these systems can generate novel behaviors on demand and refine them through experience.
As these technologies mature, the focus will likely shift from raw performance metrics to usability, safety, and integration into human environments. The ultimate success of these systems will depend not just on their technical capabilities, but on how effectively they can augment human potential and address real-world challenges. If you want to chat with me for more insights or are looking to hire an AI researcher, please send me a DM or email me.