Hugging Face Launches Smol2Operator: Turning Small VLMs into GUI-Operating Agents
Hugging Face has launched Smol2Operator, a practical guide for turning small vision-language models (VLMs) into GUI-operating, tool-using agents. The release aims to lower the barrier for developers building operator-grade agents rather than to chase leaderboard peaks.
Smol2Operator applies a two-phase post-training process to a small VLM to instill perception and agentic reasoning. It reduces engineering overhead and makes agent behavior easier to reproduce with small models. The release includes data transformation utilities, training scripts, transformed datasets, and a 2.2B-parameter model checkpoint. Notably, it unifies the action space across heterogeneous sources, enabling coherent training across datasets.
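To illustrate what such an action-space unification can look like, the sketch below maps dataset-specific action records onto a single, resolution-independent function call. The action names, record schema, and [0, 1] coordinate convention here are illustrative assumptions, not the exact function API or data format shipped with Smol2Operator.

```python
# Illustrative sketch only: the action names and schema are assumptions,
# not the actual Smol2Operator action space.

def normalize_action(raw: dict, screen_w: int, screen_h: int) -> str:
    """Map a dataset-specific action record onto one canonical function call.

    Pixel coordinates are rescaled to [0, 1] so the same call is valid
    regardless of the source screenshot resolution.
    """
    kind = raw["type"]
    if kind in ("tap", "click", "press"):  # heterogeneous names for the same gesture
        x, y = raw["x"] / screen_w, raw["y"] / screen_h
        return f"click(x={x:.3f}, y={y:.3f})"
    if kind in ("input_text", "type"):
        return f"type(text={raw['text']!r})"
    if kind in ("swipe", "scroll"):
        return (f"scroll(from_x={raw['x1'] / screen_w:.3f}, from_y={raw['y1'] / screen_h:.3f}, "
                f"to_x={raw['x2'] / screen_w:.3f}, to_y={raw['y2'] / screen_h:.3f})")
    raise ValueError(f"unmapped action type: {kind}")


# The same logical click, expressed by two datasets in different pixel spaces,
# normalizes to an identical training target.
print(normalize_action({"type": "tap", "x": 540, "y": 960}, 1080, 1920))   # click(x=0.500, y=0.500)
print(normalize_action({"type": "click", "x": 640, "y": 360}, 1280, 720))  # click(x=0.500, y=0.500)
```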
The pipeline normalizes disparate GUI action taxonomies and coordinate conventions into a single, consistent function API. Smol2Operator slots into the smolagents runtime with ScreenEnv for evaluation. The release includes technical details and a full collection on Hugging Face, with a final checkpoint and a demo Space, emphasizing process transparency and portability.
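At inference time the mapping runs in reverse: the function call emitted by the model is parsed and replayed against a live screen. The sketch below assumes a canonical `click(x=..., y=...)` call with normalized coordinates and uses a stand-in environment object; it is not the actual smolagents or ScreenEnv interface.

```python
# Hypothetical inference-side counterpart: parse one canonical call and replay it in pixels.
import ast
from dataclasses import dataclass


def execute_call(call: str, env) -> None:
    """Parse a call such as 'click(x=0.500, y=0.500)' and dispatch it to the environment."""
    expr = ast.parse(call.strip(), mode="eval").body
    if not isinstance(expr, ast.Call) or not isinstance(expr.func, ast.Name):
        raise ValueError(f"not a plain function call: {call!r}")
    name = expr.func.id
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in expr.keywords}
    if name == "click":
        # Normalized [0, 1] coordinates are scaled back to the live screen resolution.
        env.click(round(kwargs["x"] * env.width), round(kwargs["y"] * env.height))
    elif name == "type":
        env.type(kwargs["text"])
    else:
        raise NotImplementedError(f"unhandled action: {name}")


@dataclass
class FakeScreen:
    """Stand-in environment for illustration only; not the ScreenEnv API."""
    width: int = 1920
    height: int = 1080

    def click(self, x_px: int, y_px: int) -> None:
        print(f"click at ({x_px}, {y_px})")

    def type(self, text: str) -> None:
        print(f"type {text!r}")


execute_call("click(x=0.500, y=0.500)", FakeScreen())  # -> click at (960, 540)
execute_call("type(text='hello')", FakeScreen())       # -> type 'hello'
```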
In short, Smol2Operator provides a reproducible, end-to-end recipe for turning small vision-language models into GUI-operating agents. It prioritizes practicality and ease of use, with reduced engineering overhead, a unified action space, and the essential resources developers need to build and evaluate operator-grade agents.