AI lab DeepSeek may be getting the tech industry's attention this week. But one of its most important domestic competitors, Alibaba, isn't sitting idly by.
Alibaba's Qwen team on Monday released a new family of AI models, Qwen2.5-VL, that can perform a number of text and image analysis tasks. The models can parse files, understand videos, and count objects in images, as well as control a PC, similar to OpenAI's recently launched Operator.
According to the Qwen team, the best Qwen2.5-VL model outperforms OpenAI's GPT-4o, Claude 3.5 Sonnet, and Google's Gemini 2.0 on a range of video understanding, math, document analysis, and question-answering evaluations.
Qwen2.5-VL, which is available to test in Alibaba's Qwen Chat app and to download from AI dev platform Hugging Face, can analyze charts and graphics, extract data from invoices and forms, and “understand” multi-hour videos, Qwen says. Qwen2.5-VL can also recognize “IPs from film and TV series, as well as a wide range of products,” per the team, which suggests that the models were trained in part on copyrighted works.
Qwen2.5-VL, which was developed by a Chinese company, has some restrictions on the topics it will discuss, at least in Qwen Chat. When I asked the largest and most capable Qwen2.5-VL model, Qwen2.5-VL-72B, to talk about “Xi Jinping”, Qwen Chat gave an error message.
China's internet regulator benchmarks many models developed in the country to ensure their responses “embody core socialist values.” Many Chinese AI systems refuse to respond to topics that might anger regulators, such as Taiwan's independence.
One of Qwen2.5-VL's most interesting features is its ability to interact with software, both on PCs and on mobile devices. A video posted on X by Philipp Schmid, a technical lead at Hugging Face, shows Qwen2.5-VL launching the Booking.com app for Android and booking a trip from Chongqing to Beijing.
Don't miss @Alibaba_Qwen 2.5 VL! Despite all the DeepSeek noise, Qwen dropped the best multimodal model! Qwen 2.5 VL is a vision language model that can control your computer, similar to OpenAI Operator, extract structured information from documents, and more!!
TL;DR:
3⃣ … pic.twitter.com/geegvdl0ti
– Philipp Schmid (@_philschmid) January 27, 2025
In the video below, Qwen2.5-VL controls Linux, but it does not appear to do much switching between apps. Perhaps tellingly, Qwen's benchmarks show Qwen2.5-VL scoring poorly on OSWorld, a benchmark that tries to mimic a real computer environment.
LMAO Qwen 2.5 VL can do computer use, out of the box, like OpenAI Operator! 🐐 pic.twitter.com/lwmecxznsu
– Vaibhav (VB) Srivastav (@reach_vb) January 27, 2025
The smaller and less capable models in the Qwen2.5-VL series, Qwen2.5-VL-3B and Qwen2.5-VL-7B, are available under a permissive license. However, the flagship Qwen2.5-VL-72B is under Alibaba's custom license, which requires companies and devs with more than 100 million monthly users to request permission from Qwen/Alibaba before deploying the model commercially.