Technology & AI

Alibaba’s Qwen Team Introduces Qwen3.7-Plus, Adds Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration to the Bailian Platform

Alibaba’s Qwen team has released Qwen3.7-Plus. The model is now available through Alibaba Cloud’s Bailian platform. Bailian is the console that international users access as Model Studio, and it provides API services to external developers. The release follows Alibaba’s May launch of the Qwen3.7 generation.

Qwen3.7-Plus

Qwen3.7-Plus is a large multimodal language model. It understands images and video as well as written instructions. Its sibling, Qwen3.7-Max, is text-only.

This is visual understanding, not generation. The model reads images and video; it does not create them. Alibaba’s image and video generation work resides in separate model families.

The Alibaba team describes the release as a step in multimodal hybrid agent technology. An agent is a model that plans and executes steps. Building on image and video understanding, Qwen3.7-Plus adds five capabilities: deep reasoning, self-editing, tool invocation, validation and testing, and autonomous iteration.

Self-editing means the model writes and revises its own code. Tool invocation means it calls external functions or APIs. Validation and testing means it runs the output and checks the results. Autonomous iteration means it loops until the task is complete. Together, they describe a model built to act, not just respond.
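
The write, run, check, and retry cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Alibaba’s implementation: `propose_fix`, `run_tests`, and `agent_loop` are hypothetical names, and a hard-coded stub stands in for the model so the example runs offline.

```python
from typing import Callable

# Hypothetical stand-in for a model call: returns candidate code for an attempt.
# A real agent would send the task and prior feedback to a model API instead.
def propose_fix(attempt: int) -> str:
    candidates = [
        "def add(a, b): return a - b",  # first draft, contains a bug
        "def add(a, b): return a + b",  # revised draft
    ]
    return candidates[min(attempt, len(candidates) - 1)]

# Validation and testing: execute the candidate and check its results.
def run_tests(source: str) -> bool:
    namespace: dict = {}
    exec(source, namespace)  # acceptable for a toy; never exec untrusted code unsandboxed
    add = namespace["add"]
    return add(2, 3) == 5 and add(-1, 1) == 0

# Autonomous iteration: loop until validation passes or the step budget runs out.
def agent_loop(propose: Callable[[int], str],
               validate: Callable[[str], bool],
               max_steps: int = 5) -> tuple[bool, int]:
    for attempt in range(max_steps):
        candidate = propose(attempt)  # self-editing: write or revise code
        if validate(candidate):       # run the output and check the results
            return True, attempt + 1
    return False, max_steps

if __name__ == "__main__":
    print(agent_loop(propose_fix, run_tests))  # (True, 2)
```

The `max_steps` budget is a design choice worth keeping in any real loop: without it, an agent that never passes validation would iterate indefinitely.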

The Case for Vision

Qwen3.7-Plus is the multimodal member of the 3.7 family. Its preview has already posted measurable results. On Vision Arena, Qwen3.7-Plus-Preview ranked #16 overall. That placed Alibaba as the #5 lab in vision. The model rank and the lab rank are separate calculations.

Vision Arena is a neutral leaderboard managed by LM Arena. Users vote on image-understanding answers in blind pairings. The #16 result sits behind the top US labs, but within the competitive field. For image-heavy workloads, this is a useful signal. Think OCR at scale, chart reading, or video frame analysis.
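
One way a model rank and a lab rank can diverge is sketched below. The article does not say how the lab ranking is computed; this example assumes each lab counts once, at the position of its best model. The leaderboard rows are made-up placeholders, not Vision Arena data, and `model_rank` and `lab_rank` are hypothetical helpers.

```python
# Hypothetical leaderboard rows, already sorted best-first: (model, lab).
# These are placeholders for illustration, not real Vision Arena results.
leaderboard = [
    ("model-a1", "lab-a"),
    ("model-a2", "lab-a"),
    ("model-b1", "lab-b"),
    ("model-a3", "lab-a"),
    ("model-c1", "lab-c"),
    ("model-b2", "lab-b"),
]

def model_rank(rows, model):
    """1-based position of a model in the overall list."""
    return [m for m, _ in rows].index(model) + 1

def lab_rank(rows, lab):
    """1-based position of a lab when each lab counts once, at its best model."""
    labs_in_order = []
    for _, row_lab in rows:
        if row_lab not in labs_in_order:
            labs_in_order.append(row_lab)
    return labs_in_order.index(lab) + 1

print(model_rank(leaderboard, "model-c1"))  # 5
print(lab_rank(leaderboard, "lab-c"))       # 3
```

Under this assumption, a lab with several strong models occupies many model slots but only one lab slot, which is how a #16 model can belong to a #5 lab.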

The text-only sibling, Max, supports the generation’s reasoning claims. Max scored 56.6 on the Artificial Analysis Intelligence Index. That was the highest placement for a Chinese model at release.

The Agentic Loop

A clear shift in Qwen3.7 is its focus on agentic work. The Alibaba team positions the models for long-running tasks. Bailian, the hosting platform, adds two relevant pieces.

The first is an Agentic RL (reinforcement learning) method. The platform uses real-world performance feedback to refine the model’s accuracy over time. The second is a set of built-in safety guardrails. These keep autonomous tools within set operating limits. That matters when the agent runs commands or edits files.
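
The article does not describe how Bailian’s guardrails work internally. As a generic illustration of the idea, the sketch below checks a command against an allowlist and confines file edits to a workspace directory before an agent is allowed to act. `ALLOWED_COMMANDS`, `WORKSPACE`, `command_allowed`, and `path_allowed` are all hypothetical names chosen for this example.

```python
import shlex
from pathlib import PurePosixPath

ALLOWED_COMMANDS = {"ls", "cat", "pytest"}  # hypothetical allowlist
WORKSPACE = PurePosixPath("/workspace")     # hypothetical sandbox root

def command_allowed(command_line: str) -> bool:
    """Permit a command only if its executable is on the allowlist."""
    try:
        parts = shlex.split(command_line)
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    return bool(parts) and parts[0] in ALLOWED_COMMANDS

def path_allowed(path: str) -> bool:
    """Permit file edits only inside the workspace root."""
    candidate = PurePosixPath(path)
    # Reject relative paths and any ".." segment that could escape the root.
    if not candidate.is_absolute() or ".." in candidate.parts:
        return False
    return candidate == WORKSPACE or WORKSPACE in candidate.parents

if __name__ == "__main__":
    print(command_allowed("pytest -q"))             # True
    print(command_allowed("rm -rf /"))              # False
    print(path_allowed("/workspace/src/app.py"))    # True
    print(path_allowed("/workspace/../etc/passwd")) # False
```

A check like this only inspects the executable name, so the approved command should then be run as an argument list without a shell; otherwise operators such as `&&` in later arguments could chain in commands the allowlist never saw.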

Marktechpost Visual Explainer

AI Models · Field Guide

Alibaba Qwen · June 2, 2026

A large multimodal language model with image and video understanding, deep reasoning, and agentic capabilities. Available via API on Alibaba Cloud’s Bailian platform, which is accessed internationally as Model Studio.

01 · What it is

A large multimodal language model

  • Multimodal: reads images and video, as well as text input.
  • Visual understanding, not generation: it reads media and does not create it.
  • A multimodal sibling to the text-only Qwen3.7-Max.
  • Alibaba describes it as a step in multimodal hybrid agent technology.

02 · Skills

Five capabilities beyond vision

  • Deep reasoning: works through problems step by step.
  • Self-editing: writes and revises its own code.
  • Tool invocation: calls external functions or APIs.
  • Validation and testing: runs the output and checks the results.
  • Autonomous iteration: loops until the task is finished.

03 · Vision benchmarks

Where it stands in vision

  • The preview ranked #16 overall on Vision Arena (LM Arena).
  • That placed Alibaba as the #5 lab in vision.
  • Model rank and lab rank are separate calculations.
  • Relevant for OCR, chart reading, and video frame analysis.

For reference, the text-only sibling scored 56.6 on the Artificial Analysis Intelligence Index, the highest for a Chinese model at release.

04 · Agent loop

Designed for long-running tasks

  • Bailian adds an Agentic RL (reinforcement learning) method.
  • It uses real-world performance feedback to refine accuracy.
  • Built-in safety guardrails keep autonomous tools within limits.
  • That matters when the agent runs commands or edits files.

05 · Confirmed vs. unconfirmed

What we know today

Confirmed

  • Image and video understanding
  • Agentic feature set
  • Bailian API access
  • Proprietary, API only

Not yet published

  • Public price sheet
  • Context window size
  • Output token limits
  • Open weights

06 · Why it matters

Practical takeaways

  • An agent backend that can see, behind a single API.
  • It suits workloads that mix images, video, and tools.
  • Leaderboard rank signals promise, not a guarantee.
  • Verify accuracy on your own data before committing.
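
The last takeaway can be made concrete with a small evaluation harness. The sketch below scores any prediction function against a labeled set using exact match after light normalization. Everything here is illustrative: `exact_match_accuracy` and `fake_predict` are hypothetical names, the stub stands in for a real vision model call, and the three examples are invented.

```python
from typing import Callable, Iterable

def exact_match_accuracy(predict: Callable[[str], str],
                         examples: Iterable[tuple[str, str]]) -> float:
    """Share of examples where the normalized prediction equals the label."""
    total = 0
    correct = 0
    for item, expected in examples:
        total += 1
        if predict(item).strip().lower() == expected.strip().lower():
            correct += 1
    return correct / total if total else 0.0

# Hypothetical stand-in for a vision model call (e.g. OCR on an invoice image).
def fake_predict(image_id: str) -> str:
    canned = {"invoice-1": "Total: 42.00", "invoice-2": "Total: 17.50"}
    return canned.get(image_id, "")

# Invented labeled examples: (input identifier, expected answer).
labeled = [
    ("invoice-1", "total: 42.00"),
    ("invoice-2", "total: 17.05"),  # the stub gets this one wrong
    ("invoice-3", "total: 9.99"),   # the stub has no answer here
]

print(round(exact_match_accuracy(fake_predict, labeled), 2))  # 0.33
```

Swapping `fake_predict` for a function that calls the hosted model would turn this into a basic acceptance check; exact match is strict, so tasks such as chart reading may need a more tolerant comparison.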


Key Takeaways

  • Alibaba has released Qwen3.7-Plus, a multimodal model that is now available via API on its Bailian (Model Studio) platform.
  • It takes images and video as input, for understanding rather than generation, and adds agentic features.
  • Capabilities include deep reasoning, self-editing, tool invocation, validation and testing, and autonomous iteration.
  • Its preview ranked #16 on Vision Arena, placing Alibaba as the #5 vision lab.

Michal Sutter is a data science expert with a Master of Science in Data Science from the University of Padova. With a strong foundation in statistical analysis, machine learning, and data engineering, Michal excels at turning complex data sets into actionable insights.
