Press Nest Africa


PolyU develops novel multi-modal agent to facilitate long video understanding by AI, accelerating development of generative AI-assisted video analysis

By Media OutReach Newswire
June 10, 2025


HONG KONG SAR – Media OutReach Newswire – 10 June 2025 – While artificial intelligence (AI) technology is evolving rapidly, AI models still struggle to understand long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform long video reasoning and question-answering tasks by emulating the way humans think. The VideoMind framework incorporates an innovative Chain-of-Low-Rank Adaptation (LoRA) strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis. The findings have been submitted to world-leading AI conferences.


Videos, especially those longer than 15 minutes, carry information that unfolds over time, such as the sequence of events, causality, coherence and scene transitions. To understand video content, AI models therefore need not only to identify the objects present, but also to take into account how they change throughout the video. Because the visual content of a video occupies a large number of tokens, video understanding requires vast amounts of computing capacity and memory, making it difficult for AI models to process long videos.
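To give a rough sense of scale, the short sketch below estimates how the number of visual tokens grows with video length. The sampling rate (1 frame per second) and the 256 tokens per frame are illustrative assumptions, not figures reported by the PolyU team.

```python
def visual_token_count(
    duration_minutes: float,
    frames_per_second: float = 1.0,   # assumed sampling rate
    tokens_per_frame: int = 256,      # assumed tokens per sampled frame
) -> int:
    """Estimate the visual tokens produced by a video sampled at a fixed rate."""
    frames = int(duration_minutes * 60 * frames_per_second)
    return frames * tokens_per_frame


for minutes in (1, 15, 27):
    print(f"{minutes:>2} min -> {visual_token_count(minutes):,} visual tokens")
```

Under these assumptions a 15-minute video already yields roughly 230,000 visual tokens, which illustrates why feeding every frame to a model quickly becomes impractical.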

Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, and his team have achieved a breakthrough in research on long video reasoning by AI. In designing VideoMind, they modelled the framework on the way humans understand video and introduced a role-based workflow. The four roles in the framework are: the Planner, which coordinates all other roles for each query; the Grounder, which localises and retrieves relevant moments; the Verifier, which validates the accuracy of the retrieved moments and selects the most reliable one; and the Answerer, which generates the query-aware answer. This progressive approach to video understanding helps address the challenge of temporal-grounded reasoning that most AI models face.
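The division of labour among the four roles can be sketched as a simple pipeline. This is a minimal illustration only: the function names, the fixed-window grounding and the pluggable `score_fn` are hypothetical stand-ins, not the VideoMind implementation, in which each role is carried out by the model itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Moment = Tuple[float, float]  # (start_seconds, end_seconds)


@dataclass
class ScoredMoment:
    span: Moment
    confidence: float


def planner(query: str) -> List[str]:
    """Decide which roles to call. Toy rule: always run the full chain."""
    return ["grounder", "verifier", "answerer"]


def grounder(query: str, video_seconds: float, num_candidates: int = 3) -> List[Moment]:
    """Propose candidate moments. Stub: split the video into equal windows."""
    step = video_seconds / num_candidates
    return [(i * step, (i + 1) * step) for i in range(num_candidates)]


def verifier(query: str, candidates: List[Moment],
             score_fn: Callable[[str, Moment], float]) -> ScoredMoment:
    """Score every candidate and keep the most reliable one."""
    scored = [ScoredMoment(c, score_fn(query, c)) for c in candidates]
    return max(scored, key=lambda m: m.confidence)


def answerer(query: str, moment: ScoredMoment) -> str:
    """Produce an answer from the selected moment. Stub: report the span used."""
    start, end = moment.span
    return f"Answer to {query!r} based on {start:.0f}s-{end:.0f}s"


def answer_query(query: str, video_seconds: float,
                 score_fn: Callable[[str, Moment], float]) -> str:
    plan = planner(query)
    candidates = (grounder(query, video_seconds) if "grounder" in plan
                  else [(0.0, video_seconds)])
    best = (verifier(query, candidates, score_fn) if "verifier" in plan
            else ScoredMoment(candidates[0], 1.0))
    return answerer(query, best)


# Hypothetical scorer: pretend the relevant event happens at the 700-second mark.
def toy_score(query: str, span: Moment) -> float:
    return 1.0 if span[0] <= 700 < span[1] else 0.0


result = answer_query("What happens after the goal?", 27 * 60, toy_score)
print(result)
```

The point of the structure is that the Answerer only sees the moment the Verifier selected, rather than the whole 27-minute video.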

Another core innovation of the VideoMind framework lies in its adoption of a Chain-of-LoRA strategy. LoRA is a fine-tuning technique that has emerged in recent years. It adapts AI models for specific uses without full-parameter retraining. The Chain-of-LoRA strategy pioneered by the team applies four lightweight LoRA adapters to a unified model, each designed for calling a specific role. With this strategy, the model can dynamically activate role-specific LoRA adapters during inference via self-calling and switch seamlessly among these roles, eliminating the need for, and cost of, deploying multiple models while enhancing the efficiency and flexibility of the single model.
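The idea of switching role-specific adapters on one shared model can be illustrated with a toy example. The sketch below uses a single linear layer in plain Python; the class names and the tiny 2x2 weights are assumptions made for illustration, whereas the real framework applies adapters inside a large vision-language model.

```python
from typing import Dict, List, Optional

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRAAdapter:
    """Low-rank update delta_W = B @ A, with A of shape (r, d_in) and B of shape (d_out, r)."""

    def __init__(self, a: Matrix, b: Matrix, scale: float = 1.0) -> None:
        self.a, self.b, self.scale = a, b, scale

    def delta(self, x: Vector) -> Vector:
        # Compute B @ (A @ x) without ever forming the full d_out x d_in matrix.
        return [self.scale * y for y in matvec(self.b, matvec(self.a, x))]


class SharedModel:
    """One frozen base weight matrix plus several switchable adapters."""

    def __init__(self, base: Matrix, adapters: Dict[str, LoRAAdapter]) -> None:
        self.base = base
        self.adapters = adapters
        self.active: Optional[str] = None

    def set_role(self, role: str) -> None:
        if role not in self.adapters:
            raise KeyError(f"unknown role: {role}")
        self.active = role

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.base, x)  # frozen base computation, shared by all roles
        if self.active is not None:
            d = self.adapters[self.active].delta(x)
            y = [u + v for u, v in zip(y, d)]
        return y


# Toy weights: identity base, rank-1 adapters for two of the roles.
model = SharedModel(
    base=[[1.0, 0.0], [0.0, 1.0]],
    adapters={
        "grounder": LoRAAdapter(a=[[1.0, 0.0]], b=[[1.0], [0.0]]),
        "verifier": LoRAAdapter(a=[[0.0, 1.0]], b=[[0.0], [1.0]]),
    },
)

x = [2.0, 3.0]
model.set_role("grounder")
grounder_out = model.forward(x)
model.set_role("verifier")
verifier_out = model.forward(x)
print(grounder_out, verifier_out)
```

Only the small `a` and `b` matrices differ between roles; the base weights are stored once, which is why a single deployed model can serve every role.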

VideoMind is open source on GitHub and Hugging Face. Details of the experiments conducted to evaluate its effectiveness in temporal-grounded video understanding across 14 diverse benchmarks are also available. Comparing VideoMind with state-of-the-art AI models, including GPT-4o and Gemini 1.5 Pro, the researchers found that VideoMind’s grounding accuracy surpassed that of all competitors in challenging tasks involving videos with an average duration of 27 minutes. Notably, the team included two versions of VideoMind in the experiments: one built on a smaller, 2 billion (2B) parameter model, and another on a larger, 7 billion (7B) parameter model. The results showed that, even at the 2B size, VideoMind delivered performance comparable to that of many 7B models.

Prof. Chen said, “Humans switch among different thinking modes when understanding videos: breaking down tasks, identifying relevant moments, revisiting these to confirm details and synthesising their observations into coherent answers. The process is very efficient, with the human brain using only about 25 watts of power, which is about a million times lower than that of a supercomputer with equivalent computing power. Inspired by this, we designed the role-based workflow that allows AI to understand videos like humans, while leveraging the chain-of-LoRA strategy to minimise the need for computing power and memory in this process.”

AI is at the core of global technological development. The advancement of AI models is, however, constrained by insufficient computing power and excessive power consumption. Built upon the unified, open-source model Qwen2-VL and augmented with additional optimisation tools, the VideoMind framework lowers the technological cost and the threshold for deployment, offering a feasible way to ease the power-consumption bottleneck in AI models.

Prof. Chen added, “VideoMind not only overcomes the performance limitations of AI models in video processing, but also serves as a modular, scalable and interpretable multimodal reasoning framework. We envision that it will expand the application of generative AI to various areas, such as intelligent surveillance, sports and entertainment video analysis, video search engines and more.”

Hashtag: #PolyU #AI #LLMs #VideoAnalysis #IntelligentSurveillance #VideoSearch

The issuer is solely responsible for the content of this announcement.


