GUI Agents with Foundation Models: Data Resource, Framework and Application
Tutorial @ IJCAI 2025
[Survey]    [Tutorial]   

Summary

The rapid advancement of foundation models such as large vision-language models (VLMs) has paved the way for intelligent agents that can autonomously interact with Graphical User Interfaces (GUIs). This tutorial provides a comprehensive overview of the latest innovations in GUI agents and of influential work across data resources, frameworks, and applications.

Schedule

IJCAI GUI Agent Tutorial Schedule

  • 2:30 PM – 3:00 PM
    The Development of LLM-based GUI Agents
    A comprehensive overview of the evolution of LLM-based GUI agents, from early developments to current advances.
  • 3:00 PM – 3:30 PM
    Our Collaborative Research in GUI Agents
    An introduction to our team's latest research and contributions to the field of GUI agents.
  • 3:30 PM – 4:00 PM
    Coffee Break
  • 4:00 PM – 4:30 PM
    Key Methods in GUI Agent Framework and Evaluation
    An exploration of the foundational methods used to design GUI agent frameworks and the approaches used to evaluate them.
  • 4:30 PM – 5:00 PM
    From SFT to RL: Enhancing GUI Agents with GRPO
    A discussion of how reinforcement learning is applied to improve the performance of GUI agents.

Tutorial Organizers

 

  • Shuai Wang
    Technological Expert, Huawei Noah's Ark Lab
  • Kaiwen Zhou
    Technological Expert, Huawei Noah's Ark Lab
  • Rui Shao
    Professor, Harbin Institute of Technology (Shenzhen)
  • Gongwei Chen
    PostDoc, Harbin Institute of Technology (Shenzhen)
  • Yuqi Zhou
    Ph.D. Candidate, Renmin University of China