2026 7th International Conference on Computer Vision, Image and Deep Learning (CVIDL 2026)


Speakers


Prof. Yuxin Peng

Peking University

IEEE/CCF/CAAI/CIE/CSIG Fellow, National Science Fund for Distinguished Young Scholars

Yuxin Peng, IEEE/CCF/CAAI/CIE/CSIG Fellow, is a Boya Distinguished Professor at the Wangxuan Institute of Computer Technology, Peking University. He received the National Science Fund for Distinguished Young Scholars of China in 2019 and its continued funding in 2025. He received his Ph.D. in computer application technology from Peking University, Beijing, China, in 2003. His research interests include multimedia analysis, computer vision, and artificial intelligence. He has authored over 260 papers, including more than 170 in top-tier journals and conference proceedings, and has been granted 40 invention patents. He led his team to first place in the TRECVID video semantic search evaluation ten times. He won the First Prize of the Beijing Science and Technology Award in 2016 and the First Prize of the Scientific and Technological Progress Award of the Chinese Institute of Electronics in 2020 as the lead recipient. He received Best Paper Awards at MMM 2019 and NCIG 2018, and serves as an associate editor of IEEE TMM, TCSVT, and other journals.

Title: Fine-Grained Understanding and Physically-Grounded Generation

Abstract: Multimodal large language models (MLLMs) and diffusion models, two representative types of foundation models, have demonstrated strong capabilities in visual content understanding and generation, respectively, but they also face important challenges. For visual content understanding, MLLMs struggle to recognize fine-grained categories of real-world objects; for visual content generation, diffusion models have difficulty generating content that conforms to real-world physical laws. To address these challenges, this talk first introduces our recent research progress in fine-grained recognition with MLLMs, hierarchical recognition based on fine-grained trees, and physics-driven video generation. We then present our latest advances in two application scenarios: aesthetic understanding and virtual try-on. Finally, we discuss the application of visual content understanding and generation technologies in the era of foundation models, and provide an outlook on future research directions for MLLMs and diffusion models.




Prof. Hua Huang

Beijing Normal University

Dean of the School of Artificial Intelligence, National Science Fund for Distinguished Young Scholars

Hua Huang is Professor and Dean of the School of Artificial Intelligence, Beijing Normal University. He is also Director of the Engineering Research Center of Intelligent Technology and Educational Application, Ministry of Education, and Director of the Beijing Key Laboratory of Educational Artificial Intelligence. His research focuses on image and video processing and artificial intelligence. He has published more than 100 papers in CCF Class A journals and conferences, and has been granted over 60 national invention patents. Some of his research results have been applied in industry, national defense, and the Internet sector. He concurrently serves as Executive Director of the China Computer Federation (CCF), the China Society of Image and Graphics (CSIG), and the Chinese Association of Automation (CAA), and as Associate Editor of Computer-Aided Design & Computer Graphics and Journal of Electronics & Information Technology.


Title: Imaging Enhancement Techniques for Photoelectric Detectors

Abstract: Photoelectric detectors are the visual front end of modern sensing systems and play an important role in many fields. However, their imaging quality is often degraded by the combined effects of environmental conditions, device fabrication, and underlying physical mechanisms. These coupled degradation sources greatly limit detection performance and efficiency. Traditional imaging enhancement methods usually rely on a single statistical model or black-box restoration schemes. As a result, they often fail to capture the physical nature of image degradation and are also difficult to deploy efficiently on edge hardware.

In this talk, we will present our systematic thinking and practical exploration in imaging enhancement for photoelectric detectors. We will introduce a research framework featuring hierarchical understanding of degradation mechanisms, physics-driven restoration for different degradation types, and joint algorithm-hardware deployment. The proposed techniques have been applied to several representative scenarios, including uncooled infrared imaging, low-light visible imaging, and broadband multispectral imaging, and have significantly improved the performance and efficiency of photoelectric detectors.



Prof. Xiaoru Yuan

Peking University

Xiaoru Yuan is a researcher at the SIST, Boya Distinguished Professor of Textbook Development at Peking University, and Executive Deputy Director of the National Engineering Laboratory for Big Data Analysis and Application Technology. His research focuses on visualization and visual analytics, as well as interdisciplinary collaboration with history and the humanities. He serves as a member of the Steering Committees of ChinaVis, IEEE VIS, and PacificVis. He is also the Honorary Director of the Technical Committee on Visualization and Visual Analytics, China Society of Image and Graphics.


Title: Provenance Computation of Historical and Cultural Data

Abstract: This talk explores the core pathways for the deep integration of humanities research and AI, focusing on how to systematically construct foundational datasets tailored to humanities questions and design algorithmic models accordingly. Through typical case studies such as the construction of a dataset of illustrations from ancient books, the analysis of the spatiotemporal evolution of Chinese calligraphy styles, and the phylogenetic study of painted pottery decorations, we demonstrate key techniques for extracting deep features and correlations from unstructured cultural data. We emphasize the core role of visualization: it serves not only as the final presentation of results but also as a tool that builds a closed loop of human-computer collaborative interpretability. Furthermore, I will introduce our summer school practice, run in collaboration with design institutions, which explores specific pathways to advance the provenance computation of historical and cultural data through multidisciplinary integration.



Prof. Min Liu

Hunan University

National Science Fund for Distinguished Young Scholars

Min Liu, a second-level professor at Hunan University and Party Committee Secretary of the School of Artificial Intelligence and Robotics, is a recipient of the National Science Fund for Distinguished Young Scholars, a Youth Changjiang Scholar of the Ministry of Education, and a lead scientist of the National Key R&D Program. He holds a bachelor's degree from Peking University and a Ph.D. from the University of California, Riverside. He serves as the Deputy Director of the Hunan Provincial Automation Society, Director of the Key Laboratory of Advanced Manufacturing Vision Inspection and Control Technology in the Machinery Industry, and Vice Director of the Youth Working Committee of the China Society of Image and Graphics.

Title: Preliminary Exploration of Embodied Surgical Robots

Abstract: Breakthroughs in the core technologies of high-end medical equipment such as surgical robots, together with their comprehensive intelligent transformation and upgrading, are a major national strategic task that targets the frontier of world technology, major national needs, and people's lives and health. They provide decisive support for breaking the European and American technological monopoly on high-end digital medical equipment. Existing surgical robots lack an effective system for collaborative multimodal perception of surgical targets and place high demands on the operating doctor, which seriously restricts their adoption in responses to major national emergencies involving national defense security and epidemics. Embodied intelligence builds a closed-loop "perception-cognition-action" interaction mechanism that enables surgical robots to understand the surgical environment, adapt to complex scenes, and make intelligent decisions as human doctors do. This is the key path to a leap in their autonomous capabilities. To address these challenges, this talk introduces the basic principles and key methods of multimodal perception for surgical robots across the preoperative, intraoperative, and postoperative stages. It also presents preliminary progress made by our team on the autonomous operation of surgical robots driven by embodied intelligence, providing important support for reducing medical accidents in China.