In recent years, the rise of diffusion models has injected powerful momentum into the high-quality generation of images, videos, and 3D content, while embodied intelligence, by learning and evolving in the physical and digital worlds, has demonstrated the great potential of agents interacting with their environments. To this end, the China Society of Image and Graphics (CSIG) will hold the 14th session of the CSIG International Online Seminar on Image and Graphics Technology at 7:00 PM on Wednesday, November 27, 2024, focusing on the latest research in diffusion models and embodied intelligence.
The seminar features three outstanding student researchers who have carried out in-depth, systematic explorations of core problems including multi-view image generation, surface geometry estimation, and neural motion planning. In their talks, they will present their latest work, to appear at SIGGRAPH Asia 2024, a premier conference in computer graphics. We cordially invite colleagues from academia and industry to participate and jointly discuss the future development and applications of diffusion models and embodied intelligence.
Time
7:00 PM, Wednesday, November 27, 2024
Speakers
You-Cheng Cai is a postdoctoral researcher at the University of Science and Technology of China. He received his PhD degree from Hefei University of Technology. His recent research focuses on 3D reconstruction and neural rendering.
Talk title: MV2MV: Multi-View Image Translation via View-Consistent Diffusion Models
Abstract: Image translation has various applications in computer graphics and computer vision, aiming to transfer images from one domain to another. Thanks to the excellent generation capability of diffusion models, recent single-view image translation methods achieve realistic results. However, directly applying diffusion models to multi-view image translation remains challenging due to two major obstacles: the need for paired training data and limited view consistency. To overcome these obstacles, we present the first unified multi-view-to-multi-view image translation framework based on diffusion models, called MV2MV. First, we propose a novel self-supervised training strategy that exploits off-the-shelf single-view image translators and the 3D Gaussian Splatting (3DGS) technique to generate pseudo ground truths as supervisory signals, leading to enhanced consistency and fine details. Additionally, we propose a latent multi-view consistency block, which uses latent-3DGS as the underlying 3D representation to facilitate information exchange across multi-view images and inject a 3D prior into the diffusion model to enforce consistency. Finally, our approach jointly optimizes the diffusion model and 3DGS to achieve a better trade-off between consistency and realism. Extensive experiments across various translation tasks demonstrate that MV2MV outperforms task-specific specialists both quantitatively and qualitatively.
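To make the alternating optimization concrete, below is a minimal sketch of one training step, assuming hypothetical objects not taken from the paper: `translator` (a fine-tunable single-view diffusion translator) and `gaussians` (a latent-3DGS scene exposing `fit` and `render`).

```python
import torch
import torch.nn.functional as F

def mv2mv_step(views, cameras, translator, gaussians, optimizer):
    """One alternating optimization step: 3DGS supplies view-consistent
    pseudo ground truths that fine-tune the diffusion translator."""
    # Translate each view independently; a single-view model gives
    # realistic but cross-view-inconsistent results.
    translated = [translator(v) for v in views]

    # Fit 3D Gaussians to the translated views; re-rendering the fitted
    # scene yields multi-view-consistent (if smoother) pseudo GTs.
    gaussians.fit([t.detach() for t in translated], cameras)
    pseudo_gt = [gaussians.render(c) for c in cameras]

    # Pull the translator toward the consistent pseudo GTs, trading
    # per-view realism against multi-view consistency.
    loss = sum(F.l1_loss(t, gt.detach())
               for t, gt in zip(translated, pseudo_gt))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the detached pseudo GTs anchor consistency while the diffusion translator keeps contributing fine details, which is the trade-off the abstract describes.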
Chongjie Ye is a Ph.D. student in the GAP Lab at The Chinese University of Hong Kong, Shenzhen, supervised by Prof. Xiaoguang Han. His recent research focuses on multi-view 3D reconstruction and generative 3D reconstruction.
Talk title: StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal
Abstract: This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field that has recently been revolutionized by repurposing diffusion priors. However, these attempts still struggle with high-variance inference, which conflicts with the deterministic nature of the Image2Normal task. Our method, StableNormal, aims to reduce inference variance, thus producing “stable” and “sharp” normal estimates even under challenging imaging conditions, such as extreme lighting, motion/defocus blur, and low-quality/compressed images. It is also robust against transparent and reflective surfaces, as well as cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy that starts with a one-step normal estimator (YOSO) to establish a reliable but relatively coarse initial normal, followed by a semantic-guided refinement process (SG-DRN) that refines the normals to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance on standard datasets such as DIODE-indoor, iBims, ScanNetV2, and NYUv2, and its capability to enhance various downstream tasks, such as surface reconstruction and normal enhancement, is also showcased. These results demonstrate that StableNormal retains both the “stability” and “sharpness” necessary for accurate normal estimation. StableNormal is a solid step toward repurposing diffusion priors for deterministic estimation. To democratize this, code and models will be publicly available.
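As a rough illustration of the coarse-to-fine pipeline described above, the sketch below assumes two placeholder modules, `yoso` (the one-step estimator) and `sg_drn` (the semantic-guided refiner); the interfaces are illustrative and need not match the released code.

```python
import torch

@torch.no_grad()
def stable_normal_inference(image, yoso, sg_drn, refine_steps=10):
    # Stage 1: a single deterministic denoising step yields a reliable
    # but coarse normal map, pinning down a low-variance "stable" base.
    coarse = yoso(image)

    # Stage 2: iterative refinement conditioned on both the image and
    # the current estimate recovers sharp geometric detail without
    # reintroducing the sampling variance of free-running diffusion.
    normal = coarse
    for _ in range(refine_steps):
        normal = sg_drn(image, normal)

    # Re-normalize to unit length, as surface normals require.
    return normal / normal.norm(dim=1, keepdim=True).clamp(min=1e-6)
```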
Xujie Shen is a master’s student in the College of Computer Science at Zhejiang University, affiliated with the State Key Laboratory of CAD&CG. He received his bachelor’s degree in computer science from Zhejiang University. His research interests include robotics, 3D vision, and machine learning.
Talk title: PC-Planner: Physics-Constrained Self-Supervised Learning for Robust Neural Motion Planning with Shape-Aware Distance Function
Abstract: Motion Planning (MP) is a critical challenge in robotics, especially pertinent given the burgeoning interest in embodied artificial intelligence. Traditional MP methods often struggle with high-dimensional complexity. Recently, neural motion planners, particularly physics-informed neural planners based on the Eikonal equation, have been proposed to overcome the curse of dimensionality. However, these methods perform poorly in complex scenarios with shaped robots due to the multiple solutions inherent in the Eikonal equation. To address these issues, this paper presents PC-Planner, a novel physics-constrained self-supervised learning framework for motion planning of robots with various shapes in complex environments. To this end, we propose several physical constraints, including monotonic and optimal constraints, to stabilize the training of the neural network with the Eikonal equation. Additionally, we introduce a novel shape-aware distance field that accounts for the robot's shape for efficient collision checking and Ground Truth (GT) speed computation. This field reduces computational cost and facilitates adaptive motion planning at test time. Experiments in diverse scenarios with different robots demonstrate the superiority of the proposed method in efficiency and robustness for robot motion planning, particularly in complex environments.
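For context, physics-informed neural planners of this kind train a network to approximate a travel-time field constrained by the Eikonal equation; in generic notation (not necessarily the talk's exact symbols):

```latex
% Eikonal constraint in generic notation: T(x_s, x_g) is the travel time
% from start x_s to goal x_g, and S(x_g) is a speed field that is small
% near obstacles and large in free space.
\[
  \bigl\lVert \nabla_{x_g}\, T(x_s, x_g) \bigr\rVert = \frac{1}{S(x_g)}
\]
```

A network satisfying this constraint yields paths by following the gradient of T, and the shape-aware distance field mentioned in the abstract is, on this reading, what supplies the speed values S for a robot of a given shape.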
Host
Zeyu Wang is an Assistant Professor at The Hong Kong University of Science and Technology (Guangzhou). He received a PhD from Yale University and a BS from Peking University. His research interests include computer graphics, human-computer interaction, artificial intelligence, and digital cultural heritage. His research has been recognized by an NSFC Award, a CCF-Tencent Rhino-Bird Fellowship, an Adobe Research Fellowship, a Best Paper Award, and three Honorable Mention Awards.