Zizhang Li

I am a Master's student (Sep. 2021 - ) in the Department of Control Science and Engineering at Zhejiang University. I am a member of April Lab, advised by Prof. Yong Liu. I obtained my B.Eng. (Sep. 2017 - Jun. 2021) from the same department, with an honors degree from Chu Kochen Honor College.

Currently, I'm a visiting researcher at Stanford University, working with Shangzhe Wu and Prof. Jiajun Wu. I also work closely with Prof. Yiyi Liao at Zhejiang University.

As an undergraduate, I was a research intern in the fundamental vision group at SenseTime, working with Chenxin Tao and Xizhou Zhu and mentored by Jifeng Dai. I was also a remote research intern at CCVL at Johns Hopkins University, mentored by Weichao Qiu and Prof. Alan Yuille.

Email  /  Google Scholar  /  GitHub  /  Twitter

profile photo

Photo taken by Kechun Xu.


Recent News
    I'm looking for a PhD position starting in Fall 2024. Please feel free to contact me if you have any leads!
  • Two papers accepted to ICCV 2023.
  • Two papers accepted to ICRA 2023.
  • One paper accepted to ECCV 2022.
  • One paper accepted to CVPR 2022.
  • One paper accepted to BMVC 2021.
  • One paper accepted to NeurIPS 2021.

Publications
    My current research interests lie in 3D reconstruction, neural rendering, and computer vision. I'm particularly interested in inferring the 3D properties of objects and scenes from 2D image collections. Representative papers are highlighted.
    * indicates equal contributions.
RICO: Regularizing the Unobservable for Indoor Compositional Reconstruction
Zizhang Li, Xiaoyang Lyu, Yuanyuan Ding, Mengmeng Wang, Yiyi Liao, Yong Liu
ICCV, 2023
arXiv / code

We investigate the problems of SDF-based object-compositional reconstruction under partial observation, and propose several regularizations based on geometric priors to achieve a clean, watertight disentanglement.

Learning a Room with the Occ-SDF Hybrid: Signed Distance Function Mingled with Occupancy Aids Scene Representation
Xiaoyang Lyu, Peng Dai, Zizhang Li, Dongyu Yan, Yi Lin, Yifan Peng, Xiaojuan Qi
ICCV, 2023
Project page / arXiv / code

We analyze several key observations about SDF-based volume rendering methods for indoor scene reconstruction. Building on these observations, we propose an Occ-SDF hybrid representation for better reconstruction performance.

A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter
Kechun Xu, Shuqi Zhao, Zhongxiang Zhou, Zizhang Li, Huaijin Pi, Yifeng Zhu, Yue Wang, Rong Xiong
ICRA, 2023
arXiv / code

We propose to jointly model vision, language and action with object-centric representations for the task of language-conditioned grasping in clutter.

Failure-aware Policy Learning for Self-assessable Robotics Tasks
Kechun Xu, Runjian Chen, Shuqi Zhao, Zizhang Li, Hongxiang Yu, Ci Chen, Yue Wang, Rong Xiong
ICRA, 2023
arXiv

We investigate the dependency between self-assessment results and the remaining actions by learning a failure-aware policy, and propose two policy architectures.

E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context
Zizhang Li, Mengmeng Wang, Huaijin Pi, Kechun Xu, Jianbiao Mei, Yong Liu
ECCV, 2022
arXiv / code

We investigate the architecture of frame-wise implicit neural video representations and improve it by removing a large portion of redundant parameters and redesigning the network around a spatial-temporal disentanglement.

Learning Part Segmentation through Unsupervised Domain Adaptation from Synthetic Vehicles
Qing Liu, Adam Kortylewski, Zhishuai Zhang, Zizhang Li, Mengqi Guo, Qihao Liu, Xiaoding Yuan, Jiteng Mu, Weichao Qiu, Alan Yuille
CVPR, 2022, oral
arXiv / code

We construct a synthetic multi-part dataset covering different categories of objects, evaluate different part segmentation UDA methods on this benchmark, and provide an improved baseline.

MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation
Zizhang Li*, Mengmeng Wang*, Jianbiao Mei, Yong Liu
arXiv, 2021
arXiv

We propose to regard the binary mask as a unique modality and train a tri-modal embedding space on top of ViLT for the referring segmentation task.

Searching for TrioNet: Combining Convolution with Local and Global Self-Attention
Huaijin Pi, Huiyu Wang, Yingwei Li, Zizhang Li, Alan Yuille
BMVC, 2021
arXiv / code

We propose a weight-sharing NAS method to combine convolution, local self-attention, and global self-attention operators.

Searching Parameterized AP Loss for Object Detection
Chenxin Tao*, Zizhang Li*, Xizhou Zhu, Gao Huang, Yong Liu, Jifeng Dai
NeurIPS, 2021
arXiv / code

We transform the non-differentiable AP metric into a differentiable loss function using Bézier curve parameterization. We further use PPO to search the parameters and show improved performance of the PAP loss on various detectors.


Services
  • Reviewer for 3DV, AAAI, BMVC, CAI, CVPR, ECCV, ICCV, ICLR, NeurIPS.

Template thanks to Jon Barron