Many computer vision problems involve processing multiple entities, be they objects, shapes, views or scenes. In such cases, graphs, a.k.a. networks, are the key data structures for storing and organizing information. Yet, the relationships encoded in the edges often remain pairwise, or rather local. One of the most well-accepted methods of seeking a global agreement is enforcing cycle-consistency, where local errors are distributed over the entire graph such that the composition of maps/transforms along any cycle is close to the identity map. This art of consistently recovering absolute quantities from a collection of ratios is known as synchronization. From training generative adversarial networks to geometric structure-from-motion algorithms, from temporal video understanding to image-to-image translation, this capability of imposing consistency benefits a wide variety of vision tasks. In this tutorial, we first introduce the fundamentals of cycle-consistency and review the broad range of studies that make use of it. Next, we cover different techniques for solving multiview synchronization problems in computer vision, or in other words for achieving cycle-consistency. Several techniques including graph theory, combinatorial optimization, Riemannian geometry, spectral decomposition, (non-)convex optimization, and MAP inference will be addressed. We also touch upon recent techniques that jointly optimize neural networks across multiple domains. Besides optimization techniques, we will discuss the uncertainty and ambiguities inherent either in the data or in the model and show how existing tools can be augmented to yield this valuable piece of information. We will finally showcase the applications of synchronizing linear/non-linear maps (e.g. functional maps) in multi-view geometry reconstruction (RGB or RGBD images), joint analysis of image collections, and 3D reconstruction and understanding across multiple domains. Tools and methods presented in this tutorial are beneficial to a large audience, as synchronization is a common technique across several sub-fields of computer vision. Primarily, the target audience includes academics, graduate students, industrial researchers and other practitioners who are interested in state-of-the-art techniques for multi-view structure-from-motion, 3D geometry reconstruction, unsupervised map/object discovery, joint learning of neural networks and end-to-end multiview processing.
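To make the idea of recovering absolute quantities from pairwise ratios concrete, here is a minimal sketch (our own illustration, not part of the tutorial materials) of spectral rotation synchronization with NumPy: clean relative rotations R_ij = R_i R_j^T are stacked into a symmetric block matrix whose leading eigenvectors reveal the absolute rotations up to one global rotation. All function names are our own.

```python
import numpy as np

def random_rotation(rng):
    # Random 3x3 rotation from the QR decomposition of a Gaussian matrix.
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1  # enforce det = +1
    return q

def synchronize_rotations(R_rel, n):
    """Spectral synchronization: recover absolute R_i from R_rel[(i,j)] ~ R_i R_j^T."""
    B = np.zeros((3 * n, 3 * n))
    for (i, j), Rij in R_rel.items():
        B[3*i:3*i+3, 3*j:3*j+3] = Rij
        B[3*j:3*j+3, 3*i:3*i+3] = Rij.T
    # The top-3 eigenvectors of B span the stacked absolute rotations.
    _, v = np.linalg.eigh(B)
    U = v[:, -3:] * np.sqrt(n)
    if np.linalg.det(U[:3]) < 0:
        U[:, 0] *= -1  # resolve the reflection ambiguity globally
    R_abs = []
    for i in range(n):
        # Project each 3x3 block onto SO(3) (nearest rotation, via SVD).
        u, _, vt = np.linalg.svd(U[3*i:3*i+3])
        R_abs.append(u @ vt)
    return R_abs

rng = np.random.default_rng(0)
n = 5
R_true = [random_rotation(rng) for _ in range(n)]
R_rel = {(i, j): R_true[i] @ R_true[j].T
         for i in range(n) for j in range(i + 1, n)}
R_est = synchronize_rotations(R_rel, n)
# The solution is defined up to one global rotation; align on node 0.
A = R_est[0].T @ R_true[0]
err = max(np.linalg.norm(R_est[i] @ A - R_true[i]) for i in range(n))
print(f"max alignment error: {err:.2e}")
```

On noise-free input the recovery is exact up to floating point; with noisy edges, the same eigenvector computation distributes the errors over the whole graph, which is precisely the appeal of synchronization.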
We have a packed and exciting half day ahead of us! The tutorials are pre-recorded and provided along with the corresponding slides; you can watch them at any time. The schedule below is for the Q&A sessions and is given in PST. For attending the Q&As, we use the Zoom link reserved by CVPR.
Tolga Birdal is a Postdoctoral Research Fellow at the Geometric Computing group of Prof. Leonidas Guibas. He recently defended his PhD thesis at the Computer Vision Group, Chair for Computer Aided Medical Procedures, Technical University of Munich, and was a doctoral candidate at Siemens AG. He completed his Bachelor's degree in Electronics Engineering at Sabanci University in 2008. In his subsequent postgraduate programme, he studied Computational Science and Engineering at the Technical University of Munich. In continuation of his Master's thesis on “3D Deformable Surface Recovery Using RGBD Cameras”, he focused his research and development on large object detection, pose estimation and reconstruction using point clouds. Tolga was awarded both the Ernst von Siemens Scholarship and the EMVA Young Professional Award 2016 for his PhD work. He has several publications at well-respected venues such as NeurIPS, CVPR, ICCV, ECCV, IROS, ICASSP and 3DV. Aside from his academic life, Tolga is involved in entrepreneurship, having co-founded multiple companies including Befunky, a widely used web-based image processing platform.
In this tutorial, we first introduce the fundamentals of cycle-consistency and review the broad range of studies that make use of it. Next, we cover different techniques for solving multiview synchronization problems in computer vision, or in other words for achieving cycle-consistency. Several techniques including graph theory, combinatorial optimization, Riemannian geometry, spectral decomposition, (non-)convex optimization, and MAP inference will be addressed. Besides optimization techniques, we will also discuss the uncertainty and ambiguities inherent either in the data or in the model and show how existing tools can be augmented to yield this valuable piece of information. We will finally showcase the applications of synchronizing linear/non-linear maps. Tools and methods presented in this tutorial are beneficial to a large audience, as synchronization is a common technique across several sub-fields of computer vision.
Qixing Huang is an assistant professor of Computer Science at the University of Texas at Austin. He obtained his PhD in Computer Science from Stanford University and was a research assistant professor at the Toyota Technological Institute at Chicago before joining UT Austin. Dr. Huang's research spans the fields of computer vision, computer graphics, and machine learning; he publishes extensively in venues such as SIGGRAPH, CVPR, ICCV, ECCV, NeurIPS, and ICML. In particular, his recent focus is on developing machine learning algorithms (particularly deep learning) that leverage big data to solve core problems in computer vision, computer graphics and computational biology. He is also interested in statistical data analysis, compressive sensing, low-rank matrix recovery, and large-scale optimization, which provide the theoretical foundation for his research. He received the best paper award at the Symposium on Geometry Processing 2013, the best dataset award at the Symposium on Geometry Processing 2018, and the most cited paper award of Computer-Aided Geometric Design in 2010 and 2011. He was an area chair for CVPR 2019 and CVPR 2020.
A fundamental problem in synchronization is to develop efficient computational formulations for the combinatorial constraints, including the cycle-consistency constraint for undirected graphs and the path-invariance constraint for directed graphs. This talk discusses advances on this topic during the past two decades, ranging from greedy combinatorial approaches and constrained matrix recovery formulations to very recent methods that utilize cycle-consistency bases and path-invariance bases for the joint learning of neural networks. We focus on the interplay between continuous optimization and graph-theoretical approaches, exact and robust recovery conditions, and connections between synchronization and symmetries. We conclude the talk with a list of future directions.
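As a toy illustration of the cycle-consistency constraint (our own sketch, not from the talk), the code below verifies consistency of orthogonal edge maps, e.g. permutation matrices, over a spanning-tree cycle basis: because every cycle decomposes into basis cycles, checking the cycles closed by non-tree edges suffices for the whole graph.

```python
import numpy as np
from collections import deque

def check_cycle_consistency(n, maps, tol=1e-8):
    """Check cycle-consistency of orthogonal edge maps (e.g. permutations).

    maps[(i, j)] is the map from node j's labels to node i's labels; the
    reverse edge is its transpose.  A BFS spanning tree anchors every node
    to node 0; each non-tree edge then closes one basis cycle.
    """
    adj = {i: [] for i in range(n)}
    for i, j in maps:
        adj[i].append(j)
        adj[j].append(i)

    def edge_map(i, j):  # map from node j to node i
        return maps[(i, j)] if (i, j) in maps else maps[(j, i)].T

    d = maps[next(iter(maps))].shape[0]
    T = {0: np.eye(d)}        # T[i]: map from node i to the root, node 0
    tree = set()
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j not in T:
                T[j] = T[i] @ edge_map(i, j)
                tree.add(frozenset((i, j)))
                queue.append(j)

    # Every non-tree edge closes a basis cycle; its composition must be identity.
    for i, j in maps:
        if frozenset((i, j)) in tree:
            continue
        cycle = T[i] @ edge_map(i, j) @ T[j].T
        if np.linalg.norm(cycle - np.eye(d)) > tol:
            return False
    return True

# Consistent example: edge maps built as P_i P_j^T from absolute permutations.
rng = np.random.default_rng(1)
P = [np.eye(4)[rng.permutation(4)] for _ in range(4)]
good = {(i, j): P[i] @ P[j].T for i, j in [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]}
print(check_cycle_consistency(4, good))   # True

bad = dict(good)
bad[(0, 2)] = np.roll(good[(0, 2)], 1, axis=0)  # corrupt one edge
print(check_cycle_consistency(4, bad))    # False
```

This is the simplest greedy/graph-theoretic end of the spectrum the talk covers; the matrix recovery and learned formulations replace the hard check with optimization over the same constraints.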
Federica Arrigoni received her MS degree in Mathematics from the University of Milan (Italy) in 2013, and her PhD degree in Industrial and Information Engineering from the University of Udine (Italy) in 2018. Her PhD thesis, titled “Synchronization Problems in Computer Vision”, received awards from the Italian Association for Computer Vision, Pattern Recognition and Machine Learning (CVPL) in 2018 and from the University of Udine in 2019. From 2018 to 2020 she was a postdoctoral researcher at the Czech Institute of Informatics, Robotics, and Cybernetics (CIIRC) of the Czech Technical University in Prague. She is currently an assistant professor at the Department of Information Engineering and Computer Science of the University of Trento (Italy). Her research focuses on geometric problems in computer vision, including structure from motion, 3D registration, multi-image matching and motion segmentation.
The “synchronization” problem is traditionally defined as the task of recovering unknown group elements (represented as nodes in a graph) from a set of pairwise ratios/differences (represented as edges in the graph). Particularly interesting is the case where the group admits a matrix representation (e.g., rotations/rigid motions/permutations), where the unknown elements can be recovered via spectral decomposition. The spectral solution is very general and can also be applied to situations where the group structure is missing, such as the case of binary matrices, which appear in the context of motion segmentation. A tightly related problem is “bearing-based localization”, where the task is to compute the positions of nodes in a graph starting from a set of pairwise directions. Recovering camera positions in a structure-from-motion system is an instance of bearing-based localization in 3D. This problem is well-posed when the underlying graph is “parallel rigid”. An interesting formulation of localizability can be obtained by imposing “cycle-consistency”, namely the property that the sum of pairwise directions (weighted with unknown scales) along any cycle in the graph is zero.
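The bearing-based formulation can be sketched as a linear least-squares problem: pinning node 0 at the origin and one edge scale to 1 removes the global translation and scale ambiguity, after which positions and the unknown scales are solved jointly from x_j - x_i = s_ij u_ij. This is an illustrative toy implementation under our own naming, assuming exact directions and a parallel-rigid graph.

```python
import numpy as np

def bearing_localization(n, bearings):
    """Recover node positions from pairwise unit directions u_ij ~ x_j - x_i,
    up to a global translation and scale (fixed via x_0 = 0 and s_first = 1)."""
    edges = list(bearings)
    d = len(next(iter(bearings.values())))
    m = len(edges)
    # Unknowns: positions x_1..x_{n-1} (x_0 = 0) and scales s_1..s_{m-1} (s_0 = 1).
    A = np.zeros((m * d, (n - 1) * d + (m - 1)))
    b = np.zeros(m * d)
    for k, (i, j) in enumerate(edges):
        u = bearings[(i, j)]
        rows = slice(k * d, (k + 1) * d)
        if j > 0:
            A[rows, (j - 1) * d:j * d] += np.eye(d)
        if i > 0:
            A[rows, (i - 1) * d:i * d] -= np.eye(d)
        if k == 0:
            b[rows] = u                          # first scale fixed to 1
        else:
            A[rows, (n - 1) * d + k - 1] = -u    # unknown scale of edge k
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    x = np.zeros((n, d))
    x[1:] = sol[:(n - 1) * d].reshape(n - 1, d)
    return x

rng = np.random.default_rng(2)
x_true = rng.standard_normal((5, 3))
pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (1, 3), (2, 4)]
bearings = {(i, j): (x_true[j] - x_true[i]) / np.linalg.norm(x_true[j] - x_true[i])
            for i, j in pairs}
x_est = bearing_localization(5, bearings)
# Remove the same gauge from the ground truth before comparing.
x_ref = (x_true - x_true[0]) / np.linalg.norm(x_true[1] - x_true[0])
print(np.allclose(x_est, x_ref, atol=1e-6))  # True
```

Note how the cycle-consistency property from the abstract appears implicitly: summing the residual equations x_j - x_i - s_ij u_ij around any cycle telescopes the positions away, leaving exactly the constraint that the scaled directions sum to zero.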
Leonidas Guibas is the Paul Pigott Professor of Computer Science (and by courtesy, Electrical Engineering) at Stanford University, where he heads the Geometric Computation group. Dr. Guibas obtained his Ph.D. from Stanford University under the supervision of Donald Knuth. His main subsequent employers were Xerox PARC, DEC/SRC, MIT, and Stanford. He is a member and past acting director of the Stanford Artificial Intelligence Laboratory and a member of the Computer Graphics Laboratory, the Institute for Computational and Mathematical Engineering (iCME) and the Bio-X program. Dr. Guibas has been elected to the US National Academy of Engineering and the American Academy of Arts and Sciences, and is an ACM Fellow, an IEEE Fellow and winner of the ACM Allen Newell award and the ICCV Helmholtz prize. He is also a recent recipient of a five-year DoD Vannevar Bush Faculty Fellowship.
This presentation will introduce the machinery of functional maps between images or between 3D shapes. Traditional maps are pixel-to-pixel or point-to-point, while functional maps are generalizations, mapping between function spaces defined over images or 3D shapes and containing the former as special cases. As such, they are dual objects but come with two key advantages: (1) they are always linear mappings, allowing us to use many powerful tools from linear algebra and optimization, and (2) they enable us to express complex mappings compactly through the use of hierarchical function bases on the underlying objects. Fundamentally, functional maps act as information transporters, since many important semantic properties of images or shapes, such as features or parts, can be encoded as functions. When we have many related images or 3D shapes connected via functional maps into a network, we can explore the consistency of information transport by following different paths in the network. Consistent functional maps are said to be synchronized. We describe algebraic conditions for synchronization and algorithms that can take noisy initial maps and improve their consistency, and in the process also improve their quality in transferring semantic knowledge. We show a detailed example for the case of the image or 3D shape co-segmentation problem. We also describe applications to understanding shape differences and to the analysis of shape collections.
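A toy illustration of synchronized functional maps (our own construction, not from the talk): we use the eigenbasis of a cycle graph's Laplacian in place of a mesh's Laplace-Beltrami basis, encode circular-shift point-to-point maps as small k-by-k functional maps, and verify that composing them along a 3-cycle returns the identity. The truncation k is chosen here so that complete eigenspaces are retained, which makes the cycle close exactly.

```python
import numpy as np

# "Shape": a cycle graph with n points; its Laplacian eigenbasis plays the
# role of the truncated Laplace-Beltrami basis used on meshes.
n, k = 20, 7
idx = np.arange(n)
L = 2 * np.eye(n)
L[idx, (idx + 1) % n] = -1
L[idx, (idx - 1) % n] = -1
_, Phi = np.linalg.eigh(L)
Phi = Phi[:, :k]                      # low-frequency basis, n x k

def shift_perm(a):
    # Point-to-point map: circular shift by a, as a permutation matrix.
    P = np.zeros((n, n))
    P[(idx + a) % n, idx] = 1
    return P

def functional_map(P):
    # The point map expressed in the truncated spectral basis: a k x k
    # linear operator taking coefficients on shape 1 to shape 2.
    return Phi.T @ P @ Phi

C12 = functional_map(shift_perm(3))   # shape 1 -> shape 2
C23 = functional_map(shift_perm(5))   # shape 2 -> shape 3
C31 = functional_map(shift_perm(-8))  # shape 3 -> shape 1, closing the cycle
cycle = C31 @ C23 @ C12
print(np.allclose(cycle, np.eye(k), atol=1e-8))  # synchronized: True
```

The linearity advantage from the talk is visible here: each map is just a small matrix, and consistency along a path in the network reduces to a matrix product being close to the identity. On real shapes the maps are noisy, and synchronization algorithms optimize exactly this kind of residual.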