On-the-Fly SfM: What you capture is What you get
Zongqian Zhan, Rui Xia*, Yifei Yu, Yibo Xu, Xin Wang*
School of Geodesy and Geomatics, Wuhan University, Wuhan, P.R.China
The proposed on-the-fly SfM Hardware

Abstract
Over the last decades, ample achievements have been made in Structure from Motion (SfM). However, the vast majority of these methods work in an offline manner, i.e., images are first captured and then fed together into an SfM pipeline to obtain camera poses and a sparse point cloud.

In this work, on the contrary, we present an on-the-fly SfM that runs online while images are being captured: the pose and corresponding points of each newly taken image are estimated online, i.e., what you capture is what you get.

More specifically, our approach first employs a vocabulary tree, trained in an unsupervised manner on learning-based global features, for fast image retrieval of each newly fly-in image.
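To illustrate this retrieval step, here is a minimal sketch of an unsupervised vocabulary tree built by hierarchical k-means over image-level global descriptors. All names and the plain NumPy k-means are illustrative, not our implementation; in practice the random vectors would be replaced by learned global features such as those of Hou and Xia (2023).

```python
import numpy as np

def _kmeans(X, k, iters=15, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None, :] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels

class VocabTree:
    """Hierarchical k-means tree: `branch` sub-clusters per node, `depth` levels."""
    def __init__(self, branch=5, depth=5):
        self.branch, self.depth = branch, depth

    def fit(self, descriptors):
        self._leaves = 0
        self._root = self._split(np.asarray(descriptors, float), self.depth)
        return self

    def _split(self, X, depth):
        if depth == 0 or len(X) <= self.branch:
            self._leaves += 1
            return ("leaf", self._leaves - 1)
        centers, labels = _kmeans(X, self.branch)
        children = [self._split(X[labels == j], depth - 1) for j in range(self.branch)]
        return ("node", centers, children)

    def word(self, d):
        """Quantize one descriptor to a leaf id by descending the tree."""
        node = self._root
        while node[0] == "node":
            _, centers, children = node
            node = children[int(np.linalg.norm(centers - d, axis=1).argmin())]
        return node[1]

    def query(self, db, q, top=30):
        """Rank database images sharing q's leaf by Euclidean distance to q."""
        leaf = self.word(q)
        idx = [i for i, d in enumerate(db) if self.word(d) == leaf]
        idx.sort(key=lambda i: np.linalg.norm(db[i] - q))
        return idx[:top]
```

Descending the tree costs only `depth × branch` comparisons per image, which is what makes retrieval fast enough for online use.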

Then, a robust feature matching mechanism based on least squares matching (LSM) is presented to improve image registration performance.

Finally, by investigating the influence of the neighboring images connected to the newly fly-in image, an efficient hierarchical weighted local bundle adjustment (BA) is used for optimization.
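The idea of weighting the newly registered image's neighborhood can be sketched as a breadth-first "ripple" over the covisibility graph, with weights decaying by a constant k at each level. This is one plausible reading of the hierarchical weighting; the exact formula in our implementation may differ, and all names below are illustrative.

```python
def local_ba_weights(covis, new_img, k=2.0, max_depth=2, top_n=8):
    """Assign a BA weight to each image in the local block around new_img.
    covis: {image: [(neighbor, similarity), ...]} covisibility graph.
    Returns {image: weight} with weight = k**(-ripple_level)."""
    weights = {new_img: 1.0}
    frontier = [new_img]
    for level in range(1, max_depth + 1):
        nxt = []
        for img in frontier:
            # keep only the top_n most similar neighbors to bound the BA block
            nbrs = sorted(covis.get(img, ()), key=lambda e: -e[1])[:top_n]
            for nb, _ in nbrs:
                if nb not in weights:
                    weights[nb] = k ** (-level)
                    nxt.append(nb)
        frontier = nxt
    return weights
```

Images far from the ripple center thus contribute little to the optimization, which keeps each local BA block small and fast.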


Download

Paper (PDF, 707 KB)
Code (GitHub)
Video
Experiments
In this section, we report extensive experimental results on various datasets to demonstrate the capability of “what you capture is what you get” for our on-the-fly SfM.

All experiments are run on a machine with 16 CPU cores and an RTX 3080 GPU.
1. Datasets
All the datasets used in our experiments are listed in the following table:

Name        Image Num   Source
SX          221         Self-captured
YX          349         Self-captured
fr1_desk    613         TUM (Sturm and Engelhard, 2012)
fr1_xyz     798         TUM (Sturm and Engelhard, 2012)
fr3_st_far  938         TUM (Sturm and Engelhard, 2012)

2. Running Parameters
In this work, several free parameters are set empirically.

For online image matching, the vocabulary tree has a depth of 5 layers, with 5 sub-clusters per node.

Each newly fly-in image selects the Top-30 most similar images for subsequent matching. The small local window in LSM is set to 15×15 pixels.

For efficient BA, since each image in the ripple has top-N candidate images, which could yield a large BA block, only the top-8 most similar images are considered. The constant weighting parameter is k = 2 in all experiments.
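Collected in one place, these settings look roughly like the following (the key names are hypothetical; the values mirror the text above):

```python
# Key names are hypothetical; the values are those stated in the text.
ON_THE_FLY_PARAMS = {
    "vocab_tree_depth": 5,        # layers in the vocabulary tree
    "vocab_tree_branching": 5,    # sub-clusters per node
    "retrieval_top_k": 30,        # similar images kept per new fly-in image
    "lsm_window_px": (15, 15),    # local window for least squares matching
    "ba_top_similar": 8,          # candidate images kept per ripple image in BA
    "weight_k": 2.0,              # constant weighting parameter k
}
```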

3. Performance of fast image retrieval
Based on SX and fr3_st_far, we investigate three different image matching strategies: exhaustive matching using Colmap with default settings (EM), exhaustive Euclidean comparison using learning-based global features (Hou and Xia, 2023) (EE), and our on-the-fly SfM (Ours).

Here is the time cost result on fr3_st_far.
Here is the overlap graph of SX (panels: Ours, EE, EM).
Vertical and horizontal axes are image IDs. The darker red a pixel is, the higher the probability that the corresponding image pair overlaps.
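An overlap graph of this kind can be approximated directly from the image-level global features, e.g. by pairwise cosine similarity. The snippet below is an illustrative stand-in, not necessarily how the figure was produced:

```python
import numpy as np

def overlap_matrix(global_feats):
    """Pairwise cosine similarity of L2-normalised global features;
    entry (i, j) scores how likely images i and j overlap."""
    F = np.asarray(global_feats, dtype=float)
    F /= np.linalg.norm(F, axis=1, keepdims=True)
    return F @ F.T
```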

4. Performance of efficient local bundle adjustment
To demonstrate the efficacy of the local bundle adjustment in our on-the-fly SfM, three bundle adjustment solutions are compared:

(1) a global bundle adjustment that involves all images (Glo.);
(2) a combined solution integrating local and global bundle adjustment (Com.);
(3) our local bundle adjustment with hierarchical weights (Ours).

Based on fr3_st_far, here is the time cost of bundle adjustment as the number of images grows.
To assess the quality of our local bundle adjustment solution, we choose three indicators: the averaged mean reprojection error over all BA runs (AMRE), the mean reprojection error of the final BA (MFRE), and the mean track length (MTL).
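For clarity, the three indicators can be written down as follows (function and variable names are illustrative):

```python
import numpy as np

def ba_quality(per_ba_errors, track_lengths):
    """per_ba_errors: one array of per-observation reprojection errors (px)
    for each BA run, in chronological order.
    track_lengths: number of images observing each 3D point."""
    amre = float(np.mean([e.mean() for e in per_ba_errors]))  # averaged mean error over all BAs
    mfre = float(per_ba_errors[-1].mean())                    # mean error of the final BA
    mtl = float(np.mean(track_lengths))                       # mean track length
    return amre, mfre, mtl
```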

The results on fr3_st_far are shown below:

5. On-the-fly performance of our SfM
The table below presents the average per-image processing time on each dataset; in particular, several key procedures are reported: image transmission (IT), feature extraction (FE), online image matching (OIM), two-view geometric verification (GV), image registration (IR), triangulation (Tri.), and bundle adjustment (BA).
Time cost of each core stage in our SfM (ms)
Dataset  SX    YX    fr1_desk  fr1_xyz  fr3_st_far
NoI      221   349   613       798      938
FE       617   625   157       155      172
OIM      1282  1391  872       911      1247
GV       1285  951   168       187      359
IR       91    72    41        56       72
Tri.     158   171   116       29       60
BA       190   131   74        184      198
Total    3623  3341  1428      1522     2108
IT       4200  4400  3500      3500     3500

Acknowledgement
This work was jointly supported by the National Natural Science Foundation of China (Nos. 61871295 and 42301507), the Natural Science Foundation of Hubei Province, China (No. 2022CFB727), and ISPRS Initiatives 2023.

References
J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, "A benchmark for the evaluation of RGB-D SLAM systems," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012, pp. 573–580.

Q. Hou, R. Xia, J. Zhang, et al., "Learning visual overlapping image pairs for SfM via CNN fine-tuning with photogrammetric geometry information," International Journal of Applied Earth Observation and Geoinformation, 2023, 103162.

Y. Yue, X. Wang, and Z. Zhan, "Single-Point Least Square Matching Embedded Method for Improving Visual SLAM," IEEE Sensors Journal, 2023, pp. 16176–16188.

About us
If you have any questions or suggestions, you can contact us at the following address: