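GSIS Invited Paper | ISPRS President Christian Heipke: Combining Deep Learning with Photogrammetry and Remote Sensing
To better understand and analyze the breadth, depth, and future development of deep learning in photogrammetry and remote sensing, GSIS invited Professor Christian Heipke, President of the International Society for Photogrammetry and Remote Sensing (ISPRS), to write "Deep learning for geometric and semantic tasks in photogrammetry and remote sensing". The paper summarizes the foundations of deep learning for photogrammetry and remote sensing and illustrates them with work under way at Leibniz University Hannover: geometric tasks (detection, description, and matching of conjugate points; 3D surface reconstruction), automatic analysis of aerial imagery (land cover and land use classification, transfer learning, and bomb crater detection), and close-range applications (recognition of the relative pose of cars, pedestrian detection and tracking, and standardization of cultural heritage documentation).
At the end of the paper, the authors describe the outlook for combining deep learning with photogrammetry and remote sensing. Although deep learning is very widely applied in photogrammetry and remote sensing, a CNN (like any deep learning method) is in essence a classifier, and deep learning “cannot learn the unseen”.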
In principle, a CNN can be considered a classifier.
In traditional classifiers (random forests, support vector machines, conditional random fields, maximum likelihood estimation, etc.) features representing the different classes are extracted from the data set in a pre-processing step, and classification is then performed based on these features. It is clear then that the results can only be as good as the selected features.
CNNs overcome this problem by learning the features together with the corresponding label for each data sample.
The strength of CNNs is the combined estimation of the feature representation and the labels during classification, and it seems that deeper networks tend to yield better results than shallow networks, as long as enough training data is available.
A CNN needs a sufficient amount of representative training data, well balanced across the classes involved. Otherwise there is a risk of overfitting the classifier to the training data, and a bias is likely to be introduced into the results.
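When the classes cannot be balanced by collecting more samples, one common workaround (an illustration, not a method the paper prescribes) is to weight each class inversely to its frequency in the loss. The sketch below computes such weights; the function name and the land-cover labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, inversely proportional to its frequency.

    Weights are n_samples / (n_classes * count), so a perfectly balanced
    label set gives a weight of 1.0 for every class.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical land-cover labels in which 'building' is under-represented.
labels = ["vegetation"] * 6 + ["road"] * 3 + ["building"] * 1
weights = inverse_frequency_weights(labels)
```

The rare class receives the largest weight, so errors on it contribute more to a weighted loss than errors on the dominant class.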
To increase the amount of training data, data augmentation, transfer learning, approaches which are able to tolerate a certain amount of incorrect labels (label noise), semi-supervised and unsupervised learning (clustering) can be employed and should be studied.
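As a minimal sketch of data augmentation, the code below generates the eight rotations and reflections of a small image patch. It assumes the class label does not depend on orientation, which is often reasonable for nadir aerial imagery but is an assumption here, not a claim from the paper. The function names and the 2x2 patch are illustrative.

```python
def rotate90(patch):
    """Rotate a 2D patch (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def flip_horizontal(patch):
    """Mirror a 2D patch left to right."""
    return [row[::-1] for row in patch]


def augment(patch):
    """Return the 8 rotations/reflections of a patch (original included)."""
    variants = []
    current = [list(row) for row in patch]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# Toy patch with four distinct pixel values.
patch = [[1, 2],
         [3, 4]]
augmented = augment(patch)
```

Each labeled patch thus yields eight training samples that share the same label.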
In some cases, simulation techniques may also help.
A CNN “cannot learn the unseen”: its generalization capabilities are limited to what is represented in the previously seen training data.
Incremental learning and forgetting (or “unlearning”) data, e.g. those which are not relevant anymore due to a changing environment, is a topic which has received little attention in our field so far, yet this area offers a large potential, in particular for multi-temporal analysis.
A number of design decisions need to be taken, e.g. with respect to the network architecture and the design of the loss function. It is not clear in general how different choices influence the results, or how robust the classifiers are. Some works suggest that CNNs can indeed be fooled relatively easily.
A CNN is based on correlations of different data sets. We argue that understanding a task and then reasoning about possible solutions in the way humans do is far beyond the scope of the currently employed methods. (Note that this does not mean that reasoning is not done, e.g. in a game of chess or Go. It does mean, however, that a CNN has no intuition for possibly correct solutions and no capacity for abstract deductive learning.)
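To give a feel for how such fooling can work, the sketch below applies one step of the fast gradient sign method to a tiny logistic-regression classifier, used here as a stand-in for a CNN. The weights, input, and step size `epsilon` are made-up values for illustration and are not taken from the paper.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_perturb(weights, bias, x, y_true, epsilon):
    """One fast-gradient-sign step that increases the loss for y_true.

    For binary cross-entropy, d(loss)/dx_i = (p - y_true) * w_i.
    """
    p = predict_proba(weights, bias, x)
    grad = [(p - y_true) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input; the true label of x is class 1.
weights, bias = [2.0, -3.0, 1.0], 0.0
x = [0.3, 0.1, 0.2]
x_adv = fgsm_perturb(weights, bias, x, y_true=1, epsilon=0.15)

p_clean = predict_proba(weights, bias, x)     # above 0.5: classified as 1
p_adv = predict_proba(weights, bias, x_adv)   # below 0.5: classified as 0
```

No input value changes by more than `epsilon`, yet the predicted class flips, because the perturbation is aligned with the gradient of the loss.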
A CNN is largely a black box. While it may deliver very good results, it is largely unknown why and how exactly these results are being reached. Besides being a little frustrating from a scientific point of view, this means that the limitations of these methods cannot clearly be stated, resulting in some doubts whether the methods can be employed in real-world safety- and security-related areas – autonomous driving is a good example.
Thus, it seems that a number of difficult research questions still exist in our field.
Besides achieving better geometric and semantic accuracy of the results, improving their reliability is of great importance. This will only be possible by investigating better ways to explain why deep learning approaches give the results they do.
Another important aspect is the integration of deep learning approaches with other learning paradigms and prior knowledge, according to the motto, “Why learn what we already know?”.
So far, the approaches discussed in this paper are mainly stand-alone solutions. We believe that in the long run, only a combination of different methods will lead to success.
Figure 3. Concept of a standard classifier (top) and a CNN classifier (bottom). The advantage of the latter is that the features and the model parameters are learned simultaneously from the training data.
Figure 4. Architecture of a typical Convolutional Neural Network for image analysis. The figure shows the successive steps of convolution and pooling to generate a feature vector which is classified in the final step, typically using the softmax classifier (the non-linear activation function is not depicted).
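The pipeline in Figure 4 can be sketched in a few lines of plain Python: one convolution, a ReLU activation, max pooling, flattening into a feature vector, and a softmax over class scores. This is a toy forward pass under stated assumptions: the kernel and the dense weights are set by hand for illustration, whereas a real CNN learns them from training data, and all function names are hypothetical.

```python
import math


def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the operation CNN libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Non-linear activation: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(image, kernel, dense_weights):
    """Convolution -> ReLU -> pooling -> feature vector -> softmax."""
    features = max_pool(relu(conv2d(image, kernel)))
    vector = [v for row in features for v in row]
    scores = [sum(w * v for w, v in zip(ws, vector)) for ws in dense_weights]
    return softmax(scores)


# 5x5 toy image with a vertical edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
kernel = [[-1.0, 1.0], [-1.0, 1.0]]          # hand-set vertical edge detector
dense_weights = [[1.0, 0.0, 1.0, 0.0],       # hand-set scores for class 0
                 [0.0, 1.0, 0.0, 1.0]]       # hand-set scores for class 1
probs = forward(image, kernel, dense_weights)
```

Deeper networks repeat the convolution and pooling steps several times, with many kernels per layer, before the final classification.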
Figure 5. U-Net architecture (an example of an encoder-decoder network with skip connections)
Christian Heipke is a professor of photogrammetry and remote sensing at Leibniz University Hannover, where he currently leads a group of about 25 researchers. His professional interests comprise all aspects of photogrammetry, remote sensing, image understanding and their connection to computer vision and GIS. He has authored or coauthored more than 300 scientific papers, more than 70 of which appeared in peer-reviewed international journals. He is the recipient of the 1992 ISPRS Otto von Gruber Award, the 2012 ISPRS Fred Doyle Award, and the 2013 ASPRS Photogrammetric (Fairchild) Award. He is an ordinary member of various learned societies. From 2004 to 2009, he served as vice president of EuroSDR. From 2011 to 2014 he was chair of the German Geodetic Commission (DGK), and from 2012 to 2016 he was ISPRS Secretary General. Since July 2016 he has served as ISPRS President.
Franz Rottensteiner is an Associate Professor at Leibniz University Hannover (LUH) and leader of the research group “Photogrammetric Image Analysis” at the Institute of Photogrammetry and GeoInformation (IPI). He received the Dipl.-Ing. degree in surveying and the Ph.D. degree and venia docendi in photogrammetry, all from Vienna University of Technology (TUW), Vienna, Austria. His research interests include all aspects of image orientation, image classification, automated object detection and reconstruction from images and point clouds, and change detection from remote sensing data. Before joining LUH in 2008, he worked at TUW and at the Universities of New South Wales and Melbourne, both in Australia. He has authored or coauthored more than 150 scientific papers, 36 of which have appeared in peer-reviewed international journals. He received the Karl Rinner Award of the Austrian Geodetic Commission in 2004 and the Carl Pulfrich Award for Photogrammetry, sponsored by Leica Geosystems, in 2017. Since 2011, he has been the Associate Editor of the ISI-listed journal “Photogrammetrie Fernerkundung Geoinformation”. As chair of ISPRS Working Group II/4, he initiated and conducted the ISPRS benchmark on urban object detection and 3D building reconstruction.
· Yang C., Rottensteiner F., and Heipke C. 2019. “Classification of Land Cover and Land Use Based on Convolutional Neural Networks.” ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences III-3: 251–258.
