
A big list of deep learning resources for satellite imagery

robmarkcole 慧天地 2021-09-20




Introduction

This document primarily lists resources for performing deep learning (DL) on satellite imagery. To a lesser extent, machine learning (ML, e.g. random forests, stochastic gradient descent) is also discussed, as are classical image processing techniques.


Top links


  • https://github.com/chrieke/awesome-satellite-imagery-datasets

  • A modern geospatial workflow

  • geospatial-machine-learning

  • Long list of satellite missions with example imagery

  • AWS datasets


Table of contents


  • Datasets

  • Online computing resources

  • Interesting DL projects

  • Production

  • Image formats and catalogues

  • State of the art

  • Online platforms for Geo analysis

  • Techniques

  • Useful references


Datasets


  • Warning: satellite image files can be LARGE. Even a small dataset may comprise 50 GB of imagery.

  • Various datasets listed here and at awesome-satellite-imagery-datasets


WorldView - SpaceNet

  • https://en.wikipedia.org/wiki/WorldView-3

  • 0.3 m PAN, 1.24 m MS, 3.7 m SWIR. Off-nadir (stereo) available.

  • Owned by DigitalGlobe

  • Intro to SpaceNet

  • SpaceNet dataset on AWS -> see this getting started notebook and this notebook on the off-Nadir dataset. Note -> REQUESTER PAYS

  • cloud_optimized_geotif here used in the 3D modelling notebook here.

  • Package of utilities to assist working with the SpaceNet dataset.

  • For more Worldview imagery see Kaggle DSTL competition.


Sentinel

  • As part of the EU Copernicus program, multiple Sentinel satellites are capturing imagery 

  • Sentinel-2: 13 bands, spatial resolution of 10 m, 20 m and 60 m, 290 km swath, temporal resolution of 5 days

  • Open access data on GCP

  • Paid access via sentinel-hub and python-api.

  • Example loading sentinel data in a notebook


Landsat


  • Long running US program 

  • 8 bands, 15 to 60 meters, 185km swath, the temporal resolution is 16 days

  • Imagery on GCP, see the GCP bucket here, with imagery analysed in this notebook on Pangeo

Shuttle Radar Topography Mission (digital elevation maps)

  • Data - open access


Kaggle


Kaggle hosts several large satellite image datasets (> 1 GB). A list of general image datasets is here. A list of land-use datasets is here. The Kaggle blog is an interesting read.


Kaggle - Amazon from space - classification challenge

  • https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data

  • 3-5 meter resolution GeoTIFF images from Planet's Dove satellite constellation

  • 12 classes including - cloudy, primary + waterway etc

  • 1st place winner interview - used 11 custom CNNs


Kaggle - DSTL - segmentation challenge

  • https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection

  • Rating - medium, many good examples (see the Discussion as well as kernels), but as this competition was run a couple of years ago many examples use python 2

  • WorldView-3: 45 satellite images, each covering 1 km x 1 km, in both 3-band (i.e. RGB) and 16-band (400 nm to SWIR) versions

  • 10 Labelled classes include Buildings, Road, Trees, Crops, Waterway, Vehicles

  • Interview with 1st place winner who used segmentation networks - 40+ models, each tweaked for particular target (e.g. roads, trees)

  • Deepsense 4th place solution

  • My analysis here


Kaggle - Airbus Ship Detection Challenge

  • https://www.kaggle.com/c/airbus-ship-detection/overview

  • Rating - medium, most solutions using deep-learning, many kernels, good example kernel.

  • I believe there was a problem with this dataset, which led to many complaints that the competition was ruined.


Kaggle - Draper - place images in order of time

  • https://www.kaggle.com/c/draper-satellite-image-chronology/data

  • Rating - hard. Not many useful kernels.

  • Images are grouped into sets of five, each of which has the same setId. Each image in a set was taken on a different day (but not necessarily at the same time each day). The images for each set cover approximately the same area but are not exactly aligned.

  • Kaggle interviews for entrants who used XGBOOST and a hybrid human/ML approach


Kaggle - Deepsat - classification challenge

Not satellite but airborne imagery. Each sample image patch is size-normalized to 28x28 pixels and consists of 4 bands: red, green, blue and near infrared. The training and test labels are one-hot encoded 1x6 vectors. Data is in .mat Matlab format. JPEG?

  • Imagery source

  • Sat4: 500,000 image patches covering four broad land cover classes - barren land, trees, grassland and a class that consists of all land cover classes other than the above three. Example notebook.

  • Sat6 405,000 image patches each of size 28x28 and covering 6 landcover classes - barren land, trees, grassland, roads, buildings and water bodies.

  • Deep Gradient Boosted Learning article
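
As a rough illustration of the data layout described above, the sketch below decodes one-hot label rows into class names and splits a 28x28x4 patch into its RGB and near-infrared parts. The class order is simply the order in which the Sat6 classes are listed in this section, and the band order follows the sentence above; both are assumptions that should be checked against the dataset's own annotations. The arrays are made up for the example and are not read from the .mat files.

```python
import numpy as np

# Assumed class order: copied from the list in this section, NOT from the dataset files.
SAT6_CLASSES = ["barren land", "trees", "grassland", "roads", "buildings", "water bodies"]


def decode_labels(one_hot, class_names=SAT6_CLASSES):
    """Map an (n_samples, n_classes) one-hot array to a list of class names."""
    one_hot = np.asarray(one_hot)
    if one_hot.ndim != 2 or one_hot.shape[1] != len(class_names):
        raise ValueError("label width does not match the number of classes")
    # The position of the single 1 in each row is the class index.
    return [class_names[i] for i in one_hot.argmax(axis=1)]


# Two made-up label rows and one random patch with 4 bands (assumed order: R, G, B, NIR).
labels = np.array([[0, 1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 1]])
patch = np.random.default_rng(0).integers(0, 256, size=(28, 28, 4), dtype=np.uint8)

rgb = patch[..., :3]   # first three bands for display
nir = patch[..., 3]    # fourth band, near infrared

print(decode_labels(labels))
print(rgb.shape, nir.shape)
```

For Sat4 the same function would be called with a four-entry `class_names` list.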


Kaggle - other

  • Satellite + loan data -> https://www.kaggle.com/reubencpereira/spatial-data-repo


Alternative datasets


There are a variety of datasets suitable for land classification problems.


UC Merced

  • http://weegee.vision.ucmerced.edu/datasets/landuse.html

  • This is a 21 class land use image dataset meant for research purposes.

  • There are 100 RGB TIFF images for each class

  • Each image measures 256x256 pixels with a pixel resolution of 1 foot


AWS datasets

  • Landsat -> free viewer at remotepixel and libra

  • Optical, radar, segmented etc. https://aws.amazon.com/earth/

  • SpaceNet - WorldView-3 and article here. Also example semantic segmentation using Raster Vision


Quilt

  • Several people have uploaded datasets to Quilt


Google Earth Engine

  • https://developers.google.com/earth-engine/

  • Various imagery and climate datasets, including Landsat & Sentinel imagery

  • Python API but all compute happens on Google's servers


Weather Datasets

  • UK Met Office

    -> https://www.metoffice.gov.uk/datapoint

  • NASA (make request and emailed when ready)

    -> https://search.earthdata.nasa.gov

  • NOAA (requires BigQuery)

    -> https://www.kaggle.com/noaa/goes16/home

  • Time series weather data for several US cities

    -> https://www.kaggle.com/selfishgene/historical-hourly-weather-data


Online computing resources


Generally a GPU is required for DL, and this section lists Jupyter environments with GPU available. There is a good overview of online Jupyter environments on the fast.ai site.


Google Colab

  • Colaboratory notebooks with a GPU backend, free for 12 hours at a time. Note that the GPU may be shared with other users, so if you aren't getting good performance try reloading.

  • Tensorflow available & pytorch can be installed, useful articles


Kaggle - also Google!

  • Free to use

  • GPU Kernels - may run for 1 hour

  • Tensorflow, pytorch & fast.ai available

  • Advantage that many datasets are already available

  • Read


Floydhub

  • https://www.floydhub.com/

  • Pricing -> https://www.floydhub.com/pricing

  • Free plan allows 1 job and 10GB storage, but pay for GPU.

  • Cloud GPUs (AWS backend)

  • Tensorboard

  • Version Control for DL

  • Deploy Models as REST APIs

  • Public Datasets


Clouderizer

  • https://clouderizer.com/

  • Clouderizer $5 month for 200 hours (Robbie plan)

  • Run projects locally, on cloud or both.

  • SSH terminal, Jupyter Notebooks and Tensorboard are securely accessible from Clouderizer Web Console.


Paperspace

  • https://www.paperspace.com/

  • 1-Click Jupyter Notebooks, GPU on demand

  • Pay as you go -> https://www.paperspace.com/pricing

  • Python API


Crestle

  • https://www.crestle.com/

  • Pricing -> https://www.crestle.com/pricing

  • Min plan is $5 per month with 200 hours per month. Pay $0.59/hour for GPU and storage $0.014/GB/day

  • Cloud GPU & persistent file store

  • Fast.ai lessons pre-installed


Interesting DL projects


Raster Vision by Azavea

  • https://www.azavea.com/projects/raster-vision/

  • An open source Python framework for building computer vision models on aerial, satellite, and other large imagery sets.

  • Accessible through the Raster Foundry

  • Example use cases on open data


RoboSat

  • https://github.com/mapbox/robosat

  • Generic ecosystem for feature extraction from aerial and satellite imagery.


RoboSat.Pink

  • A fork of RoboSat

  • https://github.com/datapink/robosat.pink


DeepOSM

  • https://github.com/trailbehind/DeepOSM

  • Train a deep learning net with OpenStreetMap features and satellite imagery.


DeepNetsForEO - segmentation

  • https://github.com/nshaud/DeepNetsForEO

  • Uses SegNet for deep learning on remote sensing images.


Skynet-data

  • https://github.com/developmentseed/skynet-data

  • Data pipeline for machine learning with OpenStreetMap


Production


Custom REST API

  • Basic https://blog.keras.io/building-a-simple-keras-deep-learning-rest-api.html with code here

  • Advanced https://www.pyimagesearch.com/2018/01/29/scalable-keras-deep-learning-rest-api/

  • https://github.com/galiboo/olympus

Tensorflow Serving

  • https://www.tensorflow.org/serving/

  • Official version is python 2 but python 3 build here

  • Another approach is to use Docker

TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. Multiple models, or indeed multiple versions of the same model, can be served simultaneously. TensorFlow Serving comes with a scheduler that groups individual inference requests into batches for joint execution on a GPU.


Floydhub

  • Allows exposing model via rest API


modeldepot

  • https://modeldepot.io

  • ML models hosted


Image formats & catalogues



  • We certainly want to consider cloud optimised GeoTiffs https://www.cogeo.org/

  • https://terria.io/ for pretty catalogues

  • Remote pixel

  • Sentinel-hub eo-browser

  • Large datasets may come in HDF5 format, can view with -> https://www.hdfgroup.org/downloads/hdfview/

  • Climate data is often in netcdf format, which can be opened using xarray

  • The xarray docs list a number of ways that data can be stored and loaded.


STAC - SpatioTemporal Asset Catalog

  • Specification describing the layout of a catalogue comprising static files. The aim is that the catalogue is crawlable, so it can be indexed by a search engine and the imagery made discoverable, without requiring yet another API interface.

  • An initiative of https://www.radiant.earth/ in particular https://github.com/cholmes

  • Spec at https://github.com/radiantearth/stac-spec

  • Browser at https://github.com/radiantearth/stac-browser

  • Talk at https://docs.google.com/presentation/d/1O6W0lMeXyUtPLl-k30WPJIyH1ecqrcWk29Np3bi6rl0/edit#slide=id.p

  • Example catalogue at

    https://landsat-stac.s3.amazonaws.com/catalog.json

  • Chat https://gitter.im/SpatioTemporal-Asset-Catalog/Lobby

  • Several useful repos on https://github.com/sat-utils
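
To make the "crawlable static files" idea concrete, here is a minimal sketch of a crawler that walks a catalogue by following its `child` and `item` links. The `links`, `rel` and `href` fields come from the STAC specification; everything else (the `crawl_items` and `fetch` names, the example.com URLs, the item ids) is hypothetical and only there so the example runs without network access. A real crawl would swap the in-memory dictionary for a function that downloads and parses each JSON file.

```python
from urllib.parse import urljoin


def crawl_items(catalog_url, fetch):
    """Yield every item document reachable from the catalogue at `catalog_url`.

    `fetch` is any callable that maps a URL to the parsed JSON document.
    """
    document = fetch(catalog_url)
    for link in document.get("links", []):
        # hrefs may be relative to the document that contains them
        target = urljoin(catalog_url, link["href"])
        if link["rel"] == "item":
            yield fetch(target)
        elif link["rel"] == "child":
            # Sub-catalogues are walked recursively; other rels (self, root, parent) are ignored.
            yield from crawl_items(target, fetch)


# Hypothetical in-memory "static files" standing in for a hosted catalogue.
FILES = {
    "https://example.com/catalog.json": {
        "id": "root",
        "links": [
            {"rel": "self", "href": "https://example.com/catalog.json"},
            {"rel": "child", "href": "scenes/catalog.json"},
        ],
    },
    "https://example.com/scenes/catalog.json": {
        "id": "scenes",
        "links": [
            {"rel": "root", "href": "../catalog.json"},
            {"rel": "item", "href": "scene-a.json"},
            {"rel": "item", "href": "scene-b.json"},
        ],
    },
    "https://example.com/scenes/scene-a.json": {"type": "Feature", "id": "scene-a", "links": []},
    "https://example.com/scenes/scene-b.json": {"type": "Feature", "id": "scene-b", "links": []},
}

item_ids = [item["id"] for item in crawl_items("https://example.com/catalog.json", FILES.__getitem__)]
print(item_ids)
```

Because the crawler needs nothing beyond plain HTTP reads of JSON files, the same catalogue can be hosted on any static file store.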


State of the art


What are companies doing?

  • Overall trend to using AWS S3 backend for image storage. There are a variety of tools for exploring and having teams collaborate on data on S3, e.g. T4.

  • Bucking the trend, Descartes & Airbus are using a Google backend -> check out gcsfs for a Google Cloud Storage file system

  • Just speculating, but a serverless pipeline appears to be where companies are headed for routine compute tasks, whilst providing a Jupyter notebook approach for custom analysis.

  • Traditional data formats aren't designed for processing, so new standards are developing such as cloud optimised geotiffs and zarr


Online platforms for Geo analysis


  • This article discusses some of the available platforms -> TLDR Pangeo rocks, but must BYO imagery

  • Pangeo - open source resources for parallel processing using Dask and Xarray http://pangeo.io/index.html

  • Airbus Sandbox -> will provide access to imagery

  • Descartes Labs -> access to EO imagery from a variety of providers via Python API -> not clear which imagery is available (Airbus + others?) or what the pricing is

  • DigitalGlobe have a cloud hosted Jupyter notebook platform called GBDX. Cloud hosting means they can guarantee the infrastructure supports their algorithms, and they appear to be close/closer to deploying DL. Tutorial notebooks here. Only Sentinel-2 and Landsat data on free tier.

  • Planet have a Jupyter notebook platform which can be deployed locally and requires an API key (14 days free). They have a python wrapper (2.7..) to their rest API. No price after 14 day trial.


Techniques


This section explores the different techniques (DL, ML & classical) people are applying to common problems in satellite imagery analysis. Classification problems are the most simply addressed via DL, object detection is harder, and cloud detection harder still (niche interest).


Land classification

  • Very common problem: assign a land classification to each pixel based on its pixel value. Can be addressed via a simple sklearn clustering algorithm or deep learning.

  • Land use is related to classification, but we are trying to detect a scene, e.g. housing, forestry. I have tried CNN -> See my notebooks

  • Land Use Classification using Convolutional Neural Network in Keras

  • Sea-Land segmentation using DL

  • Pixel level segmentation on Azure

  • Deep Learning-Based Classification of Hyperspectral Data

  • A U-net based on Tensorflow for object detection (or segmentation) of satellite images - DSTL dataset but python 2.7
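
The per-pixel clustering approach mentioned at the top of this list can be sketched as follows. This is a plain NumPy k-means written out for illustration, not the sklearn implementation; in practice `sklearn.cluster.KMeans` applied to the same reshaped pixel array would be the usual choice. The function name, the brightness-based initialisation and the synthetic two-region image are assumptions made for the example.

```python
import numpy as np


def kmeans_pixels(image, n_clusters, n_iter=20):
    """Cluster the pixels of a (rows, cols, bands) image by their band values.

    Returns a (rows, cols) array of integer cluster labels.
    """
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(np.float64)

    # Simple deterministic initialisation (an assumption of this sketch):
    # pick pixels evenly spaced along the brightness ordering.
    order = np.argsort(pixels.sum(axis=1))
    picks = np.linspace(0, len(pixels) - 1, n_clusters).astype(int)
    centres = pixels[order[picks]]

    for _ in range(n_iter):
        # Squared distance from every pixel to every centre: (n_pixels, n_clusters).
        dist = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_clusters):
            members = pixels[labels == k]
            if len(members):  # leave an empty cluster's centre where it is
                centres[k] = members.mean(axis=0)
    return labels.reshape(rows, cols)


# Synthetic 3-band image: dark left half, bright right half, plus a little noise.
rng = np.random.default_rng(1)
image = np.zeros((8, 8, 3))
image[:, :4] = 0.1
image[:, 4:] = 0.9
image += rng.normal(0.0, 0.01, image.shape)

label_map = kmeans_pixels(image, n_clusters=2)
print(label_map)
```

The clusters are unlabelled, so mapping each cluster id to a land cover class (water, forest and so on) remains a separate manual or supervised step.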


Semantic segmentation

  • Pixel-wise classification

  • Instance segmentation with keras - links to satellite examples


Change detection

  • Monitor water levels, coast lines, size of urban areas, wildfire damage. Note, clouds change often too..!

  • Using PCA (python 2, requires updating) -> https://appliedmachinelearning.blog/2017/11/25/unsupervised-changed-detection-in-multi-temporal-satellite-images-using-pca-k-means-python-code/

  • Using CNN -> https://github.com/vbhavank/Unstructured-change-detection-using-CNN

  • Siamese neural network to detect changes in aerial images

  • https://www.spaceknow.com/

  • LANDSAT Time Series Analysis for Multi-temporal Land Cover Classification using Random Forest

  • Change Detection in 3D: Generating Digital Elevation Models from Dove Imagery

  • Change Detection in Hyperspectral Images Using Recurrent 3D Fully Convolutional Networks

  • PySAR - InSAR (Interferometric Synthetic Aperture Radar) timeseries analysis in python


Image registration

  • Wikipedia article on registration -> register for change detection or image stitching

  • Traditional approach -> define control points, employ RANSAC algorithm

  • Phase correlation is used to estimate the translation between two images with sub-pixel accuracy. It allows accurate registration of low resolution imagery onto high resolution imagery, or registration of a sub-image on a full image -> Unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects. Applied to Landsat images here.
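
A minimal sketch of phase correlation with NumPy's FFT is given below. It recovers whole-pixel shifts only; the sub-pixel refinement is not shown, and the function name and test image are made up for the example. The shift is found as the peak of the inverse transform of the normalised cross-power spectrum.

```python
import numpy as np


def phase_correlation_shift(reference, moving):
    """Estimate the integer (row, col) shift that maps `reference` onto `moving`.

    Both inputs are 2D arrays of the same shape; the shift is assumed to be cyclic.
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)

    # Normalised cross-power spectrum: keep only the phase difference.
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)

    # The inverse transform peaks at the displacement between the two images.
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)

    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    return tuple(int(p) if p <= size // 2 else int(p) - size
                 for p, size in zip(peak, correlation.shape))


# Made-up test image, shifted 3 rows down and 5 columns left.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moving = np.roll(reference, shift=(3, -5), axis=(0, 1))

print(phase_correlation_shift(reference, moving))
```

Real image pairs are not cyclic shifts of each other, so windowing the inputs before the transform is a common precaution.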


Object detection

  • A typical task is detecting boats on the ocean, which should be simpler than land based challenges owing to the blank background in images, but it is still challenging and no convincing robust solutions are available.

  • Intro articles here and here.

  • DigitalGlobe article - they use a combination of classical techniques (masks, erodes) to reduce the search space (identifying water via NDWI which requires SWIR) then apply a binary DL classifier on candidate regions of interest. They deploy the final algo as a task on their GBDX platform. They propose that in the future an R-CNN may be suitable for the whole process.

  • Planet use the non-DL Felzenszwalb algorithm to detect ships

  • Segmentation of buildings on kaggle

  • Identifying Buildings in Satellite Images with Machine Learning and Quilt -> NDVI & edge detection via gaussian blur as features, fed to TPOT for training with labels from OpenStreetMap, modelled as a two class problem, “Buildings” and “Nature”.

  • Deep learning for satellite imagery via image segmentation

  • Building Extraction with YOLT2 and SpaceNet Data

  • Find sports fields using Mask R-CNN and overlay on open-street-map


Cloud detection

  • A subset of the object detection problem, but surprisingly challenging

  • From this article on sentinelhub, there are three popular classical algorithms that apply thresholds in multiple bands in order to identify clouds. In the same article they propose using semantic segmentation combined with a CNN for a cloud classifier (excellent review paper here), but state that this requires too much compute resources.

  • This article compares a number of ML algorithms, random forests, stochastic gradient descent, support vector machines, Bayesian method.

  • DL..


Super resolution

  • https://medium.com/the-downlinq/super-resolution-on-satellite-imagery-using-deep-learning-part-1-ec5c5cd3cd2

  • https://modeldepot.io/joe/vdsr-for-super-resolution


Pansharpening

  • Image fusion of low res multispectral with high res pan band. Several algorithms described in the ArcGIS docs, with the simplest being taking the mean of the pan and RGB pixel value.

  • Does not require DL, classical algos suffice, see this notebook and this kaggle kernel

  • https://github.com/mapbox/rio-pansharpen
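
The simple mean method mentioned above can be sketched in a few lines of NumPy: upsample the low resolution RGB image to the pan grid, then average each band with the pan band. Nearest-neighbour upsampling, the function name and the constant-valued test arrays are assumptions of this sketch; real workflows would resample with a geospatial library and handle georeferencing.

```python
import numpy as np


def pansharpen_mean(rgb_lowres, pan):
    """Simple mean pansharpening.

    rgb_lowres: (h, w, 3) multispectral image.
    pan: (H, W) panchromatic band, where H and W are integer multiples of h and w.
    Returns an (H, W, 3) float array.
    """
    h, w = rgb_lowres.shape[:2]
    big_h, big_w = pan.shape
    if big_h % h or big_w % w:
        raise ValueError("pan shape must be an integer multiple of the RGB shape")

    # Nearest-neighbour upsampling of the RGB image onto the pan grid.
    upsampled = np.repeat(np.repeat(rgb_lowres, big_h // h, axis=0), big_w // w, axis=1)

    # Each output band is the mean of the upsampled band and the pan band.
    return 0.5 * (upsampled.astype(np.float64) + pan[..., np.newaxis])


# Made-up inputs: a 2x2 RGB image and a 4x4 pan band.
rgb = np.full((2, 2, 3), 100.0)
pan = np.full((4, 4), 50.0)

sharpened = pansharpen_mean(rgb, pan)
print(sharpened.shape, sharpened[0, 0])
```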


Stereo imaging for terrain mapping & DEMs

  • Wikipedia DEM article and phase correlation article

  • Intro to depth from stereo

  • Map terrain from stereo images to produce a digital elevation model (DEM) -> high resolution & paired images required, typically 0.3 m, e.g. Worldview or GeoEye.

  • Process of creating a DEM here and here.

  • ArcGIS can generate DEMs from stereo images

  • https://github.com/MISS3D/s2p -> produces elevation models from images taken by high resolution optical satellites -> demo code on https://gfacciol.github.io/IS18/

  • Automatic 3D Reconstruction from Multi-Date Satellite Images

  • Semi-global matching with neural networks

  • Predict the fate of glaciers

  • monodepth - Unsupervised single image depth prediction with CNNs

  • Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches

  • Terrain and hydrological analysis based on LiDAR-derived digital elevation models (DEM) - Python package

  • Phase correlation in scikit-image


Lidar

  • Reconstructing 3D buildings from aerial LiDAR with Mask R-CNN


NDVI - vegetation index

  • Simple band math ndvi = np.true_divide((ir - r), (ir + r)) but challenging due to the size of the imagery.

  • Example notebook local

  • Landsat data in cloud optimised format analysed for NDVI with medium article here.
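
A slightly fuller version of the band math above is sketched here, with two practical details added as assumptions of this example: the bands are cast to float first (imagery is often stored as unsigned integers, where `ir - r` would wrap around), and pixels where both bands are zero are set to 0 rather than dividing by zero. The tiled variant processes a block of rows at a time, which is one simple way to keep temporary arrays small on large scenes.

```python
import numpy as np


def ndvi(nir, red):
    """Normalised difference vegetation index: (NIR - red) / (NIR + red)."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    total = nir + red
    out = np.zeros_like(total)
    # Divide only where the denominator is non-zero; other pixels stay 0.
    np.true_divide(nir - red, total, out=out, where=total != 0)
    return out


def ndvi_tiled(nir, red, tile_rows=512):
    """Compute NDVI one block of rows at a time to bound temporary memory use."""
    nir = np.asarray(nir)
    red = np.asarray(red)
    out = np.empty(nir.shape, dtype=np.float64)
    for start in range(0, nir.shape[0], tile_rows):
        rows = slice(start, start + tile_rows)
        out[rows] = ndvi(nir[rows], red[rows])
    return out


# Made-up 2x2 bands stored as unsigned 8-bit integers.
nir_band = np.array([[8, 0], [2, 4]], dtype=np.uint8)
red_band = np.array([[2, 0], [2, 0]], dtype=np.uint8)

print(ndvi(nir_band, red_band))
```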


SAR

  • Removing speckle noise from Sentinel-1 SAR using a CNN

  • A dataset which is specifically made for deep learning on SAR and optical imagery is the SEN1-2 dataset, which contains around 250K corresponding patch pairs of Sentinel 1 (VV) and 2 (RGB) data. Data download, Paper


For fun

  • Style transfer - see the world in a new way

Useful open source software

  • QGIS - Create, edit, visualise, analyse and publish geospatial information. Python scripting and plugins.

  • Orfeo toolbox - remote sensing toolbox with python API (just a wrapper to the C code). Do activities such as pansharpening, ortho-rectification, image registration, image segmentation & classification. Not much documentation.

  • QUICK TERRAIN READER - view DEMs, Windows


Useful github repos

  • torchvision-enhance -> Enhance PyTorch vision for semantic segmentation, multi-channel images and TIF file,...

  • dl-satellite-docker -> docker files for geospatial analysis, including tensorflow, pytorch, gdal, xgboost...


Useful References


  • https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/#0

  • https://github.com/taspinar/sidl/blob/master/notebooks/2_Detecting_road_and_roadtypes_in_sattelite_images.ipynb

  • Geonotebooks with Docker container

  • Sentinel NetCDF data

  • Open Data Cube - serve up cubes of data https://www.opendatacube.org/

  • Process Satellite data using AWS Lambda functions

  • OpenDroneMap - generate maps, point clouds, 3D models and DEMs from drone, balloon or kite images.



Source: GitHub (copyright belongs to the original author and the publishing media)



Editor: 安有硕 / Reviewers: 呼慧珊, 游志龙

Advisor: Prof. 万剑华
