
[Paper Reading] Google Earth Engine: A Planetary-Scale Geospatial Analysis Platform for Everyone

Author: 走天涯徐小洋 (走天涯徐小洋地理数据科学), 2022-05-17


This post shares a classic paper published in Remote Sensing of Environment that has been cited more than 2,000 times.

Gorelick N, Hancher M, Dixon M, et al. Google Earth Engine: Planetary-scale geospatial analysis for everyone[J]. Remote Sensing of Environment, 2017, 202: 18–27.

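Abstract

Google Earth Engine is a cloud-based platform for planetary-scale geospatial analysis. It applies Google's massive computational capabilities to a variety of societal issues, including deforestation, drought, disasters, disease, food security, water management, climate monitoring and environmental protection. As an integrated platform it is unique in the field. It is designed to empower not only traditional remote sensing scientists but also a much wider audience that lacks the technical capacity needed to use traditional supercomputers or large-scale commodity cloud computing resources.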

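Introduction

  • Petabytes of remote sensing imagery are freely available from agencies such as NASA, ESA and NOAA, and supercomputing resources are increasingly accessible. Many tools, such as GeoSpark, GeoMesa and Hadoop, have also been developed to process massive geospatial datasets;
  • Making full use of these resources requires supporting IT infrastructure (CPUs, GPUs, networking, etc.);
  • Remote sensing experts and other users who need geospatial analysis do not necessarily have access to sufficient supercomputing resources;
  • Google Earth Engine (GEE) is a cloud computing platform. Unlike a supercomputing center, it is easy to use and does not require strong programming skills.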

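Platform overview

  • The GEE platform combines petabytes of data with large-scale parallel computing power. It provides a web-accessible API and an interactive development environment (IDE) that let users deploy algorithms and visualize results.
  • The data catalog provides a huge collection of public geospatial datasets, including satellite and aerial imagery in optical and non-optical wavelengths, weather forecasts, land cover and many other datasets;
  • Users can access both public and private data through the GEE API. Processing is parallelized automatically, and the API can be called either from a local client library or from the web-based IDE;
  • Users can sign in at earthengine.google.com to access the user interface, tutorials, examples, the function reference and more. A background in GIS, remote sensing and JavaScript makes GEE easier to learn. Users can upload their own data and download processing results to a local machine.
Figure 1. The GEE IDE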

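Data catalog

  • The GEE public data catalog contains petabytes of commonly used geospatial data:
    • It includes the complete Landsat and Sentinel-1/2 archives, as well as climate forecast, environmental change, land use/land cover change (LUCC), socioeconomic and other datasets;
    • The catalog is updated daily and adds about 6,000 new scenes per day;
    • Users can upload their own data through a REST interface;
  • To improve processing efficiency, images in GEE are preprocessed at ingestion:
    • Each image keeps its original projection and resolution and is cut into 256×256-pixel tiles. These tiles are stored in an efficient, replicated database;
    • For fast visualization, a pyramid of reduced-resolution tiles is built for each image;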
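The following toy Python sketch makes the ingestion step concrete. It is not Earth Engine's actual code. It cuts a raster into fixed-size tiles in its native grid and then builds a reduced-resolution pyramid. The sketch makes two assumptions. First, images are plain 2-D lists whose sides equal the tile size times a power of two. Second, each pyramid level is produced by averaging 2×2 pixel blocks, which is a common choice for continuous-valued bands. The function names are hypothetical.

```python
# Toy sketch of ingestion-time preprocessing; NOT Earth Engine's real code.
# Assumptions: rasters are 2-D lists whose sides are the tile size times a
# power of two, and pyramid levels are built by plain 2x2 mean averaging.

TILE = 256  # tile edge length in pixels, as stated for GEE


def cut_tiles(raster, tile=TILE):
    """Split a 2-D list into {(tile_row, tile_col): block} at native resolution."""
    tiles = {}
    for r0 in range(0, len(raster), tile):
        for c0 in range(0, len(raster[0]), tile):
            tiles[(r0 // tile, c0 // tile)] = [
                row[c0:c0 + tile] for row in raster[r0:r0 + tile]
            ]
    return tiles


def downsample_mean(raster):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    return [
        [
            (raster[r][c] + raster[r][c + 1]
             + raster[r + 1][c] + raster[r + 1][c + 1]) / 4
            for c in range(0, len(raster[0]), 2)
        ]
        for r in range(0, len(raster), 2)
    ]


def build_pyramid(raster, tile=TILE):
    """Return [full-res, 1/2, 1/4, ...], stopping once a level fits in one tile."""
    levels = [raster]
    while len(levels[-1]) > tile or len(levels[-1][0]) > tile:
        levels.append(downsample_mean(levels[-1]))
    return levels
```

Suppose the tile size is 4 pixels. A 16×16 raster then yields 16 tiles and a pyramid with levels of 16, 8 and 4 rows. A viewer zoomed far out only needs the coarsest level, so it can draw a large area without reading the full-resolution tiles.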

Commonly used datasets in GEE

| Category | Dataset | Nominal resolution | Temporal granularity | Temporal coverage | Spatial coverage |
|---|---|---|---|---|---|
| Landsat | Landsat 8 OLI/TIRS | 30 m | 16 day | 2013–Now | Global |
| | Landsat 7 ETM+ | 30 m | 16 day | 2000–Now | Global |
| | Landsat 5 TM | 30 m | 16 day | 1984–2012 | Global |
| | Landsat 4–8 surface reflectance | 30 m | 16 day | 1984–Now | Global |
| Sentinel | Sentinel-1A/B ground range detected | 10 m | 6 day | 2014–Now | Global |
| | Sentinel-2A MSI | 10/20 m | 10 day | 2015–Now | Global |
| MODIS | MOD08 atmosphere | | Daily | 2000–Now | Global |
| | MOD09 surface reflectance | 500 m | 1 day/8 day | 2000–Now | Global |
| | MOD10 snow cover | 500 m | 1 day | 2000–Now | Global |
| | MOD11 temperature and emissivity | 1000 m | 1 day/8 day | 2000–Now | Global |
| | MCD12 land cover | 500 m | Annual | 2000–Now | Global |
| | MOD13 vegetation indices | 500/250 m | 16 day | 2000–Now | Global |
| | MOD14 thermal anomalies & fire | 1000 m | 8 day | 2000–Now | Global |
| | MCD15 leaf area index/FPAR | 500 m | 4 day | 2000–Now | Global |
| | MOD17 gross primary productivity | 500 m | 8 day | 2000–Now | Global |
| | MCD43 BRDF-adjusted reflectance | 1000/500 m | 8 day/16 day | 2000–Now | Global |
| | MOD44 veg. cover conversion | 250 m | Annual | 2000–Now | Global |
| | MCD45 thermal anomalies and fire | 500 m | 30 day | 2000–Now | Global |
| ASTER | L1T radiance | 15/30/90 m | 1 day | 2000–Now | Global |
| | Global emissivity | 100 m | Once | 2000–2010 | Global |
| Other imagery | PROBA-V top of canopy reflectance | 100/300 m | 2 day | 2013–Now | Global |
| | EO-1 Hyperion hyperspectral radiance | 30 m | Targeted | 2001–Now | Global |
| | DMSP-OLS nighttime lights | 1 km | Annual | 1992–2013 | Global |
| | USDA NAIP aerial imagery | 1 m | Sub-annual | 2003–2015 | CONUS |
| Topography | Shuttle Radar Topography Mission | 30 m | Single | 2000 | 60°N–54°S |
| | USGS National Elevation Dataset | 10 m | Single | Multiple | United States |
| | USGS GMTED2010 | 7.5″ | Single | Multiple | 83°N–57°S |
| | GTOPO30 | 30″ | Single | Multiple | Global |
| | ETOPO1 | 1′ | Single | Multiple | Global |
| Land cover | GlobCover | 300 m | Non-periodic | 2009 | 90°N–65°S |
| | USGS National Land Cover Database | 30 m | Non-periodic | 1992–2011 | CONUS |
| | UMD global forest change | 30 m | Annual | 2000–2014 | 80°N–57°S |
| | JRC global surface water | 30 m | Monthly | 1984–2015 | 78°N–60°S |
| | GLCF tree cover | 30 m | 5 year | 2000–2010 | Global |
| | USDA NASS cropland data layer | 30 m | Annual | 1997–2015 | CONUS |
| Weather, precipitation & atmosphere | Global Precipitation Measurement | 6′ | 3 h | 2014–Now | Global |
| | TRMM 3B42 precipitation | 15′ | 3 h | 1998–2015 | 50°N–50°S |
| | CHIRPS precipitation | 3′ | 5 day | 1981–Now | 50°N–50°S |
| | NLDAS-2 | 7.5′ | 1 h | 1979–Now | North America |
| | GLDAS-2 | 15′ | 3 h | 1948–2010 | Global |
| | NCEP reanalysis | 2.5° | 6 h | 1948–Now | Global |
| | ORNL DAYMET weather | 1 km | Annual | 1980–Now | North America |
| | GRIDMET | 4 km | 1 day | 1979–Now | CONUS |
| | NCEP Global Forecast System | 15′ | 6 h | 2015–Now | Global |
| | NCEP Climate Forecast System | 12′ | 6 h | 1979–Now | Global |
| | WorldClim | 30″ | 12 images | 1960–1990 | Global |
| | NEX downscaled climate projections | 1 km | 1 day | 1950–2099 | North America |
| Population | WorldPop | 100 m | 5 year | 2010–2015 | Multiple |
| | GPWv4 | 30″ | 5 year | 2000–2020 | 85°N–60°S |

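System architecture

  • GEE builds on a range of Google technologies:
    • the Borg cluster management system;
    • the Bigtable and Spanner distributed databases;
    • Colossus, the successor to the Google File System;
    • the FlumeJava framework for parallel pipeline execution;
    • Google Fusion Tables, a web-based database that supports tables of geometric data (points, lines and polygons) with attributes;
Figure 2. Simplified system architecture of GEE
  • Overview of the GEE system architecture:
    • The Earth Engine Code Editor and third-party web applications send interactive and batch queries to the system through a REST API;
    • On-the-fly requests are handled by front-end servers. These forward complex sub-queries to Compute Masters, which distribute the computation across a pool of compute servers;
    • Batch computation is distributed with FlumeJava, and the Borg cluster management system balances the load across many concurrent users;
    • If any individual machine fails, the affected query is simply reissued.
  • Queries in GEE are built from functions:
    • The GEE library contains more than 800 functions;
  • GEE functions can be composed into a directed acyclic graph (DAG);
  • GEE client libraries are currently available for Python and JavaScript.
Figure: an example of GEE code
Figure: the DAG corresponding to the code
  • GEE results can be displayed on an interactive map, and the map view determines the projection and extent of the computation. Users can develop an algorithm interactively and then submit it to GEE as a batch job to compute the complete, large-scale result.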
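The DAG idea can be illustrated with a small, hypothetical Python model. In this model, composing functions only builds a graph of nodes, and nothing is computed until a result is requested for specific pixels. The names `Node`, `band` and `evaluate` are illustrative assumptions, not the Earth Engine client API. They loosely mirror how the client libraries build a description of a computation instead of executing it locally.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Node:
    """One function call in a lazily built expression graph (toy model)."""
    op: str
    args: Tuple[Any, ...] = ()

    # Composing expressions only creates new nodes; nothing is computed here.
    def __add__(self, other):
        return Node("add", (self, other))

    def __sub__(self, other):
        return Node("subtract", (self, other))

    def __truediv__(self, other):
        return Node("divide", (self, other))


OPS: Dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "divide": lambda a, b: a / b,
}


def band(name: str) -> Node:
    """Leaf node: a named input band, resolved only at evaluation time."""
    return Node("band", (name,))


def evaluate(node: Any, pixel: Dict[str, float]) -> Any:
    """Walk the graph depth-first and compute its value for one pixel.
    The caller (e.g. a map view) decides which pixels are requested."""
    if not isinstance(node, Node):
        return node  # plain constant
    if node.op == "band":
        return pixel[node.args[0]]
    return OPS[node.op](*(evaluate(a, pixel) for a in node.args))


nir, red = band("nir"), band("red")
ndvi = (nir - red) / (nir + red)  # builds a graph; no arithmetic yet
```

The leaf `nir` is shared by the numerator and the denominator, so `ndvi` is a DAG rather than a tree. In the real system, the region and projection would come from the map view or a batch export, not from a single pixel dictionary.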

Table of commonly used GEE functions

| Group | Function category | Examples | Mode of operation |
|---|---|---|---|
| Numerical operations | Primitive operations | add, subtract, multiply, divide, etc. | Per pixel/per feature |
| | Trigonometric operations | cos, sin, tan, acos, asin, atan, etc. | |
| | Standard functions | abs, pow, sqrt, exp, log, erf, etc. | |
| | Logical operations | eq, neq, gt, gte, lt, lte, and, or | |
| | Bit/bitwise operations | and, or, xor, not, bit shift, etc. | |
| | Numeric casting | int, float, double, uint8, etc. | |
| Array/matrix operations | Elementwise operations | (numeric operations as above) | Per pixel/per feature |
| | Array manipulation | get, length, cat, slice, sort, etc. | |
| | Array construction | identity, diagonal, etc. | |
| | Matrix operations | product, determinant, transpose, inverse, pseudoinverse, decomposition, etc. | |
| | Reduce and accumulate | reduce, accum | |
| Machine learning | Supervised classification and regression | Bayes, CART, Random Forest, SVM, Perceptron, Mahalanobis, etc. | Per pixel/per feature |
| | Unsupervised classification | K-Means, LVQ, Cobweb, etc. | |
| Other per-pixel image operations | Spectral operations | unmixing, HSV transform, etc. | Per pixel |
| | Data masking | unmask, update mask, etc. | |
| | Visualization | min/max, color palette, gamma, SLD, etc. | |
| | Location | pixel area, pixel coordinates, etc. | |
| Kernel operations | Convolution | convolve, blur, etc. | Per image tile |
| | Morphology | min, max, mean, distance, etc. | |
| | Texture | entropy, GLCM, etc. | |
| | Simple shape kernels | circle, rectangle, diamond, cross, etc. | |
| | Standard kernels | Gaussian, Laplacian, Roberts, Sobel, etc. | |
| | Other kernels | Euclidean, Manhattan and Chebyshev distance, arbitrary kernels and combinations | |
| Other image operations | Band manipulation | add, select, rename, etc. | Per image |
| | Metadata properties | get, set, etc. | |
| | Derivative | pixel-space derivative, spatial gradient | |
| | Edge detection | Canny, Hough transform | |
| | Terrain operations | slope, aspect, hillshade, fill minima, etc. | |
| | Connected components | components, component size | |
| | Image clipping | clip | |
| | Resampling | bilinear, bicubic, etc. | |
| | Warping | translate, changeProj | |
| | Image registration | register, displacement, displace | |
| | Other tile-based operations | cumulative cost, medial axis, reduce resolution with arbitrary reducers, etc. | |
| | Image aggregations | sample region(s), reduce region(s) with arbitrary reducers | |
| Reducers | Simple | count, distinct, first, etc. | Context-dependent |
| | Mathematical | sum, product, min, max, etc. | |
| | Logical | logical and/or, bitwise and/or | |
| | Statistical | mean, median, mode, percentile, standard deviation, covariance, histogram, etc. | |
| | Correlation | Kendall, Spearman, Pearson, Sen's slope | |
| | Regression | linear regression, robust linear regression | |
| Geometry operations | Types | Point, LineString, Polygon, etc. | Per feature |
| | Measurements | length, area, perimeter, distance, etc. | |
| | Constructive operations | intersection, union, difference, etc. | |
| | Predicates | intersects, contains, withinDistance, etc. | |
| | Other operations | buffer, centroid, transform, simplify, etc. | |
| Table/collection operations | Basic manipulation | sort, merge, size, first, limit, distinct, flatten, remap, etc. | Streaming |
| | Property filtering | eq, neq, gt, lt, date range, and, or, not, etc. | |
| | Spatial filtering | intersects, contains, withinDistance, etc. | |
| | Parallel processing | map, reduce, iterate | |
| | Joins | simple, inner, grouping, etc. | |
| Vector/raster operations | Rasterization | paint/draw, distance | Per tile |
| | Spatial interpolation | kriging, IDW interpolation | |
| | Vectorization | reduceToVectors | Scatter/gather |
| Other data types | | number, string, list, dictionary, date, daterange, projection, etc. | Context-dependent |
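The "context-dependent" mode of reducers means that the same reducer can be applied along different axes. The minimal Python sketch below uses hypothetical helper names. One function reduces each pixel's time series across a collection, which is roughly what `ImageCollection.reduce` does in Earth Engine. The other reduces all pixels of a single image to one value, which is roughly what `Image.reduceRegion` does. Images are modeled here as plain 2-D lists, which is an assumption made for illustration.

```python
from statistics import median
from typing import Callable, List, Sequence

Reducer = Callable[[Sequence[float]], float]
Image = List[List[float]]


def reduce_over_time(collection: List[Image], reducer: Reducer) -> Image:
    """Apply the reducer to each pixel's time series: collection -> image."""
    h, w = len(collection[0]), len(collection[0][0])
    return [[reducer([img[r][c] for img in collection]) for c in range(w)]
            for r in range(h)]


def reduce_over_region(image: Image, reducer: Reducer) -> float:
    """Apply the same reducer to every pixel of one image: image -> number."""
    return reducer([v for row in image for v in row])


collection = [
    [[1, 5], [2, 8]],
    [[3, 1], [2, 4]],
    [[2, 9], [7, 0]],
]
```

Passing `median` to `reduce_over_time(collection, median)` gives a per-pixel median image, while `reduce_over_region(collection[0], max)` collapses the first image to a single number. The reducer itself does not change; only the context in which it is applied does.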

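Data distribution models

The functions in the Earth Engine library use several built-in parallelization and data distribution models to achieve high performance. Each model is optimized for a different data access pattern.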

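Image tiling

  • Use cases: local operations, in which each output pixel depends only on input pixels within a fixed distance, such as band math or convolution;
  • Tile level: the output is split into image tiles, typically 256×256 pixels, and each tile is computed independently and in parallel;
  • Pixel level: within a tile, per-pixel computations run on the Java Virtual Machine (JVM) and are accelerated with Just-In-Time (JIT) compilation.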
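The tiling model can be sketched in Python as follows. A per-pixel operation is applied to each tile independently and the results are stitched back together, which produces the same output as processing the whole image at once. The sketch assumes the bands are equal-sized 2-D lists and uses a normalized difference as the example operation. The thread pool only illustrates that tiles can be scheduled independently; because of CPython's GIL it will not speed up this pure-Python arithmetic, so real gains would need separate processes or machines. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_tiles(height, width, tile):
    """Yield (row0, row1, col0, col1) bounds of tiles covering the image."""
    for r0 in range(0, height, tile):
        for c0 in range(0, width, tile):
            yield r0, min(r0 + tile, height), c0, min(c0 + tile, width)


def ndvi_pixel(nir, red):
    """Per-pixel operation: normalized difference (nir - red) / (nir + red)."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def compute_tile(nir, red, bounds):
    """Compute one output tile using only the input pixels inside its bounds."""
    r0, r1, c0, c1 = bounds
    block = [[ndvi_pixel(nir[r][c], red[r][c]) for c in range(c0, c1)]
             for r in range(r0, r1)]
    return bounds, block


def tiled_map(nir, red, tile=256, workers=4):
    """Submit every tile as an independent task, then stitch the results."""
    h, w = len(nir), len(nir[0])
    out = [[0.0] * w for _ in range(h)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(compute_tile, nir, red, b)
                   for b in split_tiles(h, w, tile)]
        for fut in futures:
            (r0, r1, c0, c1), block = fut.result()
            for i, row in enumerate(block):
                out[r0 + i][c0:c1] = row
    return out
```

Because no tile reads pixels outside its own bounds, the tiles need no coordination, and the stitched output is identical to a single full-image pass. Neighborhood operations such as convolution would also need a small border of extra pixels around each tile.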

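Spatial aggregations

  • Use cases: regional statistics, raster-to-vector conversion, image classification, etc.;
  • GEE uses a scatter-gather model. The region is divided into sub-regions that are computed in parallel, and the partial results are then combined.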
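The scatter-gather model can be sketched in Python with a regional mean, minimum and maximum. In the scatter step, each sub-region is summarized as a partial result of (count, sum, min, max). In the gather step, these partials are merged. The sketch assumes the region's pixels arrive as a flat, non-empty list, and its function names are hypothetical. It is an illustration of the pattern, not Earth Engine's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partial_stats(values):
    """Scatter step: summarize one sub-region as (count, sum, min, max)."""
    return (len(values), sum(values), min(values), max(values))


def merge_stats(a, b):
    """Gather step: combine two partial summaries; merge order does not matter."""
    return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))


def region_stats(pixels, n_parts=4):
    """Split the region into sub-regions, summarize them in parallel,
    then merge the partial results into the final statistics."""
    size = -(-len(pixels) // n_parts)  # ceiling division
    parts = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_stats, parts))
    count, total, lo, hi = reduce(merge_stats, partials)
    return {"count": count, "mean": total / count, "min": lo, "max": hi}
```

A mean cannot be merged directly: the average of sub-region averages is wrong when the sub-regions differ in size. The sketch therefore carries (count, sum) through the gather step and divides only once at the end.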

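Streaming collections

  • Use case: time series of images;
  • Combines image tiling with spatial aggregation;
  • Streaming is very efficient when each tile is significantly smaller than the whole image.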
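Streaming can be sketched in Python as a per-pixel temporal mean over an image collection. The collection is consumed one image at a time, so only the running sums and the current image are ever held in memory. The sketch assumes all images share one small grid, which you can think of as a single output tile, and uses a generator as a hypothetical stand-in for a real collection.

```python
from typing import Iterable, List

Image = List[List[float]]


def stream_mean(images: Iterable[Image]) -> Image:
    """Per-pixel temporal mean, consuming the collection one image at a time.
    Memory use is one accumulator grid plus the image currently being read."""
    sums = None
    count = 0
    for img in images:
        if sums is None:
            sums = [[0.0] * len(img[0]) for _ in img]
        for r, row in enumerate(img):
            for c, value in enumerate(row):
                sums[r][c] += value
        count += 1
    if sums is None:
        raise ValueError("empty collection")
    return [[s / count for s in row] for row in sums]


def fake_collection(n, h, w):
    """Hypothetical collection: lazily yields n small h x w images."""
    for t in range(n):
        yield [[float(t + r + c) for c in range(w)] for r in range(h)]
```

For example, `stream_mean(fake_collection(4, 2, 3))` never materializes all four images at once. The same loop would run unchanged over thousands of scenes, which is why streaming pays off when each tile is small relative to the collection.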

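Caching and common sub-expression elimination

  • Intermediate results are stored in a high-speed distributed cache, keyed by a hash of the computation that produced them;
  • Repeated queries reuse these cached results, which avoids redundant computation.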
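The hash-keyed cache can be sketched in Python by memoizing sub-expressions under the SHA-256 hash of their canonical JSON encoding. Identical sub-expressions, even when built as separate objects, then map to the same key and are computed only once. A plain dict stands in for the distributed cache, and the tiny expression format (`["add"|"mul", left, right]`) is a hypothetical assumption made for illustration.

```python
import hashlib
import json

cache = {}                # stand-in for a distributed, hash-keyed cache
stats = {"computed": 0}   # counts real (non-cached) computations


def expr_key(expr) -> str:
    """Key = SHA-256 of a canonical JSON encoding of the sub-expression."""
    canonical = json.dumps(expr, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached_eval(expr):
    """Evaluate a number or a list [op, left, right] with op "add" or "mul".
    Each distinct sub-expression is computed once and then served from cache."""
    if isinstance(expr, (int, float)):
        return expr
    key = expr_key(expr)
    if key in cache:
        return cache[key]
    op, left, right = expr
    a, b = cached_eval(left), cached_eval(right)
    if op == "add":
        result = a + b
    elif op == "mul":
        result = a * b
    else:
        raise ValueError(f"unknown op: {op}")
    stats["computed"] += 1
    cache[key] = result
    return result
```

In `["mul", ["add", 2, 3], ["add", 2, 3]]`, the two inner lists are separate objects with identical content, so the second lookup hits the cache. Re-running the whole query then costs a single lookup and no recomputation.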
