查看原文
其他

PNAS前沿:多层网络的拓扑聚类

潘佳栋 集智俱乐部 2022-05-09


导语


由于在建模相互依赖的系统方面具有很高的实用性,多层网络在许多研究领域中受到广泛关注。但是,多层网络的聚类,尤其是使用高阶交互信息的聚类,仍处于起步阶段。反过来,高阶连接通常是多层网络应用程序的关键。最近一篇发表于 PNAS 的论文中,研究人员将拓扑数据分析的概念引入到复杂的多层网络的研究中,提出了一种网络聚类的拓扑方法,他们将这种方法称为基于持续性图的聚类(clustering of multilayer networks,CPD)。CPD系统地考虑了网络层内和网络层之间节点交互的不同的高阶特性,并集成了来自近邻节点的信息。研究人员通过将CPD应用于一个具有社会重要性的新问题来说明CPD的效用:在房屋保险理赔的背景下,对房屋进行分区,以表示天气和气候引起的风险。

潘佳栋 | 作者

邓一雪 | 编辑



论文题目:

Topological clustering of multilayer networks

论文地址:

https://www.pnas.org/content/118/21/e2019994118




一、背景




1. 一些网络具有复杂和高度结构

现代社会中的许多系统都具有复杂且高度相互依赖的结构[1-7],多层网络可以用来对这些结构进行建模。因此,研究人员对使用复杂多层网络进行跨学科分析产生了浓厚的兴趣。多层网络考虑了多层连接(即网络)之间的关系,其中每一层网络代表一个系统或子系统。由于关键基础设施对抵御自然灾害、恐怖活动和网络威胁的安全性和抵抗力方面的需求[8-14],当今多层网络研究的主要目标之一是更好地了解多层网络中哪些部分更为重要,更易受特定危害的影响,并制定积极的策略以实现最佳分区,从而隔离不健康的系统组件并降低进一步故障传播的风险[15-17]。

与单层网络的情况类似,多层网络聚类的目的是揭示有意义的节点分组模式,并通过考虑节点之间可能涉及的不同交互方式将节点划分为社区。不过,与单层网络相反,多层网络的聚类仍然是一个相对欠发达的领域[18-24]。多层网络聚类提出了新的挑战。首先,多层网络的划分需要考虑到同一层中节点之间的关系以及不同层中节点之间的相互作用。其次,多层网络中不同层可能表现出不同的局部和全局结构特性。最后,高阶网络通常显示出更强的社区存在信息。

2. 多层网络的拓扑聚类的提出


为了应对这些挑战,研究人员将拓扑数据分析(topological data analysis,TDA)的概念引入到复杂的多层网络的研究中,并提出了一种网络聚类的拓扑方法。TDA是代数拓扑和数据科学[31-34]的一种新兴方法,它提供了一种数学上严格的机制来分析数据形状。尤其是,TDA允许人们分析观察数据的拓扑和几何特性,从而更深入地了解数据生成过程背后的隐藏机制。[35-38]

拓扑网络聚类背后的关键思想是根据近邻节点形状相似程度对其进行分组。特别是,该算法使用拓扑方法基于持续性图对每个节点周围的局部拓扑和几何进行比较,因此被称为“使用持续性图的聚类”(CPD)。CPD方法既可以系统地计算网络层内部和网络层之间的异构高阶特性,又可以集成来自近邻节点及其相互作用的重要信息。[39-41]

研究人员说明了他们的CPD算法的应用以及拓扑概念在复杂网络聚类中的实用性。他们以多层网络在房屋保险中为例进行说明。研究人员通过引入基于气候条件和房屋保险变量的多层复杂网络,基于拓扑CPD方法对房屋进行分区。与基于简单地理邻近度的传统工具相比,基于环境和社会人口统计学特征相似度的风险图可以更准确地模拟气候风险。




二、多层网络拓扑聚类的算法实现




1. 多层网络

研究人员使用图建模单层网络,得到一个对称的邻接矩阵A。之后,他们使用多层图建模多层网络,多层图是带有权重的单层图邻接矩阵的一个集合,包含层内关系和层间相互作用。最终,多层网络为具有矩阵块结构的超邻接矩阵的形式。[49-53]


2. 基于相似度的网络


可以使用各种联系和度量来定义多层网络中边之间的权重。在不存在应用驱动的边的概念的情况下,通常基于节点之间的相似度ω来构建边,从而形成所谓的基于相似度的网络。一种最广泛使用的相似性度量是相关系数,基于相关网络的应用范围从金融[54-56]到脑科学[57-59]到气候学[60-62]。研究人员使用类似的方式进行气候保险多层网络的具体案例研究,并基于观测变量的非线性非参数转换获得的最大相关性。

3. 节点嵌入


从复杂的网络中提取有意义的信息需要大量的计算量和内存空间。节点嵌入在保留结构信息的同时将复杂网络转换为低维空间,从而为解决这两个问题提供了方案。节点嵌入的方法可以分为两类:矩阵分解方法(matrix factorization methods)和随机游走方法(random walk methods)[64-65]。研究人员使用多层网络嵌入(multilayered network embedding),这是多层网络矩阵分解的扩展形式。

4. 基于持续性图的聚类(CPD)


研究人员提出了一种多层网络聚类方法,如图1所示。研究人员的目标是从以多种分辨率记录的数据形状相似性的角度,在无监督的环境中对多层网络进行聚类。为了在不断变化的相似度尺度上系统地量化多层网络的形状动力学,研究人员将拓扑数据分析的多透镜工具引入聚类方法中。[69-70]

图1. 三层网络CPD算法的流程图




三、多层网络的拓扑聚类的应用




研究人员重点介绍了将多层网络分析和拓扑聚类应用于真实房屋保险数据的结果。他们研究有关在10年内(2002年至2011年)加拿大安大略省504个正向分拣区因降雨破坏而造成的房屋保险索赔的信息,并通过对数据进行处理的方式消除了社会人口增长和通货膨胀的影响。

索赔数量和损失的分布在空间上是相同的(图2)。在所有考虑的变量中,信用评分的空间格局最不明显,而降水的空间格局最强,表现出在东南方向(朝向安大略湖和伊利湖地区;图2)的增加。

图2. 变量的空间分布

图3展示了由不同聚类算法提供的加拿大安大略省的聚类数量及其空间位置。四种方法都倾向于在安大略省的西北部形成一个大型簇。为了从不同的方法中分析聚类,研究人员研究了聚类中各种属性均值的差异。

图3. 分配给加拿大安大略省南部的邮政区域的聚类标签:(A)K-PaWM、(B)CPD、(C)Hierarchical和(D)K-medoids(为四种不同的聚类算法)


CPD形成了两个大型类(集群1和集群2),占据超过70%的区域(表3)。这两个集群的降水量、信用评分、平均索赔数和平均每项索赔损失相似,但是第二个类位于安大略省的城市地区,房屋年龄高于平均年龄。第三类的平均降水量最高,因此平均索赔数最高,每个地区的平均损失也最高。值得注意的是,与其他三种聚类方法相比,CPD的属性在聚类内的变异性要小。




四、结论




在本文中,研究人员将拓扑数据分析工具的新兴机制引入到复杂多层网络的分析中,开发了一种拓扑聚类算法,即CPD。它基于拓扑数据分析持续性图概念,基于多层网络中固有数据形状的多透镜比较。他们验证了CPD方法相对于基于欧几里得距离的传统算法的实用性。此外,他们基于拓扑相似性度量(即Wasserstein距离)提出了一种聚类算法的修改版本K-PaWM。他们发现,当数据表现出复杂的时空相关结构时,两种拓扑方法(即CPD和K-PaWM)都是传统聚类工具的竞争替代品。

将来,他们计划将提出的拓扑方法扩展到动态多层网络的聚类中,并证明拓扑聚类的稳定性保证。另一个有趣的方向是通过对总体中具有较少或更多共同形状特征的点进行分组,将基于相似性的聚类(similarity-based agglomerative clustering)[71-73]与CPD相结合。研究人员认为这种聚类在生物医学成像(例如肿瘤检测)中可能特别有用。他们认为拓扑和几何方法为复杂的多层网络的建模、分析和推理开辟了许多新的发展空间。

参考文献


1.S. Caschili, F. R. Medda, A. Wilson, An interdependent multi-layer model: Resilience of international networks. Netw. Spatial Econ. 15, 313–335 (2015).2.S.Pilosof,M.A.Porter,M.Pascual,S.Ke ́fi,Themultilayernatureofecological networks. Nat. Ecol. Evol. 1, 101 (2017).3. M. Pedersen, A. Zalesky, A. Omidvarnia, G. D. Jackson, Multilayer network switching rate predicts brain performance. Proc. Natl. Acad. Sci. U.S.A. 115, 13376–13381 (2018).4. M. Wu et al., A tensor-based framework for studying eigenvector multicentrality inmultilayer networks. Proc. Natl. Acad. Sci. U.S.A. 116, 15407–15413 (2019).5.C.Gomez,A.D.Gonza ́lez,H.Baroud,C.D.Bedoya-Motta,Integratingoperational and organizational aspects in interdependent infrastructure network recovery. RiskAnal. 39, 1913–1929 (2019).6. F. Messina et al., COVID-19: Viral–host interactome analyzed by network based-approach model to study pathogenesis of SARS-CoV-2 infection. J. Transl. Med. 18,1–10 (2020).7. Z. Kosowska-Stamirowska, Network effects govern the evolution of maritime trade.Proc. Natl. Acad. Sci. U.S.A. 117, 12719–12728 (2020).8. L. Tang, K. Jing, J. He, H. Stanley, Complex interdependent supply chain networks:Cascading failure and robustness. Phys. Stat. Mech. Appl. 443, 58–69 (2015).9. H. Baroud, K. Barker, J. E. Ramirez-Marquez, C. M. Rocco, Inherent costs and inter- dependent impacts of infrastructure network resilience. Risk Anal. 35, 642–662(2015).10. I. Bermudez et al., Twitter response to Munich July 2016 attack: Network analysis ofinfluence. Front. Big Data 2, 17 (2019).11. Q. Li, S. Dong, A. Mostafavi, Modeling of inter-organizational coordination dynam-ics in resilience planning of infrastructure systems: A multilayer network simulationframework. PloS One 14, e0224522 (2019).12. E. Nosyreva, L. Massel, “Application of multilayer networks to detect critical energyfacilities” in Proceedings of the VIth International Workshop ‘Critical Infrastructures: Contingency Management, Intelligent, Agent-Based, Cloud Computing and Cyber Security’(IWCI2019),L.Massel,N.Makagonova,A.Kopaygorodsky,A.Massel,Eds. (Atlantis Press, 2019), pp. 249–256.13. I. Bachmann, J. Bustos, B. Bustos, A survey on frameworks used for robustness analysis on interdependent networks. Complexity 2020, 1–17 (2020).14. N. Yadav, S. Chatterjee, A. Ganguly, Resilience of urban transport network-of- networks under intense flood hazards exacerbated by targeted attacks. Sci. Rep. 10, 10350(2020).15. T. Wang, M. Brede, A. Ianni, E. Mentzakis, Characterizing dynamic communication in online eating disorder communities: A multiplex network approach. Appl. Netw. Sci. 4, 12 (2019).16. P. V. Bindu, R. Mishra, P. Thilagam, Discovering spammer communities in Twitter. J. Intell. Inf. Syst. 51, 503–527 (2018).17. M. Wu et al., A tensor-based framework for studying eigenvector multicentrality in multilayer networks. Proc. Natl. Acad. Sci. U.S.A. 116, 15407–15413 (2019).18. R. Interdonato, M. Magnani, D. Perna, A. Tagarelli, D. Vega, Multilayer network simplification: Approaches, models and methods. Comput. Sci. Rev. 36, 100246 (2020).19. A. Tagarelli, A. Amelio, F. Gullo, Ensemble-based community detection in multilayernetworks. Data Min. Knowl. Discov. 31, 1506–1543 (2017).20. N. Arinik, R. Figueiredo, V. Labatut, Multiple partitioning of multiplex signednetworks: Application to European Parliament votes. Soc. Network. 60, 83–102(2020).21.T.Valle`s-Catala`,F.A.Massucci,R.Guimera`,M.Sales-Pardo,Multilayerstochasticblockmodels reveal the multilayer structure of complex networks. Phys. Rev. X 6, 011036(2016).22. S. Paul, Y. Chen, Consistent community detection in multi-relational data throughrestricted multi-layer stochastic blockmodel. Electron. J. Statist. 10, 3807–3870 (2016).23. P. Barbillon, S. Donnet, E. Lazega, A. Bar-Hen, Stochastic block models for multiplex networks: An application to a multilevel network of researchers. J. Roy. Stat. Soc. 180,295–314 (2017).24. J. D. Wilson, J. Palowitch, S. Bhamidi, A. B. Nobel, Community extraction in multilayernetworks with heterogeneous community structure. J. Mach. Learn. Res. 18, 5458–5506 (2017).25. D. R. DeFord, S. D. Pauls, Spectral clustering methods for multiplex networks. Phys.Stat. Mech. Appl. 533, 121949 (2019).26. F. Znidi, H. Davarikia, K. Iqbal, M. Barati, Multi-layer spectral clustering approachto intentional islanding in bulk power systems. J. Mod. Power Syst. Clean Energy 7,1044–1055 (2019).27. P. Mercado, F. Tudisco, M. Hein, “Spectral clustering of signed graphs via matrix powermeans” in Proceedings of the 36th International Conference on Machine Learning, K.Chaudhuri, R. Salakhutdinov, Eds. (ML Research Press, 2019), pp. 1–11.28. S. Paul, Y. Chen, Spectral and matrix factorization methods for consistent communitydetection in multi-layer networks. Ann. Stat. 48, 230–250 (2020).29. D. J. Watts, S. H. Strogatz, Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).30. A. R. Benson, D. F. Gleich, J. Leskovec, Higher-order organization of complex networks. Science 353, 163–166 (2016).31.R.Ghrist,Barcodes:Thepersistenttopologyofdata.Bull.Am.Math.Soc.45,61–75 (2008).32. G. Carlsson, Topology and data. BAMS 46, 255–308 (2009).33. F. Chazal, B. Michel, An introduction to topological data analysis: Fundamen-tal and practical aspects for data scientists. arXiv [Preprint] (2017). https://arxiv.org/abs/1710.04019 (Accessed 15 January 2020). 34.G.Carlsson,“Persistenthomologyandappliedhomotopytheory”inHandbookofHomotopy Theory, H. Miller, Ed. (CRC Press, 2019), pp. 297–330.35. P. Y. Lum et al., Extracting insights from the shape of complex data using topology.Sci. Rep. 3, 1236 (2013).36. N. Otter, M. A. Porter, U. Tillmann, P. Grindrod, H. A. Harrington, A roadmap for thecomputation of persistent homology. EPJ Data Sci. 6, 17 (2017).37. L. Wasserman, Topological data analysis. Annu. Rev. Stat. App. 5, 501–532 (2018).38. M. R. McGuirl, A. Volkening, B. Sandstede, Topological data analysis of zebrafishpatterns. Proc. Natl. Acad. Sci. U.S.A. 117, 5113–5124 (2020).39. G. Singh, F. Memoli, G. Carlsson,”Topological methods for the analysis of highdimensional data sets and 3D object recognition” in Eurographics Symposium onPoint-Based Graphics 2007 (IEEE, 2007), pp. 91–100.40. F. Chazal, L. Guibas, S. Oudot, P. Skraba, Persistence-based clustering in Riemannianmanifolds. JACM 60, 1–38 (2013).41. U. Islambekov, Y.R. Gel, Unsupervised space-time clustering using persistenthomology. Environmetrics 30, e2539 (2019).42. M. Golnaraghi, Climate Change and the Insurance Industry: Taking Action as RiskManagers and Investors (The Geneva Association, 2018).43. A. A. Tsonis, K. L. Swanson, P. J. Roebber, What do networks have to do with climate?Bull. Am. Meteorol. Soc. 87, 585–596 (2006).44. A. Gozolchiani, K. Yamasaki, O. Gazit, S. Havlin, Pattern of climate network blinkinglinksfollowsElNin ̃oevents.Europhys.Lett.83,28005(2008).45. K. Steinhaeuser, N. V. Chawla, A. R. Ganguly, An exploration of climate data usingcomplex networks. SIGKDD Explor. Newsl. 12, 25–32 (2010).46. K. Steinhaeuser, A. R Ganguly, N. V. Chawla, Multivariate and multiscale dependencein the global climate system revealed through complex networks. Clim. Dynam. 39,889–895 (2012).47.J.Ludescheretal.,VeryearlywarningofnextElNin ̃o.Proc.Natl.Acad.Sci.U.S.A. 111, 2064–2066 (2014).48.W. K. Huang, D. S. Cooley, I. Ebert-Uphoff, C. Chen, S. Chatterjee, New exploratory tools for extremal dependence: χ networks and annual extremal networks. J. Agric. Biol. Environ. Stat. 24, 484–501 (2019).49.M. Kivela ̈ et al., Multilayer networks. J. Complex Netw. 2, 203–271 (2014).50.N. Wider, A. Garas, I. Scholtes, F. Schweitzer. An ensemble perspective on multi-layer networks. arXiv [Preprint] (2015). https://arxiv.org/abs/1507.00169 (Accessed 12 March 2020).51.F. W. Takes, W. A. Kosters, W. Boyd, M. Eelke, Heemskerk, Multiplex network motifs as building blocks of corporate networks. Appl. Netw. Sci. 3, 1–22 (2018).52.J. A. Baggio et al., Multiplex social ecological network analysis reveals how social changes affect community robustness more than resource depletion. Proc. Natl. Acad. Sci. U.S.A. 113, 13708–13713 (2016).53.A.Aleta,S.Meloni,Y.Moreno,Amultilayerperspectivefortheanalysisofurban transportation systems. Sci. Rep. 7, 44359 (2017).54. T. Millington, M. Niranjan, “Quantifying influence in financial markets via partial cor- relation network inference” in 11th International Symposium on Image and Signal Processing and Analysistextit (ISPA, 2019), pp. 306–311.55. P. Giudici, G. Polinesi, Crypto price discovery through correlation networks. Ann. Oper. Res. 299, 443–457 (2019).56. D. R. Williams, P. Rast, Back to the basics: Rethinking partial correlation network methodology. Br. J. Math. Stat. Psychol. 73, 187–212 (2020).57. M. Hawrylycz et al., Multi-scale correlation structure of gene expression in the brain. Neural Netw. 24, 933–942 (2011).58. G. Petri et al., Homological scaffolds of brain functional networks. J. R. Soc. Interf. 11, 20140873 (2014).59. R. Yu et al., Weighted graph regularized sparse brain network construction for MCI identification. Pattern Recogn. 90, 220–231 (2019).60. M. Lu, U. Lall, J. Kawale, S. Liess, V. Kumar, Exploring the predictability of 30-day extreme precipitation occurrence using a global SST–SLP correlation network. J. Clim. 29, 1013–1029 (2016).61. A. Karpatne, V. Kumar, “Big data in climate: Opportunities and challenges for machine learning” in Proceedings of the 23rd ACM SIGKDD International Confer- ence on Knowledge Discovery and Data Mining, KDD ’17 (Association for Computing Machinery, New York, 2017), pp. 21–22.62.   N. Ying, D. Zhou, Z. G. Han, Q. H. Chen, Q. Ye, Z. G. Xue, Rossby waves detection in the CO2 and temperature multilayer climate network. Geophys. Res. Lett. 47, e2019GL086507 (2020).63.   L. Breiman, J. H. Friedman, Estimating optimal transformations for multiple regres- sion and correlation. J. Am. Stat. Assoc. 80, 580–598 (1985).64.   P. Goyal, E. Ferrara, Graph embedding techniques, applications, and performance: A survey. Knowl. Base Syst. 151, 78–94 (2018).65.   W. L. Hamilton, R. Ying, J. Leskovec, Representation learning on graphs: Methods and applications. arXiv [Preprint] (2017). https://arxiv.org/abs/1709.05584 (Accessed 15 January 2020).66.   J. Li, C. Chen, H. Tong, H. Liu, “Multi-layered network embedding” in Proceedings of SDM18 (SIAM, 2018), pp. 684–692.67.   R. G. Barcodes, The persistent topology of data. BAMS 45, 61–75 (2008).68.   A. Zomorodian, Fast construction of the Vietoris–Rips complex. Comput. Graph. 34,263–271 (2010).69.   X. Dong, P. Frossard, P. Vandergheynst, N. Nefedov, Clustering with multi-layergraphs: A spectral perspective. IEEE Trans. Signal Process. 60, 5820–5831 (2012).70.   J. Chen, H. Molter, M. Sorge, O. Suchy`, “Cluster editing in multi-layer and temporal graphs” in 29th International Symposium on Algorithms and Computation (ISAAC 2018), W.-L. Hsu, D.-J. Lee, C.-S. Liao, Eds. (Dagstuhl Publishing, 2018), pp. 24:1–24:13.71. C. Li, G. Biswas, Unsupervised learning with mixed numeric and nominal data. IEEE Trans. Knowl. Data Eng. 14, 673–690 (2002).72. B. Stratman, S. Mahadevan, C. Li, G. Biswas, Identification of critical inspection sam- ples among railroad wheels by similarity-based agglomerative clustering. Integrated Comput. Aided Eng. 18, 203–219 (2011).73. G. Pettet, S. Nannapaneni, B. Stadnick, A. Dubey, G. Biswas, “Incident analysis and prediction using clustering and bayesian network” in 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Com- puting & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) (IEEE, 2017), pp. 1–8.


(参考文献可上下滑动查看)



复杂科学最新论文


集智斑图顶刊论文速递栏目上线以来,持续收录来自Nature、Science等顶刊的最新论文,追踪复杂系统、网络科学、计算社会科学等领域的前沿进展。现在正式推出订阅功能,每周通过微信服务号「集智斑图」推送论文信息。扫描下方二维码即可一键订阅:



推荐阅读



点击“阅读原文”,追踪复杂科学顶刊论文

您可能也对以下帖子感兴趣

文章有问题?点此查看未经处理的缓存