查看原文
其他

离散图模型的神经网络学习 | 网络科学论文速递21篇

集智斑图 集智俱乐部 2022-04-08

本文由机器翻译,仅供参考,感兴趣请查阅论文原文

核心速递



  • 离散图模型的神经网络学习;

  • 两种离散动力学模型的关系: 一维元胞自动机与积分值变换;

  • 公共汽车混合交通元胞自动机模型中的交叉转换;

  • 在生物网络中,具有断裂纤维化对称性的电路执行核心逻辑计算;

  • 气候变化统计力学特刊简介;

  • 精确逼近极值统计量;

  • 优先连接网络中平均场近似的批判;

  • 声学-草皮: 基于声学保护隐私的新型冠状病毒肺炎接触追踪;

  • 赢得竞争: 在类似SIS 的流行过程中加强对抗传染;

  • 维基百科和威斯敏斯特: 英国政客维基百科页面的质量和动态;

  • 在线竞争影响力最大化;

  • 基于联合演员表征和社交媒体情感的电影票房预测;

  • 可证明和有效地使用 Turán 阴影近似近派系: PEANUTS;

  • 少即是多: 利用社会信任提高欺骗攻击的有效性;

  • 在线滥用行为数据集中注释一致性分析;

  • 概率节点失效模型下的网络连通性;

  • 基于模糊图反馈的在线稠密子图发现;

  • 连接的力量: 利用网络分析促进应收账款融资;

  • 团体运动比赛中的竞技平衡;

  • 量化应对全球紧急情况的政策: 来自新型冠状病毒肺炎流感大流行的启示;

  • 量化县际流动模式对美国新型冠状病毒肺炎暴发的影响;





离散图模型的神经网络学习


原文标题:

Learning of Discrete Graphical Models with Neural Networks

地址:

http://arxiv.org/abs/2006.11937

作者:

Abhijith J.,Andrey Lokhov,Sidhant Misra,Marc Vuffray


Abstract:Graphical models are widely used in science to represent joint probability distributions with an underlying conditional dependence structure. The inverse problem of learning a discrete graphical model given i.i.d samples from its joint distribution can be solved with near-optimal sample complexity using a convex optimization method known as Generalized Regularized Interaction Screening Estimator (GRISE). But the computational cost of GRISE becomes prohibitive when the energy function of the true graphical model has higher order terms. We introduce NN-GRISE, a neural net based algorithm for graphical model learning, to tackle this limitation of GRISE. We use neural nets as function approximators in an interaction screening objective function. The optimization of this objective then produces a neural-net representation for the conditionals of the graphical model. NN-GRISE algorithm is seen to be a better alternative to GRISE when the energy function of the true model has a high order with a high degree of symmetry. In these cases, NN-GRISE is able to find the correct parsimonious representation for the conditionals without being fed any prior information about the true model. NN-GRISE can also be used to learn the underlying structure of the true model with some simple modifications to its training procedure. In addition, we also show a variant of NN-GRISE that can be used to learn a neural net representation for the full energy function of the true model.

摘要:图形模型在科学上被广泛用来表示具有条件依赖结构的联合概率分布。利用广义正则化相互作用筛选估计(GRISE)方法,可以用近似最优的样本复杂度来求解给定的离散图形模型即从其联合分布中抽取的样本的学习反问题,这种方法是一种凸优化方法。但是当真实图模型的能量函数具有较高阶项时,GRISE 的计算代价就变得高不可攀。我们介绍 NN-GRISE,一种基于神经网络的图形模型学习算法,以解决这一局限性。在交互式筛选目标函数中,我们使用神经网络作为函数逼近器。这个目标的优化然后产生了图形模型条件的神经网络表示。当真实模型的能量函数具有高阶对称性时,NN-GRISE 算法被认为是一种较好的代替 GRISE 算法的方法。在这些情况下,NN-GRISE 能够为条件句找到正确的简约表示,而不需要提供任何关于真实模型的先验信息。Nn-grise 还可以用来学习真实模型的底层结构,只需对其训练过程进行一些简单的修改。另外,我们还展示了 NN-GRISE 的一个变体,它可以用来学习真实模型的全能函数的神经网络表示。



两种离散动力学模型的关系:

一维元胞自动机与积分值变换


原文标题:

Relationship of Two Discrete Dynamical Models: One-dimensional Cellular Automata and Integral Value Transformations

地址:

http://arxiv.org/abs/2006.13741

作者:

Sreeya Ghosh,Sudhakar Sahoo,Sk. Sarif Hassan,Jayanta Kumar Das,Pabitra Pal Choudhury


Abstract:Cellular Automaton (CA) and an Integral Value Transformation (IVT) are two well established mathematical models which evolve in discrete time steps. Theoretically, studies on CA suggest that CA is capable of producing a great variety of evolution patterns. However computation of non-linear CA or higher dimensional CA maybe complex, whereas IVTs can be manipulated easily. The main purpose of this paper is to study the link between a transition function of a one-dimensional CA and IVTs. Mathematically, we have also established the algebraic structures of a set of transition functions of a one-dimensional CA as well as that of a set of IVTs using binary operations. Also DNA sequence evolution has been modelled using IVTs.

摘要:细胞自动机变换(CA)和积分值变换(IVT)是两个在离散时间步进化的数学模型。理论上,对 CA 的研究表明 CA 能够产生各种各样的进化模式。然而,非线性元胞自动机或高维元胞自动机的计算可能比较复杂,而 ivt 则比较容易操作。本文的主要目的是研究一维 CA 的转移函数与 ivt 之间的关系。在数学上,我们还建立了一维元胞自动机转移函数集的代数结构,以及用二进制运算建立的一组元胞自动机转移函数的代数结构。此外,DNA 序列进化已经模拟使用 ivt。



公共汽车混合交通

元胞自动机模型中的交叉转换


原文标题:

Crossover transitions in a bus-car mixed-traffic cellular automata model

地址:

http://arxiv.org/abs/2006.13532

作者:

Damian N. Dailisan,May T. Lim


Abstract:We modify the Nagel-Schreckenberg (NaSch) cellular automata model to study mixed-traffic dynamics. We focus on the interplay between passenger availability and bus-stopping constraints. Buses stop next to occupied cells of a discretized sidewalk model. By parametrizing the spacing distance between designated stops, our simulation covers the range of load-anywhere behavior to that of well-spaced stops. The interplay of passenger arrival rates and bus densities drives crossover transitions from platooning to non-platooned (free-flow and congested) states. We show that platoons can be dissolved by either decreasing the passenger arrival rate or increasing the bus density. The critical passenger arrival rate at which platoons are dissolved is an exponential function of vehicle density. We also find that at low densities, spacing stops close together induces platooned states, which reduces system speeds and increases waiting times of passengers.

摘要:我们修改 Nagel-Schreckenberg (NaSch)细胞自动机模型来研究混合交通动力学。我们着重于乘客可用性和公共汽车停车限制之间的相互作用。公共汽车停靠在一个离散化的人行道模型的占用单元旁边。通过参数化指定停止之间的间距,我们的模拟覆盖了从任何地方的负载行为到适当间隔停止的范围。乘客到达率和公交车密度之间的相互作用促使交叉路口从排队状态过渡到非排队状态(自由流和拥挤)。我们表明,排可以通过降低乘客到达率或增加公共汽车密度来解散。解散排的临界乘客到达率是车辆密度的一个指数函数。我们还发现,在低密度条件下,间隔站靠得很近导致了平排状态,降低了系统速度,增加了乘客的等待时间。



在生物网络中,

具有断裂纤维化对称性

的电路执行核心逻辑计算


原文标题:

Circuits with broken fibration symmetries perform core logic computations in biological networks

地址:

http://arxiv.org/abs/2006.13334

作者:

Ian Leifer,Flaviano Morone,Saulo D. S. Reis,Jose S. Andrade Jr.,Mariano Sigman,Hernan A. Makse


Abstract:We show that logic computational circuits in gene regulatory networks arise from a fibration symmetry breaking in the network structure. From this idea we implement a constructive procedure that reveals a hierarchy of genetic circuits, ubiquitous across species, that are surprising analogues to the emblematic circuits of solid-state electronics: starting from the transistor and progressing to ring oscillators, current-mirror circuits to toggle switches and flip-flops. These canonical variants serve fundamental operations of synchronization and clocks (in their symmetric states) and memory storage (in their broken symmetry states). These conclusions introduce a theoretically principled strategy to search for computational building blocks in biological networks, and present a systematic route to design synthetic biological circuits.

摘要:我们证明,基因调控网络中的逻辑计算电路是由网络结构中的纤维对称性破缺产生的。从这个想法出发,我们实现了一个建设性的过程,揭示了基因电路的层次结构,这些基因电路在物种间无处不在,它们与固态电子器件的典型电路惊人地相似: 从晶体管开始,逐渐发展成环形振荡器,电流-镜像电路到切换开关和触发器。这些规范变量服务于同步和时钟(处于对称状态)以及内存存储(处于对称破缺状态)的基本操作。这些结论为寻找生物网络中的计算模块提供了一种理论上的原则性策略,并为合成生物电路的设计提供了一条系统路线。



气候变化统计力学特刊简介


原文标题:

Introduction to the Special Issue on the Statistical Mechanics of Climate

地址:

http://arxiv.org/abs/2006.13495

作者:

Valerio Lucarini


Abstract:We introduce the special issue on the Statistical Mechanics of Climate published on the Journal of Statistical Physics by presenting an informal discussion of some theoretical aspects of climate dynamics that make it a topic of great interest for mathematicians and theoretical physicists. In particular, we briefly discuss its nonequilibrium and multiscale properties, the relationship between natural climate variability and climate change, the different regimes of climate response to perturbations, and critical transitions.

摘要:我们在《统计物理学杂志》上发表了一篇关于气候变化统计力学的特刊,通过对 Climate dynamics 的一些理论方面的非正式讨论来介绍它,这使它成为数学家和理论物理学家们极感兴趣的话题。特别是,我们简要地讨论了它的非平衡和多尺度特性,自然气候变化和气候变化之间的关系,气候对扰动的不同反应机制,以及临界过渡。



精确逼近极值统计量


原文标题:

Accurately approximating extreme value statistics

地址:

http://arxiv.org/abs/2006.13677

作者:

Lior Zarfaty,Eli Barkai,David A. Kessler


Abstract:We consider the extreme value statistics ofN independent and identically distributed random variables, which is a classic problem in probability theory. When N→∞, small fluctuations around the renormalized maximum of the variables are described by the Fisher-Tippett-Gnedenko theorem, which states that the distribution of maxima converges to one out of three forms. However, for a wide class of distributions, the convergence rate with N is ultra-slow. Here, we apply the Lambert scaling method which greatly accelerates the rate of convergence to the limiting Gumbel form. We also find this scaling useful when large deviations from the mean extreme value are considered.

摘要:我们考虑的极值统计N 独立和同分布的随机变量,这是一个经典的问题在21概率论N→∞,本文利用 Fisher-Tippett-Gnedenko 定理描述了变量重整化最大值周围的小波动,该定理指出最大值的分布收敛于三种形式中的一种。然而,对于一类广泛的分布,收敛速度与N 是超慢的。在这里,我们应用朗伯定标方法,大大加快了收敛速度到极限 Gumbel 形式。我们还发现,当考虑平均极值的大偏差时,这种标度也是有用的。



优先连接网络中平均场近似的批判


原文标题:

A critique of the Mean Field Approximation in preferential attachment networks

地址:

http://arxiv.org/abs/2006.13295

作者:

Matthijs Ruijgrok


Abstract:The Mean Field Approximation (MFA), or continuum method, is often used in courses on Networks to derive the degree distribution of preferential attachment networks. This method is simple and the outcome is close to the correct answer. However, this paper shows that the method is flawed in several aspects, leading to unresolvable contradictions. More importantly, the MFA is not explicitly derived from a mathematical model. An analysis of the implied model shows that it makes an approximation which is far from the truth and another one which can not be motivated in general. The success of the MFA for preferential attachment networks is therefore accidental and the method is not suitable for teaching undergraduates.

摘要:平均场近似法(MFA) ,或称连续统计法,是网络课程中常用的推导优先连接网络度分布的方法。这种方法简单,结果接近正确答案。然而,本文指出,该方法存在一些缺陷,导致了无法解决的矛盾。更重要的是,MFA 没有明确地从数学模型派生出来。对隐含模型的分析表明,隐含模型所作的近似与事实相去甚远,而且是一般情况下无法激发的近似。优先连接网络 MFA 的成功是偶然的,这种方法不适用于本科教学。



声学-草皮: 

基于声学保护隐私的

新型冠状病毒肺炎接触追踪


原文标题:

ACOUSTIC-TURF: Acoustic-based Privacy-Preserving COVID-19 Contact Tracing

地址:

http://arxiv.org/abs/2006.13362

作者:

Yuxiang Luo,Cheng Zhang,Yunqi Zhang,Chaoshun Zuo,Dong Xuan,Zhiqiang Lin,Adam C. Champion,Ness Shroff


Abstract:In this paper, we propose a new privacy-preserving, automated contact tracing system, ACOUSTIC-TURF, to fight COVID-19 using acoustic signals sent from ubiquitous mobile devices. At a high level, ACOUSTIC-TURF adaptively broadcasts inaudible ultrasonic signals with randomly generated IDs in the vicinity. Simultaneously, the system receives other ultrasonic signals sent from nearby (e.g., 6 feet) users. In such a system, individual user IDs are not disclosed to others and the system can accurately detect encounters in physical proximity with 6-foot granularity. We have implemented a prototype of ACOUSTIC-TURF on Android and evaluated its performance in terms of acoustic-signal-based encounter detection accuracy and power consumption at different ranges and under various occlusion scenarios. Experimental results show that ACOUSTIC-TURF can detect multiple contacts within a 6-foot range for mobile phones placed in pockets and outside pockets. Furthermore, our acoustic-signal-based system achieves greater precision than wireless-signal-based approaches when contact tracing is performed through walls. ACOUSTIC-TURF correctly determines that people on opposite sides of a wall are not in contact with one another, whereas the Bluetooth-based approaches detect nonexistent contacts among them.

摘要:在这篇论文中,我们提出了一种新的保护隐私的自动接触追踪系统 acoutic-turf,它可以利用无处不在的移动设备发出的声音信号来对抗新型冠状病毒肺炎。在高水平上,ACOUSTIC-TURF 利用附近随机产生的 id 自适应广播不可听见的超声波信号。同时,系统接收来自附近用户(例如,6英尺)的其他超声波信号。在这样一个系统中,个人用户 id 不会向其他用户公开,系统可以在物理距离近到6英尺的粒度上准确地探测到遭遇。我们已经在 Android 平台上实现了 acoutic-turf 的原型机,并根据基于声学信号的遭遇检测精度和在不同范围和不同遮挡情况下的功耗对其性能进行了评估。实验结果表明,ACOUSTIC-TURF 可以检测6英尺范围内的多个接触,为手机放置在口袋和外部口袋。此外,我们的声学信号为基础的系统达到更高的精度比无线信号为基础的方法时,接触跟踪是通过墙壁执行。声学-草皮可以正确地判断墙壁两侧的人们之间没有接触,而基于蓝牙的方法可以检测到他们之间不存在的接触。



赢得竞争: 

在类似SIS 的流行过程中加强对抗传染


原文标题:

Winning the competition: enhancing counter-contagion in SIS-like epidemic processes

地址:

http://arxiv.org/abs/2006.13395

作者:

Argyris Kalogeratos,Stefano Sarao Mannelli


Abstract:In this paper we consider the epidemic competition between two generic diffusion processes, where each competing side is represented by a different state of a stochastic process. For this setting, we present the Generalized Largest Reduction in Infectious Edges (gLRIE) dynamic resource allocation strategy to advantage the preferred state against the other. Motivated by social epidemics, we apply this method to a generic continuous-time SIS-like diffusion model where we allow for: i) arbitrary node transition rate functions that describe the dynamics of propagation depending on the network state, and ii) competition between the healthy (positive) and infected (negative) states, which are both diffusive at the same time, yet mutually exclusive on each node. Finally we use simulations to compare empirically the proposed gLRIE against competitive approaches from literature.

摘要:在本文中,我们考虑了两个通用扩散过程之间的传染病竞争,其中每个竞争方由一个随机过程的不同状态表示。在这种情况下,我们提出了广义最大感染边约简(gLRIE)动态资源分配策略,以优化首选状态。受社会流行病的影响,我们将这种方法应用到一般的连续时间 SIS- 类扩散模型中,其中我们允许: i)任意的节点转移率函数描述依赖于网络状态的传播动态,ii)健康(正)和感染(负)状态之间的竞争,这两种状态同时扩散,但在每个节点上相互排斥。最后,我们利用模拟实验来比较所提出的 gLRIE 与文献中的竞争方法。



维基百科和威斯敏斯特: 

英国政客维基百科页面的质量和动态


原文标题:

Wikipedia and Westminster: Quality and Dynamics of Wikipedia Pages about UK Politicians

地址:

http://arxiv.org/abs/2006.13400

作者:

Pushkal Agarwal,Miriam Redi,Nishanth Sastry,Edward Wood,Andrew Blick


Abstract:Wikipedia is a major source of information providing a large variety of content online, trusted by readers from around the world. Readers go to Wikipedia to get reliable information about different subjects, one of the most popular being living people, and especially politicians. While a lot is known about the general usage and information consumption on Wikipedia, less is known about the life-cycle and quality of Wikipedia articles in the context of politics. The aim of this study is to quantify and qualify content production and consumption for articles about politicians, with a specific focus on UK Members of Parliament (MPs). First, we analyze spatio-temporal patterns of readers' and editors' engagement with MPs' Wikipedia pages, finding huge peaks of attention during election times, related to signs of engagement on other social media (e.g. Twitter). Second, we quantify editors' polarisation and find that most editors specialize in a specific party and choose specific news outlets as references. Finally we observe that the average citation quality is pretty high, with statements on 'Early life and career' missing citations most often (18%).

摘要:维基百科是提供大量在线内容的主要信息来源,受到世界各地读者的信任。读者去维基百科是为了获得关于不同主题的可靠信息,而这些主题是当代最受欢迎的人物之一,尤其是政治家。虽然人们对维基百科的一般使用情况和信息消耗了解很多,但对于维基百科文章在政治背景下的生命周期和质量却知之甚少。这项研究的目的是量化和限制有关政治家的文章内容的生产和消费,特别关注于英国国会议员(下院议员)。首先,我们分析了读者和编辑参与国会议员维基百科页面的时空模式,发现在选举期间,与其他社交媒体(如 Twitter)参与迹象相关的关注度达到了高峰。其次,我们量化了编辑的两极分化,发现大多数编辑专注于特定的党派,并选择特定的新闻媒体作为参考。最后,我们观察到平均引用质量相当高,“早期生活和职业”的陈述最常缺少引用(18%)。



在线竞争影响力最大化


原文标题:

Online Competitive Influence Maximization

地址:

http://arxiv.org/abs/2006.13411

作者:

Jinhang Zuo,Xutong Liu,Carlee Joe-Wong,John C. S. Lui,Wei Chen


Abstract:Online influence maximization has attracted much attention as a way to maximize influence spread through a social network while learning the values of unknown network parameters. Most previous works focus on single-item diffusion. In this paper, we introduce a new Online Competitive Influence Maximization (OCIM) problem, where two competing items (e.g., products, news stories) propagate in the same network and influence probabilities on edges are unknown. We adapt the combinatorial multi-armed bandit (CMAB) framework for the OCIM problem, but unlike the non-competitive setting, the important monotonicity property (influence spread increases when influence probabilities on edges increase) no longer holds due to the competitive nature of propagation, which brings a significant new challenge to the problem. We prove that the Triggering Probability Modulated (TPM) condition for CMAB still holds, and then utilize the property of competitive diffusion to introduce a new offline oracle, and discuss how to implement this new oracle in various cases. We propose an OCIM-OIFU algorithm with such an oracle that achieves logarithmic regret. We also design an OCIM-ETC algorithm that has worse regret bound but requires less feedback and easier offline computation. Our experimental evaluations demonstrate the effectiveness of our algorithms.

摘要:网络影响力最大化作为一种在学习未知网络参数值的同时,通过社会网络最大化影响力传播的方式,已经引起了人们的广泛关注。以往的研究大多集中在单项扩散上。本文提出了一个新的在线竞争影响力最大化问题(OCIM) ,其中两个竞争项(如产品、新闻报道)在同一网络中传播,边缘的影响概率是未知的。我们将组合多臂老虎机框架应用于 OCIM 问题,但与非竞争性设置不同的是,由于传播的竞争性质,重要的单调性质(影响扩散随着影响边概率的增加而增加)不再成立,这给该问题带来了新的重大挑战。证明了 CMAB 的触发概率调制(TPM)条件仍然成立,并利用竞争扩散性质引入了一个新的离线预言,讨论了在不同情况下如何实现这个新的预言。本文提出了一种 OCIM-OIFU 算法,该算法可以实现对数遗憾。我们还设计了一个 OCIM-ETC 算法,该算法具有较差的后悔界,但需要较少的反馈和更容易的脱机计算。我们的实验评估证明了我们的算法的有效性。



基于联合演员表征和

社交媒体情感的电影票房预测


原文标题:

Movie Box office Prediction via Joint Actor Representations and Social Media Sentiment

地址:

http://arxiv.org/abs/2006.13417

作者:

Dezhou Shen


Abstract:In recent years, driven by the Asian film industry, such as China and India, the global box office has maintained a steady growth trend. Previous studies have rarely used long-term, full-sample film data in analysis, lack of research on actors' social networks. Existing film box office prediction algorithms only use film meta-data, lack of using social network characteristics and the model is less interpretable. I propose a FC-GRU-CNN binary classification model in of box office prediction task, combining five characteristics, including the film meta-data, Sina Weibo text sentiment, actors' social network measurement, all pairs shortest path and actors' art contribution. Exploiting long-term memory ability of GRU layer in long sequences and the mapping ability of CNN layer in retrieving all pairs shortest path matrix features, proposed model is 14% higher in accuracy than the current best C-LSTM model.

摘要:近年来,在中国、印度等亚洲电影产业的推动下,全球票房保持了稳步增长的趋势。以前的研究很少使用长期的、全样本的电影数据进行分析,缺乏对演员社交网络的研究。现有的电影票房预测算法仅使用电影元数据,缺乏利用社会网络的特点和模型的可解释性。结合电影元数据、新浪微博文本情感、演员社交网络测量、所有对的最短路径和演员的艺术贡献等五个特征,提出了一个 FC-GRU-CNN 票房预测任务的二元分类模型。利用长序列中 GRU 层的长时记忆能力和 CNN 层的映射能力提取所有对的最短路径矩阵特征,该模型的准确率比目前最优的 C-LSTM 模型高14%。



可证明和有效地使用 Turán 

阴影近似近派系: PEANUTS


原文标题:

Provably and Efficiently Approximating Near-cliques using the Turán Shadow: PEANUTS

地址:

http://arxiv.org/abs/2006.13483

作者:

Shweta Jain,C. Seshadhri


Abstract:Clique and near-clique counts are important graph properties with applications in graph generation, graph modeling, graph analytics, community detection among others. They are the archetypal examples of dense subgraphs. While there are several different definitions of near-cliques, most of them share the attribute that they are cliques that are missing a small number of edges. Clique counting is itself considered a challenging problem. Counting near-cliques is significantly harder more so since the search space for near-cliques is orders of magnitude larger than that of cliques. We give a formulation of a near-clique as a clique that is missing a constant number of edges. We exploit the fact that a near-clique contains a smaller clique, and use techniques for clique sampling to count near-cliques. This method allows us to count near-cliques with 1 or 2 missing edges, in graphs with tens of millions of edges. To the best of our knowledge, there was no known efficient method for this problem, and we obtain a 10x - 100x speedup over existing algorithms for counting near-cliques. Our main technique is a space-efficient adaptation of the Tur\'an Shadow sampling approach, recently introduced by Jain and Seshadhri (WWW 2017). This approach constructs a large recursion tree (called the Tur\'an Shadow) that represents cliques in a graph. We design a novel algorithm that builds an estimator for near-cliques, using an online, compact construction of the Tur\'an Shadow.

摘要:团计数和近团计数是图的重要属性,在图形生成、图形建模、图形分析、社区检测等领域有着广泛的应用。它们是稠密子图的典型例子。虽然近派系有几种不同的定义,但大多数都有一个共同的属性,那就是它们都缺少一些边。集团计数本身就被认为是一个具有挑战性的问题。计算近派系的难度要大得多,因为近派系的搜索空间比数量级的搜索空间大得多。我们给出了一个近派的公式作为一个小团是缺少一个固定数量的边。我们利用了这样一个事实,即近团体包含一个较小的团体,并使用团体抽样技术计数近团体。这种方法允许我们计算具有数千万条边的图中缺失1条或2条边的近似团。据我们所知,没有已知的有效方法来解决这个问题,我们获得了一个10倍-100倍的加速比现有的算法计算近派系。我们的主要技术是对最近由 Jain 和 Seshadhri (WWW 2017)引入的 Tur‘ an 阴影采样方法的有效空间适应。这种方法构造了一个大型的递归树(称为 Tur‘ an Shadow) ,用于表示图中的小团。我们设计了一个新的算法,使用在线,紧凑的图尔安阴影构造近派系的估计器。



少即是多: 

利用社会信任提高欺骗攻击的有效性


原文标题:

Less is More: Exploiting Social Trust to Increase the Effectiveness of a Deception Attack

地址:

http://arxiv.org/abs/2006.13499

作者:

Shahryar Baki,Rakesh M. Verma,Arjun Mukherjee,Omprakash Gnawali


Abstract:Cyber attacks such as phishing, IRS scams, etc., still are successful in fooling Internet users. Users are the last line of defense against these attacks since attackers seem to always find a way to bypass security systems. Understanding users' reason about the scams and frauds can help security providers to improve users security hygiene practices. In this work, we study the users' reasoning and the effectiveness of several variables within the context of the company representative fraud. Some of the variables that we study are: 1) the effect of using LinkedIn as a medium for delivering the phishing message instead of using email, 2) the effectiveness of natural language generation techniques in generating phishing emails, and 3) how some simple customizations, e.g., adding sender's contact info to the email, affect participants perception. The results obtained from the within-subject study show that participants are not prepared even for a well-known attack - company representative fraud. Findings include: approximately 65% mean detection rate and insights into how the success rate changes with the facade and correspondent (sender/receiver) information. A significant finding is that a smaller set of well-chosen strategies is better than a large `mess' of strategies. We also find significant differences in how males and females approach the same company representative fraud. Insights from our work could help defenders in developing better strategies to evaluate their defenses and in devising better training strategies.

摘要:网络攻击如网络钓鱼、国税局诈骗等,仍然能够成功地欺骗互联网用户。用户是抵御这些攻击的最后一道防线,因为攻击者似乎总能找到绕过安全系统的方法。了解用户对诈骗和欺诈的理由,可以帮助保安提供者改善用户的安全卫生做法。在本文中,我们研究了在公司代表人欺诈的背景下,用户的推理和几个变量的有效性。我们研究的一些变量是: 1)使用 LinkedIn 作为传递钓鱼信息的媒介而不是使用电子邮件的效果; 2)自然语言生成技术在产生钓鱼邮件方面的有效性; 3)一些简单的定制,例如,在电子邮件中添加发件人的联系信息,如何影响参与者的感知。从内部研究得出的结果表明,参与者甚至对众所周知的攻击性公司代表欺诈都没有准备。研究结果包括: 大约65% 的平均检测率和成功率如何变化的外观和通信者(发送者 / 接收者)信息的洞察力。一个重要的发现是,一组精心选择的策略比一大堆策略要好。我们还发现,男性和女性在如何处理同一家公司的代表欺诈行为上也存在显著差异。从我们的工作中得到的见解可以帮助防守者制定更好的战略来评估他们的防守,并设计更好的训练策略。



在线滥用行为数据集中注释一致性分析


原文标题:

On Analyzing Annotation Consistency in Online Abusive Behavior Datasets

地址:

http://arxiv.org/abs/2006.13507

作者:

Md Rabiul Awal,Rui Cao,Roy Ka-Wei Lee,Sandra Mitrović


Abstract:Online abusive behavior is an important issue that breaks the cohesiveness of online social communities and even raises public safety concerns in our societies. Motivated by this rising issue, researchers have proposed, collected, and annotated online abusive content datasets. These datasets play a critical role in facilitating the research on online hate speech and abusive behaviors. However, the annotation of such datasets is a difficult task; it is often contentious on what should be the true label of a given text as the semantic difference of the labels may be blurred (e.g., abusive and hate) and often subjective. In this study, we proposed an analytical framework to study the annotation consistency in online hate and abusive content datasets. We applied our proposed framework to evaluate the consistency of the annotation in three popular datasets that are widely used in online hate speech and abusive behavior studies. We found that there is still a substantial amount of annotation inconsistency in the existing datasets, particularly when the labels are semantically similar.

摘要:网络虐待行为是一个重要的问题,它破坏了网络社区的凝聚力,甚至引发了社会中的公共安全问题。受这一新兴问题的推动,研究人员提出、收集并注释了在线辱骂性内容数据集。这些数据集在促进对网络仇恨言论和虐待行为的研究方面发挥了关键作用。然而,这类数据集的注释是一项困难的任务; 对于给定文本的真实标签往往存在争议,因为标签的语义差异可能模糊不清(例如辱骂和仇恨) ,而且往往是主观的。在本研究中,我们提出了一个分析框架来研究在线讨厌和滥用内容数据集中的注释一致性。我们应用我们提出的框架来评估三个流行数据集中的注释的一致性,这三个数据集广泛应用于在线仇恨言论和辱骂行为研究。我们发现在现有的数据集中仍然存在大量的注释不一致,特别是当标签在语义上相似时。



概率节点失效模型下的网络连通性


原文标题:

Network connectivity under a probabilistic node failure model

地址:

https://arxiv.org/abs/2006.13551

作者:

Lucia Cavallaro,Stefania Costantini,Pasquale De Meo,Antonio Liotta,Giovanni Stilo


Abstract:Centrality metrics have been widely applied to identify the nodes in a graph whose removal is effective in decomposing the graph into smaller sub-components. The node--removal process is generally used to test network robustness against failures. Most of the available studies assume that the node removal task is always successful. Yet, we argue that this assumption is unrealistic. Indeed, the removal process should take into account also the strength of the targeted node itself, to simulate the failure scenarios in a more effective and realistic fashion. Unlike previous literature, herein a {\em probabilistic node failure model} is proposed, in which nodes may fail with a particular probability, considering two variants, namely: {\em Uniform} (in which the nodes survival-to-failure probability is fixed) and {\em Best Connected} (BC) (where the nodes survival probability is proportional to their degree). To evaluate our method, we consider five popular centrality metrics carrying out an experimental, comparative analysis to evaluate them in terms of {\em effectiveness} and {\em coverage}, on four real-world graphs. By effectiveness and coverage we mean the ability of selecting nodes whose removal decreases graph connectivity the most. Specifically, the graph spectral radius reduction works as a proxy indicator of effectiveness, and the reduction of the largest connected component (LCC) size is a parameter to assess coverage. The metric that caused the biggest drop has been then compared with the Benchmark analysis (i.e, the non-probabilistic degree centrality node removal process) to compare the two approaches. The main finding has been that significant differences emerged through this comparison with a deviation range that varies from 2\% up to 80\% regardless of the dataset used that highlight the existence of a gap between the common practice with a more realistic approach.

摘要:中心性度量被广泛地应用于识别图中的节点,它的去除可以有效地将图分解为更小的子组件。节点删除过程通常用于测试网络对失败的健壮性。现有的大多数研究假定节点移除任务总是成功的。然而,我们认为这种假设是不现实的。事实上,移除过程还应该考虑到目标节点本身的强度,以更有效和更现实的方式模拟故障场景。不同于以往的文献,本文提出了一个{ em 概率节点失效模型} ,该模型考虑两个变量,即{ em 一致性}(其中节点的生存到失效概率是固定的)和{ em 最佳连接}(BC)(其中节点的生存概率与节点的度成正比)。为了评估我们的方法,我们考虑了五个流行的中心性度量进行了一个实验,比较分析,以评估他们的{ em 有效性}和{ em 覆盖率} ,在四个现实世界的图。所谓有效性和覆盖率,是指选择删除率降低最多的节点的能力。具体来说,图谱半径缩减作为有效性的一个代理指标,而最大连接元件(图论)缩减(LCC)大小是评估覆盖率的一个参数。然后将导致最大降低的度量与 Benchmark 分析(即非概率度中心性节点删除过程)进行比较,以比较两种方法。主要的发现是,通过这种比较,出现了显著的差异,偏差范围从2% 到80% 不等,而不管使用的数据集突出表明了普通实践与更现实的方法之间存在差距。



基于模糊图反馈的在线稠密子图发现


原文标题:

Online Dense Subgraph Discovery via Blurred-Graph Feedback

地址:

https://arxiv.org/abs/2006.13642

作者:

Yuko Kuroki,Atsushi Miyauchi,Junya Honda,Masashi Sugiyama


AbstractDense subgraph discovery aims to find a dense component in edge-weighted graphs. This is a fundamental graph-mining task with a variety of applications and thus has received much attention recently. Although most existing methods assume that each individual edge weight is easily obtained, such an assumption is not necessarily valid in practice. In this paper, we introduce a novel learning problem for dense subgraph discovery in which a learner queries edge subsets rather than only single edges and observes a noisy sum of edge weights in a queried subset. For this problem, we first propose a polynomial-time algorithm that obtains a nearly-optimal solution with high probability. Moreover, to deal with large-sized graphs, we design a more scalable algorithm with a theoretical guarantee. Computational experiments using real-world graphs demonstrate the effectiveness of our algorithms.

摘要:稠密子图发现旨在寻找边加权图中的稠密分量。这是一项基本的图形挖掘任务,应用广泛,近年来受到广泛关注。虽然现有的大多数方法假设每个单独的边权重是容易获得的,这样的假设在实践中不一定有效。本文提出了一种新的稠密子图发现学习问题,其中学习者查询边子图,而不仅仅是查询单个边,并在查询子图中观察边权重的噪声和。对于这个问题,我们首先提出了一个多项式时间算法,得到了一个近似最优的解与高概率。此外,为了处理大尺寸的图,我们设计了一个具有理论保证的更加可伸缩的算法。使用真实世界图表的计算实验证明了我们的算法的有效性。



复杂网络分析中 

laplace 伪逆的对角逼近


原文标题:

Approximation of the Diagonal of a Laplacian's Pseudoinverse for Complex Network Analysis

地址:

http://arxiv.org/abs/2006.13679

作者:

Eugenio Angriman,Maria Predari,Alexander van der Grinten,Henning Meyerhenke

Abstract:The ubiquity of massive graph data sets in numerous applications requires fast algorithms for extracting knowledge from these data. We are motivated here by three electrical measures for the analysis of large small-world graphs G=(V,E) -- i.e., graphs with diameter in O(log|V|), which are abundant in complex network analysis. From a computational point of view, the three measures have in common that their crucial component is the diagonal of the graph Laplacian's pseudoinverse, L†. Computing diag(L†) exactly by pseudoinversion, however, is as expensive as dense matrix multiplication -- and the standard tools in practice even require cubic time. Moreover, the pseudoinverse requires quadratic space -- hardly feasible for large graphs. Resorting to approximation by, e.g., using the Johnson-Lindenstrauss transform, requires the solution of O(log|V|/ϵ2) Laplacian linear systems to guarantee a relative error, which is still very expensive for large inputs.
In this paper, we present a novel approximation algorithm that requires the solution of only one Laplacian linear system. The remaining parts are purely combinatorial -- mainly sampling uniform spanning trees, which we relate to diag
(L†) via effective resistances. For small-world networks, our algorithm obtains a ±ϵ-approximation with high probability, in a time that is nearly-linear in |E| and quadratic in 1/ϵ. Another positive aspect of our algorithm is its parallel nature due to independent sampling. We thus provide two parallel implementations of our algorithm: one using OpenMP, one MPI + OpenMP. In our experiments against the state of the art, our algorithm (i) yields more accurate results, (ii) is much faster and more memory-efficient, and (iii) obtains good parallel speedups, in particular in the distributed setting.
摘要:海量图形数据集在众多应用中无处不在,需要快速的算法从这些数据中提取知识。我们在这里的动机是为了分析大的小世界图的三个电气措施G=(V,E) 在复杂网络分析中,网络分析是一种非常丰富的技术。从计算的角度来看,这三个测度有一个共同点,它们的关键部分是图的拉普拉斯赝逆的对角线,L†. 。计算诊所(L†) 然而,准确地通过伪逆运算就像稠密的矩阵乘法一样昂贵——而且实际中的标准工具甚至需要三次方的时间。此外,伪逆需要二次空间——对于大图来说几乎不可行。借助于近似,例如,使用约翰逊-林登-施特劳斯变换,需要求解O(log|V|/ϵ2) 拉普拉斯线性系统,以保证相对误差,这仍然是非常昂贵的大型输入, 在本文中,我们提出了一种新的近似演算法,它只需要一个拉普拉斯线性系统的解。其余部分是纯粹的组合,主要是抽样均匀生成树,我们涉及 diag(L†)  高概率近似,在一定条件下近似为线性|E| ,然后平方1/ϵ.  另一个积极的方面,我们的算法是它的并行性质,由于独立采样。因此,我们提供了两个并行实现我们的算法: 一个使用 OpenMP,一个 MPI + OpenMP。在我们针对现有技术的实验中,我们的算法(i)产生更准确的结果,(ii)更快和内存效率更高,(iii)获得良好的并行加速,特别是在分布式设置中。



连接的力量: 

利用网络分析促进应收账款融资


原文标题:

The Power of Connection: Leveraging Network Analysis to Advance Receivable Financing

地址:

http://arxiv.org/abs/2006.13738

作者:

Ilaria Bordino,Francesco Gullo,Giacomo Legnaro

Abstract:Receivable financing is the process whereby cash is advanced to firms against receivables their customers have yet to pay: a receivable can be sold to a funder, which immediately gives the firm cash in return for a small percentage of the receivable amount as a fee. Receivable financing has been traditionally handled in a centralized way, where every request is processed by the funder individually and independently of one another. In this work we propose a novel, network-based approach to receivable financing, which enables customers of the same funder to autonomously pay each other as much as possible, and gives benefits to both the funder (reduced cash anticipation and exposure risk) and its customers (smaller fees and lightweight service establishment). Our main contributions consist in providing a principled formulation of the network-based receivable-settlement strategy, and showing how to achieve all algorithmic challenges posed by the design of this strategy. We formulate network-based receivable financing as a novel combinatorial-optimization problem on a multigraph of receivables. We show that the problem is NP-hard, and devise an exact branch-and-bound algorithm, as well as algorithms to efficiently find effective approximate solutions. Our more efficient algorithms are based on cycle enumeration and selection, and exploit a theoretical characterization in terms of a knapsack problem, as well as a refining strategy that properly adds paths between cycles. We also investigate the real-world issue of avoiding temporary violations of the problem constraints, and design methods for handling it. An extensive experimental evaluation is performed on real receivable data. Results attest the good performance of our methods.
摘要:应收款融资是指将现金预付给企业,作为其客户尚未支付的应收款的抵押: 应收款可以出售给投资者,投资者立即将应收款的一小部分作为费用返还给企业现金。应收款融资传统上是以集中方式处理的,即每一项请求都由出资方单独或相互独立地处理。在这项工作中,我们提出了一种新颖的、基于网络的应收账款融资方法,这种方法使同一投资者的客户能够尽可能自主地支付彼此的账款,并使投资者及其客户双方都受益(降低现金预期和暴露风险)(收费较少和服务机构较轻)。我们的主要贡献在于为基于网络的应收账款结算战略提供了一个有原则的制定方案,并展示了如何实现该战略设计带来的所有算法挑战。我们将基于网络的应收账款融资问题描述为一个多应收账款图上的新的组合优化问题。我们证明了该问题是 np 难的,并设计了一个精确的分枝定界算法,以及有效地找到有效的近似解的算法。我们更有效的算法是基于循环枚举和选择,利用一个理论上的角色塑造背包问题,以及一个改进策略,正确地增加循环之间的路径。我们还调查了避免临时违反问题约束的现实问题,以及处理它的设计方法。对实际应收数据进行了广泛的实验评估。实验结果证明了该方法的有效性。


团体运动比赛中的竞技平衡


原文标题:

Competitive Balance in Team Sports Games

地址:

http://arxiv.org/abs/2006.13763

作者:

Sofia M Nikolakaki,Ogheneovo Dibie,Ahmad Beirami,Nicholas Peterson,Navid Aghdaie,Kazi Zaman

Abstract:Competition is a primary driver of player satisfaction and engagement in multiplayer online games. Traditional matchmaking systems aim at creating matches involving teams of similar aggregated individual skill levels, such as Elo score or TrueSkill. However, team dynamics cannot be solely captured using such linear predictors. Recently, it has been shown that nonlinear predictors that target to learn probability of winning as a function of player and team features significantly outperforms these linear skill-based methods. In this paper, we show that using final score difference provides yet a better prediction metric for competitive balance. We also show that a linear model trained on a carefully selected set of team and individual features achieves almost the performance of the more powerful neural network model while offering two orders of magnitude inference speed improvement. This shows significant promise for implementation in online matchmaking systems.
摘要:在多人在线游戏中,竞争是玩家满意度和参与度的主要驱动力。传统的匹配系统旨在创造匹配,包括具有相似聚合个人技能水平的团队,如 Elo 得分或 TrueSkill。然而,团队动力学不能仅仅用这样的线性预测来捕捉。最近的研究表明,非线性预测的目标学习概率的成功率作为一个函数的球员和团队特征明显优于这些线性技能为基础的方法。在本文中,我们表明,使用最终得分差异提供了一个更好的竞争平衡的预测度量。我们还表明,一个线性模型训练了一组精心选择的团队和个人特征,实现了几乎更强大的神经网络模型的性能,同时提供了2个数量级 / 推理速度提高。这显示了在线匹配系统中实现的重要前景。


量化应对全球紧急情况的政策:

来自新型冠状病毒

肺炎流感大流行的启示


原文标题:

Quantifying Policy Responses to a Global Emergency: Insights from the COVID-19 Pandemic

地址:

http://arxiv.org/abs/2006.13853

作者:

Jian Gao,Yian Yin,Benjamin F. Jones,Dashun Wang

Abstract:Public policy must confront emergencies that evolve in real time and in uncertain directions, yet little is known about the nature of policy response. Here we take the coronavirus pandemic as a global and extraordinarily consequential case, and study the global policy response by analyzing a novel dataset recording policy documents published by government agencies, think tanks, and intergovernmental organizations (IGOs) across 114 countries (37,725 policy documents from Jan 2nd through May 26th 2020). Our analyses reveal four primary findings. (1) Global policy attention to COVID-19 follows a remarkably similar trajectory as the total confirmed cases of COVID-19, yet with evolving policy focus from public health to broader social issues. (2) The COVID-19 policy frontier disproportionately draws on the latest, peer-reviewed, and high-impact scientific insights. Moreover, policy documents that cite science appear especially impactful within the policy domain. (3) The global policy frontier is primarily interconnected through IGOs, such as the WHO, which produce policy documents that are central to the COVID19 policy network and draw especially strongly on scientific literature. Removing IGOs' contributions fundamentally alters the global policy landscape, with the policy citation network among government agencies increasingly fragmented into many isolated clusters. (4) Countries exhibit highly heterogeneous policy attention to COVID-19. Most strikingly, a country's early policy attention to COVID-19 shows a surprising degree of predictability for the country's subsequent deaths. Overall, these results uncover fundamental patterns of policy interactions and, given the consequential nature of emergent threats and the paucity of quantitative approaches to understand them, open up novel dimensions for assessing and effectively coordinating global and local responses to COVID-19 and beyond.
摘要:公共政策必须面对实时和不确定方向演变的紧急情况,然而对于政策反应的性质却知之甚少。在这里,我们把冠状病毒大流行作为一个全球性和非常重要的案例,并通过分析一个新的数据集记录政策文件,由政府机构,智囊团和政府间组织(政府间组织)发表在114个国家(37,725政策文件从1月2日至2020年5月26日)的全球政策响应。我们的分析揭示了四个主要的发现。(1)全球对新型冠状病毒肺炎的政策关注与新型冠状病毒肺炎确诊病例总数的轨迹非常相似,但政策重点从公共卫生到更广泛的社会问题不断变化。(2)新型冠状病毒肺炎政策前沿不成比例地吸收了最新的、经过同行评议的、影响力巨大的科学见解。此外,引用科学的政策文件在政策领域内显得尤其有影响力。(3)全球政策前沿主要通过诸如卫生组织等政府间组织相互联系,卫生组织编写的政策文件是 COVID19政策网络的核心,并特别大量利用科学文献。消除政府间组织的贡献从根本上改变了全球政策格局,政府机构之间的政策引用网络越来越分散成许多孤立的集群。(4)各国对新型冠状病毒肺炎组织的政策关注高度异质化。最引人注目的是,一个国家早期对新型冠状病毒肺炎的政策关注,显示了这个国家随后死亡的惊人可预测程度。总的来说,这些结果揭示了政策互动的基本模式,并且,考虑到紧急威胁的重要性和缺乏定量方法来理解它们,开辟了新的维度来评估和有效协调全球和地方对新型冠状病毒肺炎及以后的反应。


量化县际流动模式对美国

新型冠状病毒肺炎暴发的影响


原文标题:

Quantifying the influence of inter-county mobility patterns on the COVID-19 outbreak in the United States

地址:

http://arxiv.org/abs/2006.13860

作者:

Qianqian Sun,Yixuan Pan,Weiyi Zhou,Chenfeng Xiong,Lei Zhang

Abstract:As a highly infectious respiratory disease, COVID-19 has become a pandemic that threatens global health. Without an effective treatment, non-pharmaceutical interventions, such as travel restrictions, have been widely promoted to mitigate the outbreak. Current studies analyze mobility metrics such as travel distance; however, there is a lack of research on interzonal travel flow and its impact on the pandemic. Our study specifically focuses on the inter-county mobility pattern and its influence on the COVID-19 spread in the United States. To retrieve real-world mobility patterns, we utilize an integrated set of mobile device location data including over 100 million anonymous devices. We first investigate the nationwide temporal trend and spatial distribution of inter-county mobility. Then we zoom in on the epicenter of the U.S. outbreak, New York City, and evaluate the impacts of its outflow on other counties. Finally, we develop a "log-linear double-risk" model at the county level to quantify the influence of both "external risk" imported by inter-county mobility flows and the "internal risk" defined as the vulnerability of a county in terms of population with high-risk phenotypes. Our study enhances the situation awareness of inter-county mobility in the U.S. and can help improve non-pharmaceutical interventions for COVID-19.
摘要:作为一个高度传染性的唿吸系统疾病,新型冠状病毒肺炎已经成为威胁全球健康的流行病。如果没有有效的治疗,非药物干预措施,如旅行限制,已被广泛推广,以减轻疫情。目前的研究分析流动性指标,如旅行距离,然而,缺乏研究跨区域旅行流及其对流行病的影响。我们的研究主要集中在县际流动模式及其对新型冠状病毒肺炎在美国传播的影响。为了检索真实世界的移动模式,我们利用了一组集成的移动设备位置数据,其中包括超过1亿个匿名设备。我们首先调查了全国县际流动的时间趋势和空间分布。然后我们放大到美国疫情爆发的中心,纽约市,并评估其外流对其他县的影响。最后,我们在县一级建立了一个“对数线性双重风险”模型,以量化县际流动带来的“外部风险”和定义为一个县在高风险表型人口方面的脆弱性的“内部风险”的影响。我们的研究提高了美国县际流动性的状况意识,并且可以帮助改善新型冠状病毒肺炎的非药物干预。



来源:集智斑图
编辑:王建萍


近期网络科学论文速递


近距离感染传播的蒙特卡罗模拟研究 | 网络科学论文速递39篇

英国新冠肺炎禁闭: 对空气污染有什么影响 | 网络科学论文速递21篇

新型冠状病毒肺炎在不同社区传播的 SIR 模型假设 | 网络科学论文速递30篇





集智俱乐部QQ群|877391004

商务合作及投稿转载|swarma@swarma.org

◆ ◆ ◆

搜索公众号:集智俱乐部


加入“没有围墙的研究所”

让苹果砸得更猛烈些吧!

您可能也对以下帖子感兴趣

文章有问题?点此查看未经处理的缓存