Frontier Survey: Advances at the Intersection of Collective Intelligence and Deep Learning
Overview
Today's state-of-the-art deep learning models suffer from several fundamental problems, and concepts from collective intelligence, particularly from complex systems, often offer effective solutions. Indeed, ideas from the complex systems field, such as cellular automata, self-organization, emergence, and collective behavior, have long-standing ties to artificial neural networks. A survey posted on arXiv in December 2021 traces the intertwined development of complex systems and neural networks, and uses several active areas, namely image processing, deep reinforcement learning, multi-agent learning, and meta-learning, to show how incorporating the principles of collective intelligence is advancing deep learning research.
Research areas: deep learning, collective intelligence, complex systems, cellular automata, self-organization, emergence
David Ha, Yujin Tang | Authors
陈斯信 | Translator
梁金 | Proofreader
邓一雪 | Editor
Paper title: Collective Intelligence for Deep Learning: A Survey of Recent Developments
Paper link: https://arxiv.org/abs/2111.14377
Contents
Abstract
I. Introduction
II. Historical Background
III. Collective Intelligence for Deep Learning
1. Image Processing
2. Deep Reinforcement Learning
3. Multi-Agent Learning
4. Meta-Learning
IV. Deep Learning for Collective Intelligence
V. Conclusion
Abstract
I. Introduction
Figure 1. Recent advances in GPU hardware enable realistic 3D simulation of thousands of robots, as in the work of Rudin et al. [53] shown here. Such progress allows us to run 3D simulations of large populations of artificial agents that can interact and collectively develop intelligent behavior.
II. Historical Background
III. Collective Intelligence for Deep Learning
Figure 2. The neural cellular automaton built by Randazzo et al. [50] to recognize MNIST digits also comes with an interactive online demo. Each cell is allowed to see only a single pixel and to communicate with its neighbors. Over time, the cells form a consensus about which digit is most likely. Interestingly, disagreements still arise depending on pixel location, especially when the image deliberately blends different digits.
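The consensus dynamic described in this caption can be illustrated with a toy sketch. This is not the learned update rule of Randazzo et al. [50]; it is a minimal stand-in that assumes a simple neighbor-averaging rule in place of their trained network, just to show how purely local communication can produce a global agreement:

```python
import numpy as np

def neighbor_average(state):
    """Mix each cell's belief with its four neighbors (toroidal grid)."""
    return (state
            + np.roll(state, 1, axis=0) + np.roll(state, -1, axis=0)
            + np.roll(state, 1, axis=1) + np.roll(state, -1, axis=1)) / 5.0

def ca_consensus(beliefs, steps):
    """Each cell only ever sees its own pixel (encoded in its initial
    belief) and talks to adjacent cells, yet repeated local mixing drives
    the whole grid toward a shared class distribution."""
    state = beliefs.astype(float)
    for _ in range(steps):
        state = neighbor_average(state)
    return state

# Toy 8x8 grid with two classes: the left half initially votes for
# class 0, the right half for class 1.
beliefs = np.zeros((8, 8, 2))
beliefs[:, :4, 0] = 1.0
beliefs[:, 4:, 1] = 1.0
out = ca_consensus(beliefs, steps=200)
# Every cell ends up holding (almost exactly) the same 50/50 belief.
```

With a learned update rule instead of plain averaging, the same local-communication structure can converge to the correct digit rather than a bland average, which is what the trained system in [50] does.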
1. Image Processing
2. Deep Reinforcement Learning
Figure 4. Examples of simulated soft-bodied robots in 2D and 3D. Each cell is an independent neural network with local perception that produces local actions, including communicating with neighboring cells. Training these systems to perform various tasks involves not only training the neural networks but also designing and arranging the soft cells that make up the agent's morphology. Image from Horibe et al. [28].
Figure 5. For a robot with a particular fixed morphology, conventional reinforcement learning trains a policy specific to that robot. Recent work, such as that of Huang et al. [29] shown here, instead trains a single modular neural network responsible for controlling one part of the robot. Each robot's global policy is then the result of coordination among these identical modular networks. The authors show that such a system generalizes across a wide range of skeletal structures, from hoppers to quadruped walkers, and even to some unseen morphologies.
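The shared-module idea can be sketched as follows. This is only an illustration under assumed sizes and a single downward message pass; the actual method of Huang et al. [29] uses learned message passing in both directions along the morphology, and none of the names or dimensions below come from their paper:

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, MSG, ACT, HID = 4, 3, 1, 16     # illustrative sizes, not from [29]

# One shared set of weights, reused by every limb module.
W1 = rng.normal(0.0, 0.1, (OBS + MSG, HID))
W2 = rng.normal(0.0, 0.1, (HID, ACT + MSG))

def limb_module(obs, msg_in):
    """Shared module: local observation plus incoming message produce a
    local action plus an outgoing message for the next limb."""
    h = np.tanh(np.concatenate([obs, msg_in]) @ W1)
    out = np.tanh(h @ W2)
    return out[:ACT], out[ACT:]

def global_policy(limb_obs):
    """The robot's global policy is just the same module applied at
    every limb, with messages threading the chain together."""
    msg = np.zeros(MSG)
    actions = []
    for obs in limb_obs:
        act, msg = limb_module(obs, msg)
        actions.append(act)
    return np.array(actions)

# The identical controller drives a 3-limb and a 5-limb morphology,
# which is why such policies can transfer across skeletal structures.
a3 = global_policy([rng.normal(size=OBS) for _ in range(3)])
a5 = global_policy([rng.normal(size=OBS) for _ in range(5)])
```

Because no weight depends on the number of limbs, the same parameters apply unchanged to morphologies of any length.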
Figure 6. Self-organization also allows systems in reinforcement learning environments to self-configure (self-design) for a given task. Pathak et al. [44] studied such dynamic, modular agents and showed that they generalize not only to unseen environments but also to unseen morphologies assembled from additional modules.
Figure 7. Tang and Ha [65] studied reinforcement learning agents that exploit self-organization and attention, treating their observations as an arbitrarily ordered, variable-length list of sensory inputs. They divided the inputs of vision tasks (such as CarRacing and Atari Pong [4, 66]) into a 2D grid and shuffled the order (left). In continuous control tasks [18], they also added many extra noise input channels in shuffled order (right); the agent must learn to identify which inputs are useful. Each sensory neuron in the system receives its own input stream, and the neurons coordinate to accomplish the task at hand.
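The mechanism that makes such agents indifferent to input ordering can be shown in a stateless toy. The actual AttentionNeuron of Tang and Ha [65] gives each sensory neuron its own recurrent state; the sketch below keeps only the permutation-invariance trick, namely attention pooling where every input is embedded by the same shared weights (all sizes here are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8                               # illustrative feature size
Wk = rng.normal(0.0, 0.5, (1, D))   # shared key projection per input
Wv = rng.normal(0.0, 0.5, (1, D))   # shared value projection
q = rng.normal(0.0, 0.5, D)         # fixed query vector

def attend(inputs):
    """Pool an arbitrarily ordered, variable-length list of sensory
    inputs into one fixed-size feature via attention.  Because every
    input is embedded by the same shared weights and then summed with
    softmax weights, shuffling the inputs cannot change the result."""
    x = np.asarray(inputs, dtype=float).reshape(-1, 1)
    keys = np.tanh(x @ Wk)                       # (n, D)
    vals = np.tanh(x @ Wv)                       # (n, D)
    scores = keys @ q / np.sqrt(D)               # (n,)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ vals                              # (D,)

obs = rng.normal(size=32)
shuffled = rng.permutation(obs)
f1, f2 = attend(obs), attend(shuffled)
# f1 and f2 are identical: the pooled feature ignores input ordering.
```

The output is a fixed-size vector regardless of how many inputs arrive, which is also what lets such agents tolerate extra noise channels: useless inputs simply receive low attention weight after training.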
3. Multi-Agent Learning
Figure 8. MAgent [74] is a set of environments in which large numbers of pixel-level agents interact in battles or other competitive scenarios in a grid world. Unlike most reinforcement learning research platforms, which focus on a single agent or only a handful of agents, MAgent aims to support research that scales to millions of agents. Its environments are now maintained as part of the open-source PettingZoo [67] library for multi-agent reinforcement learning research.
Figure 9. Neural MMO [63] is a platform that simulates populations of agents in procedurally generated virtual worlds, designed to support multi-agent research while keeping computational requirements within bounds. Users can choose from a set of game systems provided by the platform to create environments for their specific research questions; the platform supports up to a thousand agents, maps of one square kilometer, and several thousand time steps. The project is under active development, with extensive documentation and tooling, including logging and visualization tools for researchers. At the time of writing, the platform was about to be demonstrated at NeurIPS 2021.
4. Meta-Learning
Figure 10. Recent work by Sandler et al. [55] and Kirsch et al. [33] attempts to generalize the established notion of an artificial neural network, so that each neuron can hold multiple states rather than a single scalar, and each synapse operates bidirectionally to support both learning and inference. As shown here, Kirsch et al. [33] propose modeling each synapse with a small recurrent neural network (each with its own internal hidden state), and show that the network can be trained simply by running the recurrent units, rather than by using backpropagation.
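The structural idea, synapses that carry their own hidden state and change as the network runs, can be sketched in a toy forward pass. This is not the architecture of Kirsch et al. [33]; it is a simplified stand-in with made-up sizes, intended only to show how repeated forward passes alone can alter the effective weights:

```python
import numpy as np

rng = np.random.default_rng(2)
N_PRE, N_POST, S = 3, 2, 4          # illustrative sizes, not from [33]

# One shared recurrent update rule, applied to every synapse's own state.
Wh = rng.normal(0.0, 0.3, (S + 2, S))   # inputs: own state, pre/post activity
Wo = rng.normal(0.0, 0.3, (S, 1))       # readout: state -> effective weight

def step(pre, syn_state):
    """Forward pass of a layer whose synapses are tiny RNNs.  Each
    synapse transmits a signal whose strength is read out from its
    hidden state, then updates that state from purely local information
    (its own state plus the activity of the two neurons it connects).
    Running the network therefore changes its synapses, i.e. learning
    happens without a separate backward pass."""
    w = (syn_state.reshape(-1, S) @ Wo).reshape(N_PRE, N_POST)
    post = pre @ w
    new_state = np.empty_like(syn_state)
    for i in range(N_PRE):
        for j in range(N_POST):
            x = np.concatenate([syn_state[i, j], [pre[i], post[j]]])
            new_state[i, j] = np.tanh(x @ Wh)
    return post, new_state

state = np.zeros((N_PRE, N_POST, S))
pre = rng.normal(size=N_PRE)
post1, state = step(pre, state)     # all-zero states: output is zero
post2, state = step(pre, state)     # states have moved: output changes
```

In the real systems, the parameters of the shared update rule (here `Wh` and `Wo`) are what gets meta-learned, so that running the network implements a useful learning algorithm.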
IV. Deep Learning for Collective Intelligence
Figure 11. Deep learning methods have been used to automatically discover artificial lifeforms in continuous cellular automaton systems such as Lenia [6] [16, 51]. Recent work [16] can not only discover interesting patterns automatically, but can also incorporate a user's preference for particular kinds of interestingness into the search process. In this way it can search Lenia for spatially localized patterns, or for Turing-like lifeforms.
V. Conclusion
References
Baker B, Kanitscheider I, Markov T, Wu Y, Powell G, McGrew B and Mordatch I (2019) Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528.
Bansal T, Pachocki J, Sidor S, Sutskever I and Mordatch I (2017) Emergent complexity via multi-agent competition. arXiv preprint arXiv:1710.03748.
Bhatia J, Jackson H, Tian Y, Xu J and Matusik W (2021) Evolution gym: A large-scale benchmark for evolving soft robots. In: Advances in Neural Information Processing Systems. Curran Associates, Inc. URL https://sites.google.com/corp/view/evolution-gym-benchmark/.
Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J and Zaremba W (2016) OpenAI Gym. arXiv preprint arXiv:1606.01540.
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A et al. (2020) Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
Chan BWC (2019) Lenia: Biology of artificial life. Complex Systems 28(3): 251–286. DOI:10.25088/complexsystems.28.3.251. URL http://dx.doi.org/10.25088/complexsystems.28.3.251.
Chan BWC (2020) Lenia and expanded universe. In: Proceedings of the ALIFE 2020: The 2020 Conference on Artificial Life. pp. 221–229. DOI:10.1162/isal_a_00297. URL https://doi.org/10.1162/isal_a_00297.
Cheney N, MacCurdy R, Clune J and Lipson H (2014) Unshackling evolution: evolving soft robots with multiple materials and a powerful generative encoding. ACM SIGEVOlution 7(1): 11–23.
Chollet F et al. (2015) Keras.
Chua LO and Roska T (2002) Cellular neural networks and visual computing: foundations and applications. Cambridge University Press.
Chua LO and Yang L (1988) Cellular neural networks: Applications. IEEE Transactions on Circuits and Systems 35(10): 1273–1290.
Chua LO and Yang L (1988) Cellular neural networks: Theory. IEEE Transactions on Circuits and Systems 35(10): 1257–1272.
Cisneros H, Sivic J and Mikolov T (2019) Evolving structures in complex systems. arXiv preprint arXiv:1911.01086.
Conway J et al. (1970) The game of life. Scientific American 223(4): 4.
Daigavane A, Ravindran B and Aggarwal G (2021) Understanding convolutions on graphs. Distill. DOI:10.23915/distill.00032. URL https://distill.pub/2021/understanding-gnns.
Deng J, Dong W, Socher R, Li LJ, Li K and Fei-Fei L (2009) ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp. 248–255.
Etcheverry M, Moulin-Frier C and Oudeyer PY (2020) Hierarchically organized latent modules for exploratory search in morphogenetic systems. In: Advances in Neural Information Processing Systems, volume 33. Curran Associates, Inc., pp. 4846–4859. URL https://proceedings.neurips.cc/paper/2020/file/33a5435d4f945aa6154b31a73bab3b73-Paper.pdf.
Foerster JN, Assael YM, De Freitas N and Whiteson S (2016) Learning to communicate with deep multi-agent reinforcement learning. arXiv preprint arXiv:1605.06676.
Freeman CD, Metz L and Ha D (2019) Learning to predict without looking ahead: World models without forward prediction. URL https://learningtopredict.github.io.
Gilpin W (2019) Cellular automata as convolutional neural networks. Physical Review E 100(3): 032402.
Goraş L, Chua LO and Leenaerts D (1995) Turing patterns in CNNs. I. Once over lightly. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 42(10): 602–611.
Grattarola D, Livi L and Alippi C (2021) Learning graph cellular automata.
Ha D (2018) Reinforcement learning for improving agent design. URL https://designrl.github.io.
Ha D (2020) Slime volleyball gym environment. https://github.com/hardmaru/slimevolleygym.
Hamann H (2018) Swarm robotics: A formal approach. Springer.
He K, Zhang X, Ren S and Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778.
Heiden E, Millard D, Coumans E, Sheng Y and Sukhatme GS (2021) NeuralSim: Augmenting differentiable simulators with neural networks. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). URL https://github.com/google-research/tiny-differentiable-simulator.
Hill A, Raffin A, Ernestus M, Gleave A, Kanervisto A, Traore R, Dhariwal P, Hesse C, Klimov O, Nichol A et al. (2018) Stable Baselines.
Hooker S (2020) The hardware lottery. arXiv preprint arXiv:2009.06489. URL https://hardwarelottery.github.io/.
Horibe K, Walker K and Risi S (2021) Regenerating soft robots through neural cellular automata. In: EuroGP. pp. 36–50.
Huang W, Mordatch I and Pathak D (2020) One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning. PMLR, pp. 4455–4464.
Jaderberg M, Czarnecki WM, Dunning I, Marris L, Lever G, Castaneda AG, Beattie C, Rabinowitz NC, Morcos AS, Ruderman A et al. (2019) Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364(6443): 859–865.
Joachimczak M, Suzuki R and Arita T (2016) Artificial metamorphosis: Evolutionary design of transforming, soft-bodied robots. Artificial Life 22(3): 271–298.
Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596(7873): 583–589.
Kirsch L and Schmidhuber J (2020) Meta learning backpropagation and improving it. arXiv preprint arXiv:2012.14905.
Kozek T, Roska T and Chua LO (1993) Genetic algorithm for CNN template learning. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 40(6): 392–402.
Krizhevsky A, Sutskever I and Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25: 1097–1105.
Liu S, Lever G, Merel J, Tunyasuvunakool S, Heess N and Graepel T (2019) Emergent coordination through competition. arXiv preprint arXiv:1902.07151.
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540): 529–533.
Mordvintsev A, Randazzo E, Niklasson E and Levin M (2020) Growing neural cellular automata. Distill. DOI:10.23915/distill.00023. URL https://distill.pub/2020/growing-ca.
Nagy Z, Voroshazi Z and Szolgay P (2006) A real-time mammalian retina model implementation on FPGA. In: 2006 10th International Workshop on Cellular Neural Networks and Their Applications. IEEE, pp. 1–1.
Ohsawa S, Akuzawa K, Matsushima T, Bezerra G, Iwasawa Y, Kajino H, Takenaka S and Matsuo Y (2018) Neuron as an agent. URL https://openreview.net/forum?id=BkfEzz-0-.
OroojlooyJadid A and Hajinezhad D (2019) A review of cooperative multi-agent deep reinforcement learning. arXiv preprint arXiv:1908.03963.
Ott J (2020) Giving up control: Neurons as reinforcement learning agents. arXiv preprint arXiv:2003.11642.
Pandarinath C, O'Shea DJ, Collins J, Jozefowicz R, Stavisky SD, Kao JC, Trautmann EM, Kaufman MT, Ryu SI, Hochberg LR et al. (2018) Inferring single-trial neural population dynamics using sequential auto-encoders. Nature Methods 15(10): 805–815.
Pathak D, Lu C, Darrell T, Isola P and Efros AA (2019) Learning to control self-assembling morphologies: a study of generalization via modularity. arXiv preprint arXiv:1902.05546.
Peng Z, Hui KM, Liu C, Zhou B et al. (2021) Learning to simulate self-driven particles system with coordinated policy optimization. Advances in Neural Information Processing Systems 34.
Qin Y, Feng M, Lu H and Cottrell GW (2018) Hierarchical cellular automata for visual saliency. International Journal of Computer Vision 126(7): 751–770.
Qu X, Sun Z, Ong YS, Gupta A and Wei P (2020) Minimalistic attacks: How little it takes to fool deep reinforcement learning policies. IEEE Transactions on Cognitive and Developmental Systems.
Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J et al. (2021) Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020.
Radford A, Narasimhan K, Salimans T and Sutskever I (2018) Improving language understanding by generative pre-training.
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I et al. (2019) Language models are unsupervised multitask learners. OpenAI Blog 1(8): 9.
Randazzo E, Mordvintsev A, Niklasson E, Levin M and Greydanus S (2020) Self-classifying MNIST digits. Distill. DOI:10.23915/distill.00027.002. URL https://distill.pub/2020/selforg/mnist.
Reinke C, Etcheverry M and Oudeyer PY (2020) Intrinsically motivated discovery of diverse patterns in self-organizing systems. In: International Conference on Learning Representations. URL https://openreview.net/forum?id=rkg6sJHYDr.
Resnick C, Eldridge W, Ha D, Britz D, Foerster J, Togelius J, Cho K and Bruna J (2018) Pommerman: A multi-agent playground. arXiv preprint arXiv:1809.07124.
Rolnick D, Donti PL, Kaack LH, Kochanski K, Lacoste A, Sankaran K, Ross AS, Milojevic-Dupont N, Jaques N, Waldman-Brown A et al. (2019) Tackling climate change with machine learning. arXiv preprint arXiv:1906.05433.
Rubenstein M, Cornejo A and Nagpal R (2014) Programmable self-assembly in a thousand-robot swarm. Science 345(6198): 795–799.
Rudin N, Hoeller D, Reist P and Hutter M (2021) Learning to walk in minutes using massively parallel deep reinforcement learning. arXiv preprint arXiv:2109.11978.
Sanchez-Lengeling B, Reif E, Pearce A and Wiltschko AB (2021) A gentle introduction to graph neural networks. Distill 6(9): e33.
Sandler M, Vladymyrov M, Zhmoginov A, Miller N, Madams T, Jackson A and Arcas BAY (2021) Meta-learning bidirectional update rules. In: International Conference on Machine Learning. PMLR, pp. 9288–9300.
Sandler M, Zhmoginov A, Luo L, Mordvintsev A, Randazzo E et al. (2020) Image segmentation via cellular automata. arXiv preprint arXiv:2008.04965.
Schmidhuber J (2014) Who invented backpropagation?
Schoenholz S and Cubuk ED (2020) JAX MD: a framework for differentiable physics. Advances in Neural Information Processing Systems 33.
Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AW, Bridgland A et al. (2020) Improved protein structure prediction using potentials from deep learning. Nature 577(7792): 706–710.
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587): 484–489.
Simonyan K and Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Stoy K, Brandt D, Christensen DJ and Brandt D (2010) Self-reconfigurable robots: an introduction.
Suarez J, Du Y, Isola P and Mordatch I (2019) Neural MMO: A massively multiagent game environment for training and evaluating intelligent agents. arXiv preprint arXiv:1903.00784.
Suarez J, Du Y, Zhu C, Mordatch I and Isola P (2021) The Neural MMO platform for massively multiagent research. In: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. URL https://openreview.net/forum?id=J0d-I8yFtP.
Sudhakaran S, Grbic D, Li S, Katona A, Najarro E, Glanois C and Risi S (2021) Growing 3D artefacts and functional machines with neural cellular automata. arXiv preprint arXiv:2103.08737.
Tang Y and Ha D (2021) The sensory neuron as a transformer: Permutation-invariant neural networks for reinforcement learning. In: Thirty-Fifth Conference on Neural Information Processing Systems. URL https://openreview.net/forum?id=wtLW-Amuds. https://attentionneuron.github.io.
Tang Y, Nguyen D and Ha D (2020) Neuroevolution of self-interpretable agents. In: Proceedings of the Genetic and Evolutionary Computation Conference. URL https://attentionagent.github.io.
Terry JK, Black B, Jayakumar M, Hari A, Sullivan R, Santos L, Dieffendahl C, Williams NL, Lokesh Y, Horsch C et al. (2020) PettingZoo: Gym for multi-agent reinforcement learning. arXiv preprint arXiv:2009.14471.
Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T, Georgiev P et al. (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782): 350–354.
Wang T, Liao R, Ba J and Fidler S (2018) NerveNet: Learning structured policy with graph neural networks. In: International Conference on Learning Representations.
Wolfram S (2002) A new kind of science, volume 5. Wolfram Media, Champaign, IL.
Wu Z, Pan S, Chen F, Long G, Zhang C and Philip SY (2020) A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems 32(1): 4–24.
Zenil H (2009) Compression-based investigation of the dynamical properties of cellular automata and other systems. arXiv preprint arXiv:0910.4042.
Zhang D, Choi C, Kim J and Kim YM (2021) Learning to generate 3D shapes with generative cellular automata. In: International Conference on Learning Representations. URL https://openreview.net/forum?id=rABUmU3ulQh.
Zheng L, Yang J, Cai H, Zhou M, Zhang W, Wang J and Yu Y (2018) MAgent: A many-agent reinforcement learning platform for artificial collective intelligence. In: Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.