What Is Intelligence? Category Theory Offers a Universal Framework for General Artificial Intelligence
Overview
In the 1950s, when artificial intelligence was just emerging, Turing wrote a paper titled "Computing Machinery and Intelligence," posing the question "Can Machines Think?" There he proposed a test for judging whether a machine possesses intelligence, now known as the Turing test. For modern artificial intelligence, this question deserves to be revisited, and the first step is to ask what intelligence actually is.
Because discussions of intelligence generally lack a foundational mathematical framework, we cannot state, or even formulate, a criterion for judging whether a machine can "think." Perhaps no framework can claim true "universality," but even a mathematical framework that leaves room for disagreement would be deeply significant for the question of machine intelligence. In the recent arXiv paper "A Categorical Framework of General Intelligence," the author sets out to use category theory, a language widely regarded within mathematics as "universal," to construct a framework for the building blocks of general artificial intelligence.
贾伊阳 | Author
梁金 | Editor
Paper title: A Categorical Framework of General Intelligence
Paper link: https://arxiv.org/abs/2303.04571
Contents
1. Introduction
2. The World Category
3. Communication and Interpretation
4. Objectives
5. Invariance under Training
1. Introduction
The sensor receives multimodal signals from the external environment, including but not limited to text input and video/audio/image input. The world category perceives and understands the incoming signals and updates its internal state accordingly. A planner equipped with objectives continuously monitors the state of the world category and generates plans toward those objectives. Finally, the actor executes these plans, affecting the external environment by producing output signals (such as text output, video/image output, audio output, or robot control signals).
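To make this perceive-plan-act loop concrete, here is a minimal Python sketch. All class and method names (WorldCategory, Planner, Actor, agent_step, and so on) are our own illustrative assumptions, not an API defined in the paper:

```python
# A minimal, hypothetical sketch of the four-component loop described above.
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class WorldCategory:
    """Holds the agent's internal state, updated by incoming signals."""
    state: dict = field(default_factory=dict)

    def perceive(self, signal: Any) -> None:
        # Understand the incoming multimodal signal and update internal state.
        self.state["last_signal"] = signal


class Planner:
    """Monitors the world category and produces plans toward an objective."""
    def __init__(self, goal: str):
        self.goal = goal

    def plan(self, world: WorldCategory) -> List[str]:
        # Generate a (trivial) plan conditioned on the current state and goal.
        return [f"{self.goal}:{world.state.get('last_signal')}"]


class Actor:
    """Executes plans by emitting output signals into the environment."""
    def execute(self, plan: List[str]) -> List[str]:
        return [f"output:{step}" for step in plan]


def agent_step(signal: Any, world: WorldCategory,
               planner: Planner, actor: Actor) -> List[str]:
    world.perceive(signal)        # sensor input -> world category update
    plan = planner.plan(world)    # planner reads the state, proposes a plan
    return actor.execute(plan)    # actor affects the external environment


world = WorldCategory()
print(agent_step("hello", world, Planner(goal="respond"), Actor()))
```

Each call to agent_step mirrors one pass through the diagram: perception updates the world category, the planner reads that state, and the actor writes back into the environment.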
2. The World Category
2.1 Self-State
2.2 Empathy
Three important related cases are worth discussing.
1. Empathy is especially helpful when only a very small subset of the self-state is relevant. For example, in a multi-entity game, each entity has its own action set, states, and reward function, and empathy goes a long way toward understanding each entity's situation and behavior.
2. If another entity has private sensors, full empathy is impossible. Specifically, if a model cannot perceive another entity's private sensors, and its self-state test set T includes tests related to those sensors, then the model cannot fully empathize with that entity (see the sketch below).
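The private-sensor limit can be illustrated with a small Python sketch. Representing the test set T as a map from tests to the sensors they require, and the function empathizable_tests, are our own hypothetical simplifications, not the paper's formalism:

```python
# Hypothetical illustration of the private-sensor limit on empathy.
# A model can only evaluate, for another entity, those self-state tests
# whose required sensors it can observe; tests that depend on the other
# entity's private sensors are out of reach.
from typing import Dict, Set


def empathizable_tests(test_sensors: Dict[str, Set[str]],
                       observable: Set[str]) -> Set[str]:
    """Return the tests in T whose required sensors are all observable."""
    return {t for t, sensors in test_sensors.items() if sensors <= observable}


# A toy self-state test set T: each test depends on certain sensors.
T = {
    "sees_red": {"camera"},
    "feels_pain": {"nociceptor"},  # private: not observable from outside
    "hears_tone": {"mic"},
}

# The other entity exposes its camera and mic; its nociceptor stays private.
print(empathizable_tests(T, observable={"camera", "mic"}))
# Prints {'sees_red', 'hears_tone'}: 'feels_pain' cannot be evaluated,
# so empathy with this entity is necessarily partial.
```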
2.3 Subcategories
3. Communication and Interpretation
3.1 Interpretability
4. Objectives
5. Invariance under Training