Spike frequency coding: Conventional spiking neural networks only produce empirically determined firing rates. To make the firing rate controllable, we propose a spike frequency coding method governed by a hyperparameter, enabling a trade-off between the firing rate and the performance of the spiking neural network. The hyperparameter k controls the input distribution: the larger k is, the lower the firing rate of the spiking neurons; the smaller k is, the stronger the representational capacity of the spikes and the higher the model accuracy. Frequency coding is given by the following formula, where alpha is computed from the statistics of the first batch of training data. As shown in the figure below, we control the spike frequency by changing the variance of the input distribution through the coefficient alpha. By choosing different values of k, the post-training firing rate of each layer can be effectively controlled, allowing flexible adaptation to scenarios with different energy budgets.

Spike amplitude coding: To encode additional information through spike amplitude, we use real-valued spike amplitudes, obtained by multiplying the encoded spikes by the same coefficient alpha as above. During inference, the coefficient alpha can be merged into the weight matrix, which preserves the additive nature of spike matrix multiplication.
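As a minimal sketch of the scheme above (all function names are hypothetical, and the exact statistic used to derive alpha from the first batch is an assumption, since the formula is not reproduced here), the following shows frequency control via k, amplitude-coded spikes, and the inference-time merge of alpha into the weights:

```python
import numpy as np

def calibrate_alpha(first_batch, k):
    # Hypothetical calibration: alpha is derived from the statistics of
    # the first training batch, scaled by the hyperparameter k. A larger
    # k enlarges alpha, so fewer inputs cross the firing threshold and
    # the firing rate drops, matching the trade-off described above.
    return k * first_batch.std()

def encode(x, alpha):
    # Simplified stand-in for the spiking neuron: emit a binary spike
    # where the alpha-scaled input crosses a unit threshold.
    return (x / alpha >= 1.0).astype(x.dtype)

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 16))        # stand-in for the first batch
alpha = calibrate_alpha(batch, k=2.0)

spikes = encode(batch, alpha)           # binary {0, 1} spikes
amp_spikes = spikes * alpha             # amplitude-coded (real-valued) spikes

# At inference, alpha folds into the weight matrix, so the matrix
# multiplication still only sees binary spikes (additions, no
# multiplications), preserving the additive property:
W = rng.normal(size=(16, 4))
out_train = amp_spikes @ W
out_infer = spikes @ (alpha * W)        # alpha merged into W
assert np.allclose(out_train, out_infer)
```

Note that because alpha is a fixed scalar per layer, `(spikes * alpha) @ W == spikes @ (alpha * W)` holds exactly, which is what makes the inference-time merge lossless.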