Hands-On | A Practical Multimodal Neural Language Model in Python/NumPy (GitHub Link Included)
Source: GitHub
Translated by: Zhang Nina
This is an implementation of "Multimodal Neural Language Models" (Kiros et al., ICML 2014) in basic NumPy, covering both the additive and the multiplicative log-bilinear image caption generators. Unlike most other image caption generators, these models do not use recurrent neural networks.
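To make the additive model concrete, here is a minimal NumPy sketch of its forward pass: transformed context word embeddings and a projected image feature are summed into a predicted representation, which is then scored against every word in the vocabulary. The sizes, parameter names, and random initialization are illustrative assumptions for this sketch, not the repository's actual implementation.

import numpy as np

# Illustrative sizes (assumptions, not taken from the repo):
# vocabulary V, word dimension D, image feature dimension K, context length C
V, D, K, C = 5000, 100, 4096, 5
rng = np.random.RandomState(0)
R = rng.randn(V, D) * 0.01          # word representations (also used to score the output)
Cmats = rng.randn(C, D, D) * 0.01   # one context transformation per position
M = rng.randn(K, D) * 0.01          # projects the image feature into the word space
b = np.zeros(V)                     # per-word bias

def next_word_probs(context_ids, image_feat):
    # Predicted representation: transformed context embeddings plus the additive image term
    r_hat = sum(Cmats[i].dot(R[w]) for i, w in enumerate(context_ids))
    r_hat += image_feat.dot(M)
    scores = R.dot(r_hat) + b       # bilinear score for every vocabulary word
    scores -= scores.max()          # numerical stability for the softmax
    p = np.exp(scores)
    return p / p.sum()

probs = next_word_probs([12, 7, 301, 4, 55], rng.randn(K))  # five context words, one image feature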
If you are looking for a simple, bare-bones image caption generator that can be trained on a CPU, this code may be useful to you; it can also serve teaching purposes. Parts of this code were used in an assignment for an undergraduate neural networks course at the University of Toronto.
On the MSCOCO dataset with VGG19 features, a single model achieves a BLEU-4 score of about 25, and an ensemble reaches nearly 27. For comparison, a "Show and Tell" LSTM with the same features scores just over 27, while the state of the art is around 34, so these models are well behind the current state of the art. This code is released as part of my PhD thesis.
Visualization
Here are results on 1000 images using an ensemble of additive log-bilinear models trained using this code.
Dependencies
This code is written in Python. To use it, you will need:
Python 2.7
A recent version of NumPy (http://www.numpy.org/) and SciPy (http://www.scipy.org/)
Quickstart for Toronto users
To train an additive log-bilinear model with the default settings, open IPython and run the following:
import coco_proc, trainer
z, zd, zt = coco_proc.process(context=5)  # preprocess the MSCOCO data with a context size of 5
trainer.trainer(z, zd)                    # train an additive model with the default settings (see trainer.py)
This will store trained models in the models directory and periodically compute BLEU using the Perl code and reference captions in the gen directory. All hyperparameter settings can be tuned in trainer.py. Links to the MSCOCO data are in config.py.
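If you want to sanity-check BLEU numbers outside the Perl pipeline, a plain-Python approximation of corpus-level BLEU-4 (equal n-gram weights plus the brevity penalty) looks roughly like the sketch below. It is a generic re-implementation for illustration, not the script the repository calls, and its numbers may differ slightly from the Perl output.

import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu4(hypotheses, references):
    # hypotheses: list of token lists; references: list of lists of token lists
    num, den = [0] * 4, [0] * 4
    hyp_len, ref_len = 0, 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]  # closest reference length
        for n in range(1, 5):
            hyp_counts = Counter(ngrams(hyp, n))
            max_ref = Counter()
            for r in refs:
                for g, c in Counter(ngrams(r, n)).items():
                    max_ref[g] = max(max_ref[g], c)
            num[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())  # clipped n-gram matches
            den[n - 1] += sum(hyp_counts.values())
    if min(num) == 0 or min(den) == 0:
        return 0.0
    log_prec = sum(math.log(num[i] / den[i]) for i in range(4)) / 4.0
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / float(hyp_len))  # brevity penalty
    return bp * math.exp(log_prec)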
Getting started
You will first need to download the pre-processed MSCOCO data. All necessary files can be downloaded by running:
wget http://www.cs.toronto.edu/~rkiros/data/mnlm.zip
After unpacking, open config.py and set the paths accordingly. Then you can proceed with the quickstart instructions. All training settings can be found in trainer.py. Testing trained models is done with tester.py. The lm directory contains classes for the additive and multiplicative log-bilinear models. Helper functions, such as beam search, are found in the utils directory.
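As a rough illustration of what the beam search helper does, the sketch below keeps the beam_width most probable partial captions at each step and extends them using a next-word distribution. The next_word_probs(context_ids, image_feat) interface, the start/end token ids, and the length normalization are assumptions made for this sketch; the actual code in utils may differ.

import numpy as np

def beam_search(next_word_probs, image_feat, beam_width=5, max_len=20, bos_id=1, eos_id=0):
    # Each beam is a (token sequence, accumulated log-probability) pair
    beams = [([bos_id], 0.0)]
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            probs = next_word_probs(seq, image_feat)            # distribution over the vocabulary
            for w in np.argsort(probs)[-beam_width:]:           # best continuations of this beam
                cand = (seq + [int(w)], logp + np.log(probs[w] + 1e-12))
                (completed if w == eos_id else candidates).append(cand)
        if not candidates:                                      # every beam has ended
            break
        beams = sorted(candidates, key=lambda c: c[1])[-beam_width:]  # prune to the top beams
    completed.extend(beams)
    return max(completed, key=lambda c: c[1] / len(c[0]))       # length-normalized best caption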
Reference
If you found this code useful, please cite the following paper:
Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel. "Multimodal Neural Language Models." ICML (2014).
@inproceedings{kiros2014multimodal,
title={Multimodal Neural Language Models.},
author={Kiros, Ryan and Salakhutdinov, Ruslan and Zemel, Richard S},
booktitle={ICML},
volume={14},
pages={595--603},
year={2014}
}
License
Apache License 2.0
(http://www.apache.org/licenses/LICENSE-2.0)