【神文】用文言文、英语和白话文三语评述机器学习之产生式模型
翻译:尹肖贻 校对:孙月杰
谈吐风生 援笔立成
——生成式模型 选录
such as:

- Adjacent pixels tend to be similar (a property of the distribution of natural images)
- The top half of the image is usually sky-colored
- The roof and body of the house probably have different colors
- Any windows are likely to be reflecting the color of the sky
- The house has to be structurally stable
- The house has to be made out of materials like brick, wood, and cement rather than cotton or meat
- ... and so on.

诸如:

- 邻笔常类同;
- 上而为天;
- 蓬牖或异色;
- 玻璃映天蓝;
- 屋体耸立;
- 房上多泥石瓦木,而非锦衣玉食;
- 诸如此类。

比方说:

- 相邻像素倾向于相似(自然图像的一个性质);
- 图像的上半部分往往是天空的颜色;
- 房顶跟房体很可能颜色不一样;
- 窗子很可能反射着天空的色彩;
- 房子必须有坚固的结构;
- 房子由瓦、木头、水泥什么的造出来的,而不是棉花、肉什么的;
- ……
We need to specify these relationships via equations and code, even if implicitly rather than analytically (writing down the 360,000-dimensional equations for P). 吾辈演此技艺于算术编码之上,顾隐含而非人为,于概率域而求索。 我们需要用等式和代码把这些关系描述出来,哪怕是隐式的而非解析的(即写出描述概率分布P的360,000维等式)。【注:360000=600×600,是作者前文设定的图像大小】
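下面给出一个极简的示意(本文为说明而虚构,并非原文作者的实现,函数名 unnormalized_log_p 亦属假设):用几行 NumPy 代码"隐式地"写下上面第一条规则,即对相邻像素差做惩罚的未归一化对数概率,平滑的图像因此比噪声图像得分更高。

```python
import numpy as np

def unnormalized_log_p(image: np.ndarray) -> float:
    """对一幅灰度图像计算未归一化的对数概率(仅为示意)。

    只编码一条先验规则:相邻像素倾向于相似。
    相邻像素差越大,(未归一化的)概率越低;
    真正的 P 还需叠加更多规则并做归一化。
    """
    dx = np.diff(image, axis=1)  # 水平方向相邻像素之差
    dy = np.diff(image, axis=0)  # 垂直方向相邻像素之差
    return -(np.sum(dx ** 2) + np.sum(dy ** 2))

# 600x600 的平滑渐变图像比同尺寸的随机噪声图像得分更高
smooth = np.tile(np.linspace(0.0, 1.0, 600), (600, 1))
noise = np.random.rand(600, 600)
print(unnormalized_log_p(smooth) > unnormalized_log_p(noise))  # True
```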
Trying to write down the equations that describe "house pictures" may seem insane, but in the field of Machine Learning, this is an area of intense research effort known as Generative Modeling. 如是演绎勾画房舍之图,似有愚公移山之浩繁。于机器学习,此类演绎,统称生成式建模。 试图写出等式来描述"房舍图像",乍看起来非常"疯狂";但在机器学习领域,这正是一个研究热度极高的方向,称为生成式建模(Generative Modeling)。
Formally, generative models allow us to create observation data out of thin air by sampling from the joint distribution over observation data and class labels. That is, when you sample from a generative distribution, you get back a tuple consisting of an image and a class label. 正言之,生成模型,即以采样图文并之标记,便可于虚无之间,织造所示所闻。换言之,采样产生之分布,既得图文,又得标记。 确切地说,生成式模型让我们可以通过在观测数据与类别标签的联合分布上采样,从无到有地产生出观测数据(create observation data out of thin air)。换句话说,当我们从生成式分布中采样时,会得到一个由图像和类别标签构成的元组。
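一个极简的玩具示意(假设性例子,非原文内容):把联合分布分解为 p(x, y) = p(y)·p(x|y),先采样标签,再按该类别的条件分布采样观测,返回的正是(观测, 标签)元组。

```python
import numpy as np

rng = np.random.default_rng(0)

# 玩具生成式模型:两个类别,各自对应一个一维高斯"观测"
class_prior = np.array([0.5, 0.5])   # p(y)
class_means = {0: -2.0, 1: +2.0}     # p(x|y) 的均值

def sample_joint():
    """从联合分布 p(x, y) = p(y)·p(x|y) 采样,返回 (观测, 标签) 元组。"""
    y = rng.choice([0, 1], p=class_prior)          # 先采样标签 y ~ p(y)
    x = rng.normal(loc=class_means[y], scale=1.0)  # 再采样观测 x ~ p(x|y)
    return x, y

print(sample_joint())  # 形如 (2.13, 1) 的元组
```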
This is in contrast to discriminative models, which can only sample from the distribution of class labels, conditioned on observations (you need to supply it with the image for it to tell you what it is). Generative models are much harder to create than discriminative models. 异于判别模型,其采样须于观测条件之下,方得类别标记;即以图示之,模型方告其为何物。生成模型较判别模型,难度远甚。 与判别式模型相对照:判别式模型只能在给定观测的条件下,从类别标签的分布中采样(也就是说,你得把图像交给它,它才能告诉你图里是什么)。建立生成式模型比建立判别式模型难得多。
There have been some awesome papers in the last couple of years that have improved generative modeling performance by combining them with Deep Learning techniques. Many researchers believe that breakthroughs in generative models are key to solving "unsupervised learning", and maybe even general AI, so this stuff gets a lot of attention. 近年得益于深度模型进步,生成模型亦多有论文述论。研究界广存同仁笃信生成模型,举其为非监督学习之要津,乃至广义人工智能之秘诀,而颇受青睐。 近几年,一些出色的论文把生成式模型与深度学习技术相结合,大幅提升了生成式建模的性能。许多研究者认为,生成式模型方面的突破,是解决"非监督学习"的关键,乃至通向通用人工智能的根基,所以生成式模型成为时代的宠儿。
Discriminative classifiers model the posterior p(y|x) directly, or learn a direct map from inputs x to the class labels. 判别式分类器则直入后验,以图文甲推得标记乙。 判别式分类器直接对后验分布p(y|x)建模,或者直接学习从输入x到类别标签的映射。
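作为对照的极简示意(假设性示例,非原文内容):逻辑斯蒂回归直接参数化 p(y|x),完全不对 p(x) 建模;必须先给它观测 x,它才能"告诉你那是什么"。

```python
import numpy as np

def posterior(x: float, w: float = 1.5, b: float = 0.0) -> float:
    """逻辑斯蒂回归:直接参数化后验 p(y=1|x) = sigmoid(w*x + b)。"""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

print(posterior(2.0))   # 接近 1:判为类别 1
print(posterior(-2.0))  # 接近 0:判为类别 0
```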
Indeed, even leaving aside computational issues and matters such as handling missing data, the prevailing consensus seems to be that discriminative classifiers are almost always to be preferred to generative ones. 诚然,且不论生成模型之计算浩繁,条列阙如,世人偏爱判别之于产生,似成共识。 的确,即使撇开计算问题与数据缺失等情况不谈,学界的普遍共识也似乎是:判别式分类器几乎总是优于生成式分类器。
Another piece of prevailing folk wisdom is that the number of examples needed to fit a model is often roughly linear in the number of free parameters of a model. This has its theoretical basis in the observation that for "many" models, the VC dimension is roughly linear or at most some low-order polynomial in the number of parameters, and it is known that sample complexity in the discriminative setting is linear in the VC dimension. 另有乡野智谋,以为模型参数之多寡,较于估算情况之多寡,乃伯仲之间。此源于别案另律,VC维度同参数之数量,若线性比齐,则不分轩轾,或稍有不足。另有定俗曰,公断之集,采样之简明或繁冗,全在VC维度。 另一类流行的"民间智慧"认为,拟合一个模型所需的样本数量,往往与模型自由参数的数量大致呈线性关系。其理论依据在于:对"许多"模型而言,VC维与参数数量大致呈线性关系,至多是参数数量的低次多项式;而且众所周知,判别式设定下的样本复杂度与VC维呈线性关系。
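把这层关系写成公式(PAC学习中的标准结论,非原文内容;此处取可实现情形的上界):若假设类 H 的 VC 维为 d,要以不低于 1-δ 的概率达到不超过 ε 的误差,所需样本数大致为

```latex
m \;=\; O\!\left(\frac{1}{\varepsilon}\left(d\,\log\frac{1}{\varepsilon} + \log\frac{1}{\delta}\right)\right),
\qquad d = \mathrm{VC}(H).
```

即所需样本数关于 VC 维 d(从而,按上文所说,大致关于参数个数)是线性的。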
We consider the naive Bayes model (for both discrete and continuous inputs) and its discriminative analog, logistic regression/linear classification, and show: (a) The generative model does indeed have a higher asymptotic error (as the number of training examples becomes large) than the discriminative model, but (b) The generative model may also approach its asymptotic error much faster than the discriminative model-possibly with a number of training examples that is only logarithmic, rather than linear, in the number of parameters. This suggests-and our empirical results strongly support-that, as the number of training examples is increased, there can be two distinct regimes of performance, the first in which the generative model has already approached its asymptotic error and is thus doing better, and the second in which the discriminative model approaches its lower asymptotic error and does better. 吾等推演朴素贝氏模型,所填图文或断或连,而比之以譬喻罗吉斯回归或分类模型,断言如是:(一)生成模型至于无穷处,确多有差额,然(二)其收敛之效,远速于判别,对数之于线性,彰显无遗。此二者揭示,并经验范之,曰,以训练数之增加,模型神态各异,初以生成模型鳌里夺尊,后以判别模型一骑绝尘。 我们研究了朴素贝叶斯模型(离散与连续两种输入情形)及其对应的判别式版本,即逻辑斯蒂回归/线性分类器,并证明:(a)生成式模型的渐近误差(当训练样本数趋于很大时)的确高于判别式模型;但是(b)生成式模型逼近其渐近误差的速度可能远快于判别式模型,所需训练样本数可能只与参数数量呈对数关系,而非线性关系。这表明(我们的实验结果也强有力地支持):随着训练样本数的增加,性能会出现两个不同的阶段;第一阶段,生成式模型已经逼近其渐近误差,因而表现更好;第二阶段,判别式模型逼近其更低的渐近误差,从而表现更佳。
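下面是一段复现这一"两阶段"现象的示意代码(假设性实验,并非论文原代码;依赖 scikit-learn 与一个人造的玩具数据集):在逐渐增大的训练集上,比较生成式的朴素贝叶斯与判别式的逻辑斯蒂回归的测试误差。

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# 构造一个玩具二分类数据集
X, y = make_classification(n_samples=20000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# 在逐渐增大的训练集上,比较两类模型的测试误差
for n in [20, 100, 500, 5000]:
    nb = GaussianNB().fit(X_train[:n], y_train[:n])  # 生成式
    lr = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])  # 判别式
    print(f"n={n:5d}  朴素贝叶斯误差={1 - nb.score(X_test, y_test):.3f}"
          f"  逻辑斯蒂回归误差={1 - lr.score(X_test, y_test):.3f}")
```

在许多数据集上往往可以观察到:训练样本很少时朴素贝叶斯误差更低,样本增多后逻辑斯蒂回归反超,与上文的两阶段结论一致。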
But maybe this is too abstract of an explanation. A practical use-case is for simulating possible futures when planning a decision and reasoning. As I wrote earlier, I know what to do to avoid making a car accident even though I never experienced one. I actually have zero training example of that category! Nor anything close to it (thankfully). I am able to do so only because I can generate the sequence of events and their consequence for me, if I chose to do some (fatal) action. Self-driving cars? Robots? Dialogue systems? etc. 然前腔文饰太盛,练达之处在于判事处世之际,推衍缘由,预谋未然。纵余愚钝,然即未历车马横祸,尚晓规避;纵无此类教训,不致引火烧身。盖余之智力,于性命攸关之时,可瞻前而顾后。自驾车、机械人、高谈阔论机,凡此种种,机缘一统,如是而得。 不过,这个解释也许太抽象了。一个实际的用例是:在规划决策与推理时,模拟可能的未来。正如我前文所写,虽然从未经历过车祸,我也知道该怎么做才能避免它。对于这一类事件,我实际上连一个训练样本都没有!(所幸)连与之相近的经历也没有。我之所以能做到,是因为如果我选择做出某个(致命的)动作,我能够在头脑中生成随后的一系列事件及其对我的后果。自动驾驶汽车?机器人?对话系统?凡此种种,莫不依赖于此。
Another practical example is structured outputs, where you want to generate Y conditionally on X. If you have good algorithms for generating Y's in the first place, the conditional extension is pretty straightforward. When Y is a very high-dimensional object (image, sentence, data structure, complex set of actions, choice of a combination of drug treatments, etc.), then these techniques can be useful. 另有例证,倘得后文勾股凛然,生成模型可由甲,阐证而得乙。若肇始之时,已知乙产生之算术,延展之事,势如破竹。当乙处高维空间之中(图、句、数桁、杂务集、药物运筹之类),生成模型良堪一用。 另一个实际的例子是结构化输出,即在给定X的条件下生成Y。如果你一开始就有一个生成Y的好算法,那么把它扩展成条件式版本是相当直接的。当Y是非常高维的对象(图像、句子、数据结构、复杂的动作集合、药物治疗的组合方案等)时,这类技术尤其有用。
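一个极简的数学示意(假设性例子,非原文内容):若 (x, y) 服从联合高斯分布,那么"条件式的延展"就是用高斯条件公式从 p(y|x) 中采样,这正印证了"会生成联合分布,条件生成便水到渠成"。

```python
import numpy as np

rng = np.random.default_rng(0)

# 假设 (x, y) 服从二维联合高斯分布
mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])

def sample_y_given_x(x: float) -> float:
    """用高斯条件公式从 p(y|x) 采样:联合模型的条件式延展。"""
    mu_y = mu[1] + cov[1, 0] / cov[0, 0] * (x - mu[0])  # 条件均值
    var_y = cov[1, 1] - cov[1, 0] ** 2 / cov[0, 0]      # 条件方差
    return rng.normal(mu_y, np.sqrt(var_y))

print(sample_y_given_x(2.0))  # x 较大时,采到的 y 也倾向于较大
```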
We are using images because they are fun and tell us a lot (humans are highly visual animals), which helps to debug and understand the limitations of these algorithms. 吾辈效力于图像之事业,盖因其趣味盎然、神采丰厚(更兼人类视觉最锐);试炼或深思算法之轩轾,无出其右。 我们多用图像做研究,因为图像有趣且信息丰富(人类是高度依赖视觉的动物),这有助于我们调试算法、理解这些算法的局限。