ASF Generative Tooling Guidance

Open Source Rainforest (开源雨林), 2023-09-19


Author: Apache Software Foundation

Translator: 刘天栋 Ted

Version: 1.0

Contents

  • Can contributions to ASF projects include AI generated content?

  • What about Documentation?

  • What about Images?

  • What do we do if a contribution includes AI generated content and some form of tooling has identified materials that have been copied?

Can contributions to ASF projects include AI generated content?

The Apache-2.0 license and the Apache Individual Contributor License Agreement (ICLA) both remind contributors that they are responsible for disclosing any copyrighted materials in submitted contributions that are not their original creations. This is as true when using generative AI tooling as it is when using materials from public websites or code from other open-source projects.

When disclosing these materials, contributors should also identify the licensing for these materials. The ASF maintains a 3rd Party Licensing Policy [1] that provides guidance on which licenses are acceptable, along with instructions on the treatment of 3rd Party Works [2].

While content generated by a non-human (e.g., a machine or a monkey [3]) is in general not copyrightable, if content consists of some portions generated by AI and other portions authored by a human, the portions authored by a human may be copyrightable.

As explained by the following U.S. Copyright Office Registration Guidance [4] (3/16/2023):

“For example, a human may select or arrange AI-generated material in a sufficiently creative way that “the resulting work as a whole constitutes an original work of authorship.” Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work, which are ‘independent of’ and do ‘not affect’ the copyright status of the AI-generated material itself.”

These portions authored by a human may simply come from the prompt the human provided or from subsequent changes they make. However, a prominent concern with generative AI is the risk of reproducing portions of the materials it was trained on, some of which may be copyrightable subject matter. Thus, a recommended practice when using generative AI tooling is to use tools with features that identify any included content that is similar to parts of the tool’s training data, as well as the license of that content.

Given the above, code generated in whole or in part using AI can be contributed if the contributor ensures that:

1. The terms and conditions of the generative AI tool do not place any restrictions on the use of the output that would be inconsistent with the Open Source Definition (e.g., ChatGPT’s terms are inconsistent).

2. At least one of the following conditions is met:

   2.1 The output is not copyrightable subject matter (and would not be even if produced by a human);

   2.2 No third-party materials are included in the output;

   2.3 Any third-party materials that are included in the output are being used with permission (e.g., under a compatible open-source license) of the third-party copyright holders and in compliance with the applicable license terms.

3. A contributor obtains reasonable certainty that condition 2.2 or 2.3 is met if the AI tool itself provides sufficient information about materials that may have been copied, or if code scanning results provide it.

  • E.g., AWS CodeWhisperer recently added a feature that provides notice and attribution.
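The logical shape of the conditions above (condition 1 must hold, together with at least one of 2.1, 2.2, or 2.3) can be sketched as a small check. This is only an illustration of the structure, not an ASF tool or a legal test; the class and field names are hypothetical, and deciding the value of each field remains a human judgment.

```python
from dataclasses import dataclass


@dataclass
class ContributionFacts:
    """Hypothetical record of what a contributor has established (names are illustrative)."""
    tool_terms_osd_compatible: bool   # condition 1
    output_is_copyrightable: bool     # condition 2.1 holds when this is False
    includes_third_party: bool        # condition 2.2 holds when this is False
    third_party_use_licensed: bool    # condition 2.3


def meets_conditions(facts: ContributionFacts) -> bool:
    """Condition 1 must hold, plus at least one of 2.1, 2.2, 2.3."""
    cond_2 = (
        not facts.output_is_copyrightable
        or not facts.includes_third_party
        or facts.third_party_use_licensed
    )
    return facts.tool_terms_osd_compatible and cond_2
```

For instance, output that includes third-party material without permission fails the check even when the tool's terms are compatible, because none of 2.1, 2.2, or 2.3 holds.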

When providing contributions authored using generative AI tooling, a recommended practice is for contributors to indicate the tooling used to create the contribution. This should be included as a token in the source control commit message, for example by including the phrase “Generated-by: ”. This leaves room for future release tooling that pulls this content into a machine-parsable Tooling-Provenance file.
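As a minimal sketch of how such a token could be collected, the snippet below scans commit messages for lines beginning with "Generated-by:" and groups the named tools by commit. The guidance does not define the format of a Tooling-Provenance file, so the dictionary layout, the function names, the commit identifiers, and the tool name "ExampleTool 1.0" are all assumptions made for illustration.

```python
import json


def generated_by_tools(commit_message: str) -> list[str]:
    """Return the tool names found on 'Generated-by:' lines of one commit message."""
    tools = []
    for line in commit_message.splitlines():
        key, sep, value = line.partition(":")
        # Accept only lines whose text before the first colon is the token itself.
        if sep and key.strip().lower() == "generated-by" and value.strip():
            tools.append(value.strip())
    return tools


def build_provenance(commits: dict[str, str]) -> dict[str, list[str]]:
    """Map commit id -> tools, keeping only commits that declare a tool (hypothetical layout)."""
    result = {}
    for commit_id, message in commits.items():
        tools = generated_by_tools(message)
        if tools:
            result[commit_id] = tools
    return result


if __name__ == "__main__":
    # Hypothetical commit ids and messages, for demonstration only.
    sample = {
        "abc123": "Fix parser bug\n\nGenerated-by: ExampleTool 1.0\n",
        "def456": "Update README",
    }
    print(json.dumps(build_provenance(sample), indent=2))
```

In a real repository the messages would come from the version control system rather than a hard-coded dictionary; that step is left out here to keep the sketch self-contained.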

Finally, please note that while the above seems like a reasonable set of guidelines in June 2023, this is a rapidly evolving area. Whatever we recommend to PMCs today, policies will need to be re-evaluated and updated in response to:

  • Changes in the law

  • Changes in AI technology and related tools (e.g., as AI models evolve that (1) are able to provide notice when they reproduce portions of the materials they were trained on, or (2) can be instructed to reproduce only permissively licensed (or otherwise Apache-2.0 compatible) source materials)

  • Changes in tolerance for risk and ambiguity among adopters of OSS

We will continue communicating with PMCs and ASF members as updates to this FAQ are discussed and merged in.

What about Documentation?

The above text applies to documentation as well. However, the most popular tooling for documentation, ChatGPT, has restrictive licensing, so caution should be applied.

What about Images?

As with documentation, the above principles still apply. However, because images are a non-textual form, the details quickly become complex. We expect this to continue to be a rapidly evolving area.

What do we do if a contribution includes AI generated content and some form of tooling has identified materials that have been copied?

Refer to the 3rd Party Licensing Policy [5], as with any other contribution.


Notes:

[1]https://www.apache.org/legal/resolved.html

[2]https://www.apache.org/legal/src-headers.html#3party

[3]https://zh.wikipedia.org/zh-hans/%E7%8C%B4%E5%AD%90%E8%87%AA%E6%8B%8D%E7%85%A7%E8%91%97%E4%BD%9C%E6%AC%8A%E7%88%AD%E8%AD%B0

[4]https://www.federalregister.gov/documents/2023/03/16/2023-05321/copyright-registration-guidance-works-containing-material-generated-by-artificial-intelligence

[5]https://www.apache.org/legal/resolved.html



Original text:

https://www.apache.org/legal/generative-tooling.html


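What is Open Source Rainforest (开源雨林)?

Open Source Rainforest builds a body of knowledge around three areas: open source fundamentals, open source usage, and open source contribution. It aims to share its long-accumulated experience with enterprises in a systematic way and to collaborate on teams, mechanisms, and projects, helping enterprises use and contribute to open source more effectively and raising the level of open source technology and application across the industry.

The content of Open Source Rainforest is itself open source and hosted at https://github.com/opensource-rainforest/osr. Contributions via Pull Request and discussion via Issue are welcome.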
