
Sequoia US's Latest AI Observations: The New Language Model Stack, and How Enterprises Put AI Applications into Production

李榜主 AIhackathon 2023-12-23
On June 14, 2023, Sequoia US published a new article, "The New Language Model Stack." It summarizes the conclusions Sequoia drew from interviews with 33 companies in their portfolio, spanning every stage from seed round to publicly listed. The article makes eight analytical points, each grounded in the latest developments and paired with predictions about what comes next.
For anyone working on an AI startup, much of this will be very helpful. I used the Immersive Translate Chrome extension plus GPT-4 to quickly produce a bilingual Chinese-English version of the article, and I hope it brings you fresh thinking and inspiration.
I have created an AI application-layer discussion group that shares the latest useful AI information in real time. The information density and freshness in the group are very high and the discussion atmosphere is great. If you are interested, reply with the keyword 【微信】 in the official-account backend to get a QR code to join.

BY MICHELLE FRADIN AND LAUREN REEDER
PUBLISHED JUNE 14, 2023
ChatGPT unleashed a tidal wave of innovation with large language models (LLMs). More companies than ever before are bringing the power of natural language interaction to their products. The adoption of language model APIs is creating a new stack in its wake. To better understand the applications people are building and the stacks they are using to do so, we spoke with 33 companies across the Sequoia network, from seed stage startups to large public enterprises. We spoke with them two months ago and last week to capture the pace of change. As many founders and builders are in the midst of figuring out their AI strategies themselves, we wanted to share our findings even as this space is rapidly evolving. 

1. Nearly every company in the Sequoia network is building language models into their products.
We’ve seen magical auto-complete features for everything from code (Sourcegraph, Warp, Github) to data science (Hex). We’ve seen better chatbots for everything from customer support to employee support to consumer entertainment. Others are reimagining entire workflows with an AI-first lens: visual art (Midjourney), marketing (Hubspot, Attentive, Drift, Jasper, Copy, Writer), sales (Gong), contact centers (Cresta), legal (Ironclad, Harvey), accounting (Pilot), productivity (Notion), data engineering (dbt), search (Glean, Neeva), grocery shopping (Instacart), consumer payments (Klarna), and travel planning (Airbnb). These are just a few examples and they’re only the beginning. 

2. The new stack for these applications centers on language model APIs, retrieval, and orchestration, but open source usage is also growing. 
  • 65% had applications in production, up from 50% two months ago, while the remainder are still experimenting.


  • 94% are using a foundation model API. OpenAI's GPT was the clear favorite in our sample at 91%, but interest in Anthropic grew over the last quarter to 15%. (Some companies are using multiple models.)


  • 88% believe a retrieval mechanism, such as a vector database, would remain a key part of their stack. Retrieving relevant context for a model to reason about helps increase the quality of results, reduce "hallucinations" (inaccuracies), and solve data freshness issues. Some use purpose-built vector databases (Pinecone, Weaviate, Chroma, Qdrant, Milvus, and many more), while others use pgvector or AWS offerings.


  • 38% were interested in an LLM orchestration and application development framework like LangChain. Some use it for prototyping, while others use it in production. Adoption increased in the last few months.


  • Sub-10% were looking for tools to monitor LLM outputs, cost, or performance, and to A/B test prompts. We think interest in these areas may increase as more large companies and regulated industries adopt language models.


  • A handful of companies are looking into complementary generative technologies, such as combining generative text and voice. We also believe this is an exciting growth area.


  • 15% built custom language models from scratch or from open source, often in addition to using LLM APIs. Custom model training increased meaningfully from a few months ago. This requires its own stack of compute, model hubs, hosting, training frameworks, experiment tracking, and more, from companies like Hugging Face, Replicate, Foundry, Tecton, Weights & Biases, PyTorch, and Scale.



Every practitioner we spoke with said AI is moving too quickly to have high confidence in the end-state stack, but there was consensus that LLM APIs will remain a key pillar, followed in popularity by retrieval mechanisms and development frameworks like LangChain. Open source and custom model training and tuning also seem to be on the rise. Other areas of the stack are important, but earlier in maturity. 

3. Companies want to customize language models to their unique context. 
Generalized language models are powerful, but not differentiating or sufficient for many use cases. Companies want to enable natural language interactions on their data—their developer docs, product inventory, HR or IT rules, etc. In some cases, companies want to customize their models to their users’ data as well: your personal notes, design layouts, data metrics or code base. 
Right now, there are three main ways to customize language models (for a deeper technical explanation, see Andrej’s recent State of GPT talk at Microsoft Build): 
  • Train a custom model from scratch. Highest degree of difficulty.


    This is the classical and hardest way to solve this problem. It typically requires highly skilled ML scientists, lots of relevant data, training infrastructure and compute. This is one of the primary reasons why historically much natural language processing innovation occurred within mega-cap tech companies. BloombergGPT is a great example of a custom model effort outside of a mega-cap tech company, which used resources on Hugging Face and other open source tooling. As open source tooling improves and more companies innovate with LLMs, we expect to see more custom and pre-trained model usage.


  • Fine-tune a base model. Medium degree of difficulty.


    This is updating the weights of a pre-trained model through additional training with further proprietary or domain-specific data. Open source innovation is also making this approach increasingly accessible, but it still often requires a sophisticated team. Some practitioners privately admit fine-tuning is much harder than it sounds and can have unintended consequences like model drift and “breaking” the model’s other skills without warning. While this approach has a greater chance of becoming more common, it is currently still out of reach for most companies. But again, this is changing quickly.


  • Use a pre-trained model and retrieve relevant context. Lowest degree of difficulty.


    People often think they want a model fine-tuned just for them, when really they just want the model to reason about their information at the right time. There are many ways to provide the model the right information at the right time: make a structured query to a SQL database, search across a product catalog, call some external API or use embeddings retrieval. The benefit of embeddings retrieval is that it makes unstructured data easily searchable using natural language. Technically, this is done by taking data, turning it into embeddings, storing those in a vector database, and when a query occurs, searching those embeddings for the most relevant context, and providing that to the model. This approach helps you hack the model’s limited context window, is less expensive, solves the data freshness problem (e.g. ChatGPT doesn’t know about the world after September 2021), and it can be done by a solo developer without formal machine learning training. Vector databases are useful because at high scale they make storing, searching and updating embeddings easier. So far, we’ve observed larger companies stay within their enterprise cloud agreements and use tools from their cloud provider, while startups tend to use purpose-built vector databases. However, this space is highly dynamic. Context windows are growing (hot off the presses, OpenAI just expanded to 16K, and Anthropic has launched a 100K token context window). Foundational models and cloud databases may embed retrieval directly into their services. We’re watching closely as this market evolves.
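The embed-store-search-provide flow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not production code: the bag-of-words `embed` function stands in for a real embedding model, a plain in-memory list stands in for a vector database, and the example documents are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Vector database": store (embedding, original text) pairs.
docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Premium plans include priority support.",
]
store = [(embed(d), d) for d in docs]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Embed the query, rank stored documents by similarity, return top k.
    q = embed(query)
    ranked = sorted(store, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# Provide the retrieved context to the model inside the prompt.
context = retrieve("How long do refunds take?")[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: How long do refunds take?"
```

A real system would swap in an embedding model's API for `embed` and a vector database client for `store`, but the shape of the pipeline stays the same.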

4. Today, the stack for LLM APIs can feel separate from the custom model training stack, but these are blending together over time.
It can sometimes feel like we have a tale of two stacks: the stack to leverage LLM APIs (more closed source, and geared towards developers) versus the stack to train custom language models (more open source, and historically geared towards more sophisticated machine learning teams). Some have wondered whether LLMs being readily available via API meant companies would do less of their own custom training. So far, we’re seeing the opposite. As interest in AI grows and open source development accelerates, many companies become increasingly interested in training and fine-tuning their own models. We think the LLM API and custom model stacks will increasingly converge over time. For example, a company might train its own language model from open source, but supplement with retrieval via a vector database to solve data freshness issues. Smart startups building tools for the custom model stack are also working on extending their products to become more relevant to the LLM API revolution. 
5. The stack is becoming increasingly developer-friendly.
Language model APIs put powerful ready-made models in the hands of the average developer, not just machine learning teams. Now that the population working with language models has meaningfully expanded to all developers, we believe we’ll see more developer-oriented tooling. For example, LangChain helps developers build LLM applications by abstracting away commonly occurring problems: combining models into higher-level systems, chaining together multiple calls to models, connecting models to tools and data sources, building agents that can operate those tools, and helping avoid vendor lock-in by making it easier to switch language models. Some use LangChain for prototyping, while others continue to use it in production. 
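The kind of abstraction such frameworks provide can be illustrated with a minimal chain in plain Python. This is a hypothetical sketch of the pattern, not LangChain's actual API: a model is just a swappable function from prompt to completion (which is one way a framework can reduce vendor lock-in), and a chain feeds each call's output into the next prompt template. A canned fake model stands in for a real LLM API.

```python
from typing import Callable

# A "model" is a function from prompt to completion. Swapping providers
# means swapping this one function; the chain logic is unchanged.
Model = Callable[[str], str]

def fake_llm(prompt: str) -> str:
    # Canned stand-in for a real LLM API call.
    if "Summarize" in prompt:
        return "LLMs are widely adopted."
    return "Title: The New LLM Stack"

def make_chain(model: Model, templates: list[str]) -> Callable[[str], str]:
    # Chain multiple model calls: each template is filled with the
    # previous step's output, then sent to the model.
    def run(text: str) -> str:
        for template in templates:
            text = model(template.format(input=text))
        return text
    return run

chain = make_chain(fake_llm, [
    "Summarize this survey: {input}",
    "Write a title for: {input}",
])
result = chain("33 companies report rising LLM API usage...")
# result == "Title: The New LLM Stack"
```

A framework adds much more on top of this (tool use, agents, retries, data connectors), but chaining calls behind a uniform model interface is the core idea.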

6. Language models need to become more trustworthy (output quality, data privacy, security) for full adoption. 
Before fully unleashing LLMs in their applications, many companies want better tools for handling data privacy, segregation, security, copyright, and monitoring model outputs. Companies in regulated industries from fintech to healthcare are especially focused on this and report having trouble finding software solutions to address it (a ripe area for founders). Ideally there would be software to alert on, if not prevent, models generating errors/hallucinations, discriminatory content, dangerous content, or other issues. Some companies are also concerned about how data shared with models is used for training: for instance, few understand that ChatGPT Consumer data is used for training by default, while ChatGPT Business and API data are not. As policies get clarified and more guardrails go into place, language models will be better trusted, and we may see another step change in adoption.
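As a toy illustration of the kind of output monitoring described above (hypothetical, not any specific vendor's product): a checker that flags a model answer whose words have too little overlap with the retrieved source text, a crude proxy for detecting ungrounded or hallucinated output.

```python
def flag_unsupported(answer: str, source: str, threshold: float = 0.5) -> bool:
    """Flag an answer when too few of its words appear in the source
    text -- a crude grounding check, not real fact-checking."""
    answer_words = {w.strip(".,!?") for w in answer.lower().split()}
    source_words = {w.strip(".,!?") for w in source.lower().split()}
    if not answer_words:
        return True  # an empty answer has no support
    overlap = len(answer_words & source_words) / len(answer_words)
    return overlap < threshold

# Usage: compare a model's answer against the context it was given.
source = "Refunds are processed within 5 business days."
flag_unsupported("Refunds are processed within 5 days.", source)   # grounded
flag_unsupported("Refunds are instant and paid in cash.", source)  # flagged
```

Real monitoring tools would use far stronger signals (entailment models, citation checks, human review queues), but even this skeleton shows where an alerting hook would sit in an LLM pipeline.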

7. Language model applications will become increasingly multi-modal.
Companies are already finding interesting ways to combine multiple generative models to great effect: Chatbots that combine text and speech generation unlock a new level of conversational experience. Text and voice models can be combined to help you to quickly overdub a video recording mistake instead of re-recording the whole thing. Models themselves are becoming increasingly multi-modal. We can imagine a future of rich consumer and enterprise AI applications that combine text, speech/audio, and image/video generation to create more engaging user experiences and accomplish more complex tasks. 

8. It’s still early.
AI is just beginning to seep into every crevice of technology. Only 65% of those surveyed were in production today, and many of these are relatively simple applications. As more companies launch LLM applications, new hurdles will arise—creating more opportunities for founders. The infrastructure layer will continue to evolve rapidly for the next several years. If only half the demos we see make it to production, we’re in for an exciting ride ahead. It’s thrilling to see founders from our earliest-stage Arc investment to Zoom all laser focused on the same thing: delighting users with AI. 
If you’re founding a company that will become a key pillar of the language model stack or an AI-first application, Sequoia would love to meet you. 
Thank you to all the founders and builders who contributed to this work, and to Sequoia partners Charlie Curnin, Pat Grady, Sonya Huang, Andrew Reed, Bill Coughran, and friends at OpenAI for their input and review.

References

[1]https://www.sequoiacap.com/article/llm-stack-perspective/




