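Recreating "Strawberry" in 150 Lines of Code: A Lite Edition with Web Access
Let me open with a personal opinion that may be controversial:
o1 is less a model than an agent with built-in task planning and reflection
The biggest strength of this kind of agent is its reasoning ability: it trades time for performance and tokens for accuracy. If you are interested, here are some of my earlier pieces:
"Pragmatism First: What Is an Agent": in this piece, I explain where agents came from and the paths being explored
"OpenAI's 'Strawberry' Ships This Fall, 'Orion' to Follow": in this piece, I predicted o1's form and behavior (agent based program)
Naturally, I think o1 is strong and useful: with progress on large models slowing down, this approach effectively raises the quality of model output. For the vast majority of AI users, it makes working with a model more efficient. (The even larger majority of people do not use AI at all.)
But I am just as convinced that pitting o1 against plain large models in benchmark showdowns is highly inappropriate, especially in 0-shot comparisons.
Put another way: it is not fair to compare the accuracy of an exam paper that was rechecked for two and a half years with one that was handed in on time.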
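To make the exam-paper analogy concrete, here is a back-of-the-envelope sketch. It rests on two simplifying assumptions that are not measured properties of o1: each attempt is independently correct with probability p, and a perfect checker recognizes the first correct attempt. Under those assumptions the chance of success within k attempts is 1 - (1 - p)^k, which is why a system that gets k checked attempts should not be scored against a single 0-shot answer.

```python
def success_within_k(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts is correct.

    Assumes each attempt succeeds with probability p and that a perfect
    checker recognizes the first correct attempt (an idealization).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Hypothetical single-attempt accuracy of 0.5, for illustration only
    for k in (1, 2, 5, 10):
        print(k, round(success_within_k(0.5, k), 4))
```

A real checker is imperfect and attempts are correlated, so the actual gain is smaller than this formula suggests; the direction of the effect is the point.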
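In this article, I try to build a web-connected, lite edition of "Strawberry" in 150 lines of code
It is a lite edition because:
It includes only the most basic project planning and reflection
There is no fine-tuning at all, and it does not even use an OpenAI model; I chose Zhipu's free glm-4-flash
I added WebSearch, so that problems with known answers can be solved faster
Note: the original o1 cannot search the web, nor can it use any tool
Its performance is far behind Strawberry's and it has no built-in CoT; it is only a demo that imitates the behavior with crude methods
The result looks like this (answering which is larger, 9.8 or 9.11):
Next, I will show the code first and then explain how it works.
Code
A quick note first: I am using Colab here, hence api_key=userdata.get('Key_Zhipu').
For web access I use WebPilot's search API, hence the {watt(problem)} in the prompt; the watt function itself is not defined in the listing below.
Adapt both of these to your own needs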
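If you are not on Colab or have no WebPilot access, the two pieces can be swapped out along these lines. This is a minimal sketch and not the author's setup: the environment variable name `ZHIPU_API_KEY`, the `load_api_key` helper, and the `search_fn` parameter are all hypothetical, and this `watt` is only a stand-in that returns a placeholder when no search backend is supplied.

```python
import os
from typing import Callable, Optional


def load_api_key(env_var: str = "ZHIPU_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical name).

    Returns an empty string when the variable is not set.
    """
    return os.environ.get(env_var, "")


def watt(problem: str, search_fn: Optional[Callable[[str], str]] = None) -> str:
    """Stand-in for the web search helper used in the analysis prompt.

    search_fn is any function that takes a query string and returns
    search results as text. Without one, a placeholder is returned so
    the prompt can still be built and the rest of the flow can run.
    """
    if search_fn is None:
        return "(no web search results available)"
    try:
        return search_fn(problem)
    except Exception as exc:
        # A failed search should not abort the whole reasoning run
        return f"(web search failed: {exc})"
```

With this in place, `api_key=load_api_key()` can replace the Colab `userdata.get(...)` call, and any search client you already have can be passed in as `search_fn`.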
from openai import OpenAI
from dataclasses import dataclass, field
from typing import List, Optional
from IPython.display import display, Markdown
from google.colab import userdata

# Load the Zhipu API key from Colab secrets; the OpenAI SDK is pointed at Zhipu's endpoint via base_url
client = OpenAI(
    api_key=userdata.get('Key_Zhipu'),
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)
model = "glm-4-flash"

# Define data models
@dataclass
class ThoughtStep:
    step_answer: str
    is_completed: bool
    hint: str


@dataclass
class ReasoningProcess:
    initial_problem: str
    steps: List[ThoughtStep] = field(default_factory=list)
    final_answer: Optional[str] = None

def solve_problem(problem: str, max_attempts: int = 10) -> ReasoningProcess:
    """
    Solve a problem using multi-step reasoning, planning, and intelligent thinking.
    """
    reasoning_process = ReasoningProcess(initial_problem=problem)
    attempts = 0
    is_completed = False

    # Step 1: Analyze the problem and plan
    # NOTE: watt() is the WebPilot search helper mentioned above; it is not defined in this listing
    analysis_prompt = f"""
    You are an AI assistant that excels at solving complex STEM problems using multi-step reasoning.
    When given a problem, first analyze it, think about possible solution methods, and plan the subsequent steps to solve it.
    Problem:
    {problem}
    Web Search:
    {watt(problem)}
    Provide your analysis and step-by-step plan in plain text.
    """
    display(Markdown("**Big Brain is thinking...**"))
    messages = [{"role": "user", "content": analysis_prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages
    ).choices[0].message.content.strip()

    # Display AI's initial analysis
    display(Markdown(f"### AI Initial Analysis:\n{response}\n"))
    hint = response
    analysis_step = ThoughtStep(step_answer="", is_completed=False, hint=hint)
    reasoning_process.steps.append(analysis_step)

    # Start a fresh conversation seeded with the plan as the assistant's own thought
    messages = [{"role": "system", "content": "You are an AI assistant continuing the problem-solving process."},
                {"role": "user", "content": "Giving a thought about this problem: " + problem},
                {"role": "assistant", "content": hint},
                {"role": "user", "content": "Solve it with this thought, and give the final answer"}]

    # Continue with the plan and attempt to solve the problem
    while not is_completed and attempts < max_attempts:
        attempts += 1

        # Phase 1: Generate the step answer based on the thought
        response = client.chat.completions.create(
            model=model,
            messages=messages
        ).choices[0].message.content.strip()

        # Extract step answer
        step_answer = response.strip()
        display(Markdown(f"### Step Answer (Attempt {attempts}):\n{step_answer}\n"))

        # Phase 2: Validate the step answer using XML format
        validation_prompt = f"""
        You are an AI validator. Check if the following step answer solves the problem correctly:
        Problem:
        {problem}
        Step Answer:
        {step_answer}
        Respond in XML format as follows:
        <response>
        <is_correct>Is this answer 100% correct? Return true or false</is_correct>
        <hint>If the answer is incorrect, provide a new thought or hint.</hint>
        </response>
        """
        display(Markdown(f"**AI is validating step answer (Attempt {attempts})...**"))
        messages_validation = [{"role": "user", "content": validation_prompt}]
        response = client.chat.completions.create(
            model=model,
            messages=messages_validation
        ).choices[0].message.content.strip()
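
        # Parse the XML response.
        # Only the text inside <is_correct> decides the verdict, so a stray "true" in the hint cannot flip it.
        lowered = response.lower()
        verdict_start = lowered.find('<is_correct>')
        verdict_end = lowered.find('</is_correct>')
        is_correct = (
            verdict_start != -1
            and verdict_end > verdict_start
            and lowered[verdict_start + len('<is_correct>'):verdict_end].strip() == 'true'
        )
        # Check for the tags before adding the tag length; otherwise a missing tag is never detected
        hint_start = response.find('<hint>')
        hint_end = response.find('</hint>')
        if hint_start != -1 and hint_end > hint_start:
            hint = response[hint_start + len('<hint>'):hint_end].strip()
        else:
            hint = "No hint provided"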

        # Update reasoning process
        step = ThoughtStep(step_answer=step_answer, is_completed=is_correct, hint=hint)
        reasoning_process.steps.append(step)
        messages += [{"role": "assistant", "content": step_answer}]
        if is_correct:
            break  # Exit loop if the step answer is correct
        messages += [{"role": "user", "content": "Not correct, try with this: " + hint}]

    # Final answer step
    messages += [{"role": "user", "content": f"Based on your reasoning, provide the final answer to the problem and return it in the same language as the following: {reasoning_process.initial_problem}"}]
    response = client.chat.completions.create(
        model=model,
        messages=messages
    ).choices[0].message.content.strip()

    # Extract the final answer
    reasoning_process.final_answer = response

    # Display the final answer
    display(Markdown(f"## Final Answer:\n{response}"))
    return reasoning_process

def display_reasoning_process(process: ReasoningProcess) -> None:
    """
    Display the reasoning process details.
    """
    display(Markdown(f"## Problem:\n{process.initial_problem}\n"))
    for idx, step in enumerate(process.steps, 1):
        display(Markdown(f"### Step {idx}:\n**Hint**: {step.hint}\n**Is Completed**: {step.is_completed}\n"))
    if process.final_answer:
        display(Markdown(f"## Final Answer:\n{process.final_answer}"))
    else:
        display(Markdown("## Final Answer: Not determined yet."))

# Example usage
if __name__ == "__main__":
    problem_text = """Which is larger, 9.8 or 9.11?"""
    # Solve the problem
    reasoning = solve_problem(problem_text)
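The validator in the listing is asked to reply in an XML-like format. As a standalone sketch (the helper name `parse_validation` is hypothetical and not part of the listing), the verdict and the hint can be pulled out with regular expressions so that only the text inside `<is_correct>` decides the outcome. Keeping the parsing separate from the model call also means it can be tested offline.

```python
import re
from typing import Tuple

# Match the tag contents lazily, ignoring case and allowing line breaks inside the tags
_FLAGS = re.IGNORECASE | re.DOTALL
_VERDICT_RE = re.compile(r"<is_correct>\s*(.*?)\s*</is_correct>", _FLAGS)
_HINT_RE = re.compile(r"<hint>\s*(.*?)\s*</hint>", _FLAGS)


def parse_validation(reply: str) -> Tuple[bool, str]:
    """Extract (is_correct, hint) from an XML-like validator reply.

    A missing or malformed verdict counts as "not correct", which makes
    the caller retry instead of accepting an unverified answer.
    """
    verdict = _VERDICT_RE.search(reply)
    hint = _HINT_RE.search(reply)
    is_correct = verdict is not None and verdict.group(1).strip().lower() == "true"
    hint_text = hint.group(1).strip() if hint and hint.group(1).strip() else "No hint provided"
    return is_correct, hint_text
```

Treating an unparseable reply as a failed check is a design choice: it costs an extra attempt, but it avoids ending the loop on an answer nobody actually approved.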
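How It Works
First, I use glm-4-flash here for one reason only: it is free.
The whole implementation runs in three steps:
Step 1: Task planning. The agent first looks up material about the problem online, analyzes it together with the user's question, and outputs a plan for solving the problem
Step 2: Task attempt. With the plan in hand, the agent tries to solve the problem:
If the problem is solved (or the maximum number of retries is exceeded), jump to Step 3;
If it is not solved, the agent reflects on why the attempt fell short, browbeats itself, and retries
Step 3: Wrap-up. Summarize the attempts above and output the formal answer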
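The three steps above can be sketched independently of any particular model. In the sketch below every "model" is just a plain function, so the control flow can be run and tested offline; the names `run_loop`, `planner`, `solver`, `validator`, and `summarizer` are hypothetical and chosen for illustration. In the listing above all four roles are played by the same glm-4-flash model with different prompts.

```python
from typing import Callable, List, Tuple

# A text model is any function that maps a prompt string to a reply string
TextModel = Callable[[str], str]
# A validator takes (problem, answer) and returns (is_correct, hint)
Validator = Callable[[str, str], Tuple[bool, str]]


def run_loop(problem: str,
             planner: TextModel,
             solver: TextModel,
             validator: Validator,
             summarizer: TextModel,
             max_attempts: int = 3) -> Tuple[str, List[str]]:
    """Plan, then attempt and validate until accepted or out of attempts, then wrap up."""
    # Step 1: planning; the plan serves as the first hint
    hint = planner(problem)
    history: List[str] = []
    answer = ""
    # Step 2: attempt, validate, and retry with the validator's hint
    for _ in range(max_attempts):
        answer = solver(f"Problem: {problem}\nHint: {hint}")
        history.append(answer)
        is_correct, hint = validator(problem, answer)
        if is_correct:
            break
    # Step 3: wrap-up, based on the last attempt whether or not it was accepted
    final = summarizer(f"Problem: {problem}\nAnswer: {answer}")
    return final, history


if __name__ == "__main__":
    # Stand-in functions that fake a model which gets it right on the second try
    replies = iter(["9.11", "9.8"])
    final, history = run_loop(
        "Which is larger, 9.8 or 9.11?",
        planner=lambda p: "Compare the tenths digits first.",
        solver=lambda prompt: next(replies),
        validator=lambda p, a: (a == "9.8", "Write 9.8 as 9.80 before comparing."),
        summarizer=lambda prompt: prompt.splitlines()[-1],
    )
    print(history, final)
```

Passing the roles in as separate functions also makes it easy to try the variant where a second model acts as the checker instead of the solver checking itself.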
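In the end, for the question "which is larger, 9.8 or 9.11", the output is the following (including the thinking process):
The method behind this kind of program is to have the AI keep browbeating itself, or to bring in a second AI to browbeat the one doing the work, so that it keeps trying, checking, and improving until the job is delivered (sound familiar?)
What This Shows
Let me comment on this from several angles:
o1 is not mysterious; you can build one too (lite edition only)
Getting to actual o1-level results still takes work on many fronts, whether that is engineering the agent or training the model (internalizing CoT)
o1 will be very useful, especially for synthetic data and for solving complex tasks
To some extent, it suggests that training of the models themselves has run into some bottlenecks
Prompt engineering will gradually fade
Also, you are welcome to discuss this one: "On AI & AGI, I Have 3 Questions"
And one more thing: I plan to organize a proper "o1 Algorithm Challenge" later, and you are welcome to join when it happens
(Let me go beg for some prize money first, ahhhhhh)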