Recreating "Strawberry" in 150 Lines of Code: A Lite Edition with Web Access

金色传说大聪明 赛博禅心
2024-12-08

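First, a personal opinion that may be controversial:

o1 is less a model than an agent with built-in task planning and reflection.


The biggest strength of this kind of agent is reasoning: it trades time for performance and tokens for accuracy. If you are interested, you can read some of what I have written on this before:


I certainly think o1 is strong and useful: with progress on large models slowing down, this approach effectively raises the quality of a model's output. For the broad population of AI users, it makes using models noticeably more efficient. (The even broader population does not use AI at all.)

But I am equally sure that entering o1 in benchmark shoot-outs between large models is highly inappropriate, especially in 0-shot comparisons.

Put another way: comparing the accuracy of an exam paper that was rechecked for two and a half years with one that was handed in on time is not a fair comparison.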


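In this post I try to build a web-connected, lite edition of "Strawberry" in 150 lines of code.

It is a lite edition because:

  • It includes only the most basic task planning and reflection

  • There is no fine-tuning at all, and not even an openai model: I picked Zhipu's free glm-4-flash

  • I added WebSearch, so problems with known answers can be solved faster

    • Note: the original o1 cannot search the web or use any tools

  • Performance is far below Strawberry's and there is no built-in CoT; it is only a demo that imitates the behavior with crude, homemade methods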


Here is the result (answering "which is larger, 9.8 or 9.11"):


Next, I will show the code first and then explain how it works.


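The code

A quick note: I am using colab here, hence api_key=userdata.get('Key_Zhipu').

For web access I use WebPilot's search API, which is why {watt(problem)} appears in the prompt.

Change both of these to suit your needs.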
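
Outside Colab, those two pieces might be swapped for something like the sketch below. Everything in it is an assumption for illustration: the environment variable name `ZHIPU_API_KEY` and the helper `load_api_key` are hypothetical, and this `watt` is only a placeholder that returns a fixed string. It does not call WebPilot, whose API is not shown in this post.

```python
import os


def load_api_key(env_var: str = "ZHIPU_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


def watt(problem: str) -> str:
    """Placeholder for the web search call used in the prompt.

    Replace the body with a call to whatever search service you use;
    it only needs to return text that can be pasted into the prompt.
    """
    return "(no web search results available)"
```

With these in place, `api_key=load_api_key()` can stand in for the `userdata.get(...)` call, and the planning prompt still works, just without real search results.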

import re
from dataclasses import dataclass, field
from typing import List, Optional

from google.colab import userdata
from IPython.display import Markdown, display
from openai import OpenAI

# Zhipu exposes an OpenAI-compatible endpoint, so the OpenAI client is reused.
# The API key is read from Colab's secret storage rather than hard-coded.
client = OpenAI(
    api_key=userdata.get('Key_Zhipu'),
    base_url="https://open.bigmodel.cn/api/paas/v4/",
)
model = "glm-4-flash"

# NOTE: watt(problem) is the WebPilot search call mentioned above. Its definition
# is not part of this listing; supply your own function that returns search text.


# Define data models
@dataclass
class ThoughtStep:
    step_answer: str
    is_completed: bool
    hint: str


@dataclass
class ReasoningProcess:
    initial_problem: str
    steps: List[ThoughtStep] = field(default_factory=list)
    final_answer: Optional[str] = None


def chat(messages: List[dict]) -> str:
    """Send one chat request and return the stripped text of the first choice."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content.strip()


def solve_problem(problem: str, max_attempts: int = 10) -> ReasoningProcess:
    """
    Solve a problem using multi-step reasoning, planning, and intelligent thinking.
    """
    reasoning_process = ReasoningProcess(initial_problem=problem)

    # Step 1: Analyze the problem and plan
    analysis_prompt = f"""You are an AI assistant that excels at solving complex STEM problems using multi-step reasoning.
When given a problem, first analyze it, think about possible solution methods, and plan the subsequent steps to solve it.

Problem:
{problem}

Web Search:
{watt(problem)}

Provide your analysis and step-by-step plan in plain text."""

    display(Markdown("**大聪明 is thinking...**"))
    response = chat([{"role": "user", "content": analysis_prompt}])

    # Display AI's initial analysis
    display(Markdown(f"### AI Initial Analysis:\n{response}\n"))

    hint = response
    analysis_step = ThoughtStep(step_answer="", is_completed=False, hint=hint)
    reasoning_process.steps.append(analysis_step)

    messages = [
        {"role": "system", "content": "You are an AI assistant continuing the problem-solving process."},
        {"role": "user", "content": "Giving a thought about this problem: " + problem},
        {"role": "assistant", "content": hint},
        {"role": "user", "content": "Solve it with this thought, and give the final answer"},
    ]

    # Step 2: Follow the plan and attempt to solve the problem
    for attempts in range(1, max_attempts + 1):
        # Phase 1: Generate the step answer based on the thought
        step_answer = chat(messages)
        display(Markdown(f"### Step Answer (Attempt {attempts}):\n{step_answer}\n"))

        # Phase 2: Validate the step answer using XML format
        validation_prompt = f"""You are an AI validator. Check if the following step answer solves the problem correctly:

Problem:
{problem}

Step Answer:
{step_answer}

Respond in XML format as follows:
<response>
    <is_correct>Is this answer 100% correct? Return true or false</is_correct>
    <hint>If the answer is incorrect, provide a new thought or hint.</hint>
</response>"""

        display(Markdown(f"**AI is validating step answer (Attempt {attempts})...**"))
        response = chat([{"role": "user", "content": validation_prompt}])

        # Parse the XML-style response. Only the <is_correct> tag decides the
        # verdict, so the word "true" appearing inside the hint is ignored.
        verdict = re.search(r"<is_correct>(.*?)</is_correct>", response, re.DOTALL | re.IGNORECASE)
        hint_match = re.search(r"<hint>(.*?)</hint>", response, re.DOTALL | re.IGNORECASE)
        is_correct = verdict is not None and "true" in verdict.group(1).lower()
        hint = hint_match.group(1).strip() if hint_match else "No hint provided"

        # Update reasoning process
        step = ThoughtStep(step_answer=step_answer, is_completed=is_correct, hint=hint)
        reasoning_process.steps.append(step)

        messages += [{"role": "assistant", "content": step_answer}]

        if is_correct:
            break  # Exit loop if the step answer is correct

        messages += [{"role": "user", "content": "Not correct, try with this: " + hint}]

    # Step 3: Final answer
    messages += [{"role": "user", "content": f"Based on your reasoning, provide the final answer to the problem and return it in the same language as the following: {reasoning_process.initial_problem}"}]
    response = chat(messages)

    # Store and display the final answer
    reasoning_process.final_answer = response
    display(Markdown(f"## Final Answer:\n{response}"))

    return reasoning_process


def display_reasoning_process(process: ReasoningProcess) -> None:
    """
    Display the reasoning process details.
    """
    display(Markdown(f"## Problem:\n{process.initial_problem}\n"))
    for idx, step in enumerate(process.steps, 1):
        display(Markdown(f"### Step {idx}:\n**Hint**: {step.hint}\n**Is Completed**: {step.is_completed}\n"))
    if process.final_answer:
        display(Markdown(f"## Final Answer:\n{process.final_answer}"))
    else:
        display(Markdown("## Final Answer: Not determined yet."))


# Example usage
if __name__ == "__main__":
    problem_text = """9.8 和 9.11 谁大"""  # "Which is larger, 9.8 or 9.11?"

    # Solve the problem
    reasoning = solve_problem(problem_text)
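
The validation step depends on pulling a verdict and a hint out of free-form model output, which is the most fragile part of the loop. As a standalone sketch (`parse_validation` is a hypothetical helper name, not something from the original listing), the parsing can be isolated and checked without calling the model. It reads the verdict only from inside the `<is_correct>` tag, so a hint that happens to contain the word "true" does not flip the result.

```python
import re
from typing import Tuple


def parse_validation(response: str) -> Tuple[bool, str]:
    """Extract (is_correct, hint) from the validator's XML-style reply."""
    flags = re.DOTALL | re.IGNORECASE
    verdict = re.search(r"<is_correct>(.*?)</is_correct>", response, flags)
    hint = re.search(r"<hint>(.*?)</hint>", response, flags)
    # A missing or malformed verdict is treated as "not correct".
    is_correct = verdict is not None and verdict.group(1).strip().lower() == "true"
    hint_text = hint.group(1).strip() if hint else "No hint provided"
    return is_correct, hint_text


if __name__ == "__main__":
    reply = (
        "<response><is_correct>false</is_correct>"
        "<hint>Is it true that 9.11 > 9.8? Compare 9.80 with 9.11.</hint></response>"
    )
    print(parse_validation(reply))
```

Treating an unparseable reply as incorrect costs one more retry, which seems safer than accepting an answer the validator never actually approved.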


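How it works

First, I use glm-4-flash here for one reason only: it is free.

The whole flow has a few steps:

  • Step 1: task planning. The agent first looks up material about the problem online, analyzes it together with the user's question, and outputs a plan for solving it

  • Step 2: task attempt. With the plan in hand, the agent tries to solve the problem:

    • If it is solved (or the maximum number of retries is exceeded), jump to step 3;

    • If it is not solved, the agent reflects on why it fell short, browbeats itself (PUA, in Chinese internet slang), and retries

  • Step 3: wrap-up. Summarize the attempts above and output the final answer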
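
The control flow of these steps can be sketched independently of any model. In the sketch below, `plan_attempt_validate` and its four callable parameters are hypothetical names chosen for illustration; in the real program each callable would wrap a chat request, while here toy lambdas stand in so the loop can run offline.

```python
from typing import Callable, List, Tuple


def plan_attempt_validate(
    problem: str,
    plan: Callable[[str], str],
    attempt: Callable[[str, str], str],
    validate: Callable[[str, str], Tuple[bool, str]],
    summarize: Callable[[str, List[str]], str],
    max_attempts: int = 10,
) -> str:
    """Plan once, retry until validated or out of attempts, then summarize."""
    hint = plan(problem)  # Step 1: task planning
    answers: List[str] = []
    for _ in range(max_attempts):  # Step 2: attempt, check, retry
        answer = attempt(problem, hint)
        answers.append(answer)
        is_correct, new_hint = validate(problem, answer)
        if is_correct:
            break
        hint = new_hint  # feed the reflection into the next attempt
    return summarize(problem, answers)  # Step 3: wrap-up


if __name__ == "__main__":
    guesses = iter(["9.11", "9.8"])
    result = plan_attempt_validate(
        "Which is larger, 9.8 or 9.11?",
        plan=lambda p: "compare the decimal parts",
        attempt=lambda p, h: next(guesses),
        validate=lambda p, a: (a == "9.8", "compare 9.80 with 9.11"),
        summarize=lambda p, answers: answers[-1],
    )
    print(result)  # 9.8
```

Note that the loop falls through to the summary step even when every attempt fails, which matches the "or the maximum number of retries is exceeded" branch above.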

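In the end, for the question "which is larger, 9.8 or 9.11", it outputs this (including the thinking process):


Programs like this work by having an AI browbeat itself over and over, or by bringing in a second AI to browbeat the one doing the work, pushing it to keep trying, checking, and improving until the job is delivered (sound familiar?)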


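What this tells us

Let me look at this from a few angles:

  • o1 is not mysterious; you can build one too (lite edition only)

  • Getting to o1-level results still takes work on several fronts, both in engineering the agent and in training the model (internalizing CoT)

  • o1 will be very useful, especially for synthetic data and for solving complex tasks

  • To some extent, it suggests that training of the models themselves has hit some bottlenecks

  • Prompt engineering will gradually decline


Also, you are welcome to discuss this one: 《对于 AI & AGI,我有 3 个问题》 ("I Have 3 Questions About AI & AGI")

One more thing: I plan to organize a proper "o1 algorithm challenge" later on, and you are welcome to join when it happens.

(First let me go beg for some prize money, ahhhhhh)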
