Introduction #
Deep Research is a research agent built on the o3 model that OpenAI launched on February 2. According to OpenAI, the user only needs to provide a prompt and the agent will produce a research report the way a research analyst would.
If the user's research details are not clear enough, the model asks the user to clarify them; once the question is sufficiently clear, the agent starts its web-connected analysis.
That said, Google Gemini had already introduced the Deep Research concept back in December, and JinaAI began experimenting with an implementation in January. Perplexity followed with its own Deep Research agent, and LangChain, Dify, HuggingFace, and others have open-sourced their own approaches as well.
In this post, we'll look at the concrete techniques through these open-source examples.
Deep Research #
Let's first outline the overall flow, i.e. how it is driven by large models from an engineering perspective.
At minimum, two agents are needed:
- Researcher agent: decides whether to pursue another research topic, and if so, gives the concrete research direction
- Reporter agent: summarizes everything the researcher has gathered and produces the final research report
With these two agents, the overall Deep Research flow looks like this:
- The user asks a question
- The researcher agent proposes a concrete research topic
- Relevant information is gathered via search engines, vector-store retrieval, and other means
- The researcher agent is asked again whether there is a next research topic; if so, return to step 2, otherwise continue
- All research data is handed to the reporter agent, which produces the final report
Overall this resembles a Reason + Act loop, with model reflection, requirement clarification, and so on layered on top.
In stricter naming schemes, though, the flow above only qualifies as Deep Search. A true Deep Research adds a planning agent at the very start, which generates a chain of thought (CoT) for solving the problem; each step of that chain goes through the same loop as above, somewhat like a report outline and its chapters. It also needs reflection, so that each chapter stays close to the topic as it is generated. Finally, the complete output of every step is collected into one very long report.
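The basic loop can be sketched in a few lines of Python. Everything here is a hypothetical stand-in for real calls: `ask_researcher`, `search`, and `write_report` replace the actual LLM prompts and retrieval APIs, and the two-round stopping rule exists only to keep the sketch runnable.

```python
def ask_researcher(topic, findings):
    """Hypothetical researcher-agent call: decide whether to keep digging.

    A real implementation would prompt an LLM and parse its reply;
    here we simply stop after two rounds."""
    if len(findings) >= 2:
        return {"nextSearchTopic": None, "shouldContinue": False}
    return {"nextSearchTopic": f"{topic} (round {len(findings) + 1})",
            "shouldContinue": True}

def search(query):
    """Hypothetical retrieval call (search engine or vector store)."""
    return f"results for: {query}"

def write_report(topic, findings):
    """Hypothetical reporter-agent call: summarize all findings."""
    return f"Report on {topic}:\n" + "\n".join(findings)

def deep_search(topic):
    findings = []
    while True:
        decision = ask_researcher(topic, findings)
        if not decision["shouldContinue"]:
            break
        findings.append(search(decision["nextSearchTopic"]))
    return write_report(topic, findings)

print(deep_search("LLM agents"))
```

The Reason + Act character is in the `while` loop: the researcher reasons about what is still missing, the search call acts, and the loop repeats until the researcher decides to stop.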
Dify's Implementation #
Overview #
Dify's version is built with its Workflow feature; see the article DeepResearch: Building a Research Automation App with Dify for details.
Search goes through Tavily. The model is called in roughly two places:
- A researcher built on gpt-4o: decides the topic direction and whether to stop
- A reporter built on deepseek-r1: summarizes the search results and produces the final report
Research with gpt-4o #
The researcher agent is an LLM node inside the Workflow's iterator, using gpt-4o. Each request asks gpt-4o to return `nextSearchTopic` and `shouldContinue`. The prompt is:
You are a research agent investigating the following topic.
What have you found? What questions remain unanswered? What specific aspects should be investigated next?
## Output
- Do not output topics that are exactly the same as already searched topics.
- If further information search is needed, set nextSearchTopic.
- If sufficient information has been obtained, set shouldContinue to false.
- Please output in json format
```json
nextSearchTopic: str | None
shouldContinue: bool
```
Writing the Report with deepseek-r1 #
The report-writing stage uses a reasoning model; this implementation simply uses deepseek-r1 for the job.
Let's look at the persona prompt. The `user` part just passes the user's `query` and the `findings` returned by the search API. The interesting part is the `system` persona prompt.
LangChain's Implementation #
Flow Overview #
LangChain's version is mainly source code; it implements a full plan, research, and report pipeline.
Execution Steps #
LangChain splits the models into two broad roles:
- `planner_model`: the planning model, usually a reasoning model, used to ① plan the sections and topics at the very start and ② decide, on each recursive call, whether to recurse again or move to the next section
- `writer_model`: the writing model, usually a non-reasoning model, used to ① generate a query list before retrieval and ② write each section's report for its topic
Next, the overall orchestration logic, which adds many details on top of the flow we described earlier and is well worth a look.
Based on the code below, the overall execution flow is roughly:
- The user inputs a query
- `generate_report_plan`: produces the report plan, mainly the required topics and the report's table of contents. It first uses `writer_model` to generate a query list from the topic, runs those queries against a search engine, and finally has `planner_model` generate the report's structure, sections, and topics from the results.
- `human_feedback`: asks the user to review the plan; if rejected, return to step 2 and re-plan; if approved, continue.
- `build_section_with_web_research`: creates a writing task for each section in the plan. Each section's execution is defined in `section_builder`:
  - a. `generate_queries`: generates a query list from the topic and the current section, calling `writer_model`
  - b. `search_web`: runs the query list from the previous step against a search engine; LangChain supports tavily, perplexity, exa, arxiv, pubmed, linkup, and more
  - c. `write_section`: calls `writer_model` to write up this section, then has `planner_model` perform reflection, i.e. judge whether the section's output passes. If it fails, return to step a and rewrite; if it passes, move on to the next section.
- `gather_completed_sections`: collects the output of every completed section into the final context
- `initiate_final_section_writing`: for the sections from step 2 that don't need research, dispatches `write_final_sections` to generate their content
- `write_final_sections`: calls `writer_model` to write the non-research sections based on all the previously researched sections; in theory this includes the report's conclusion
- `compile_final_report`: stitches the researched sections from step c together with the final sections (including the conclusion) into the final report
```python
# Report section sub-graph --

# Add nodes
section_builder = StateGraph(SectionState, output=SectionOutputState)
section_builder.add_node("generate_queries", generate_queries)
section_builder.add_node("search_web", search_web)
section_builder.add_node("write_section", write_section)

# Add edges
section_builder.add_edge(START, "generate_queries")
section_builder.add_edge("generate_queries", "search_web")
section_builder.add_edge("search_web", "write_section")

# Outer graph --

# Add nodes
builder = StateGraph(ReportState, input=ReportStateInput, output=ReportStateOutput, config_schema=Configuration)
builder.add_node("generate_report_plan", generate_report_plan)
builder.add_node("human_feedback", human_feedback)
builder.add_node("build_section_with_web_research", section_builder.compile())
builder.add_node("gather_completed_sections", gather_completed_sections)
builder.add_node("write_final_sections", write_final_sections)
builder.add_node("compile_final_report", compile_final_report)

# Add edges
builder.add_edge(START, "generate_report_plan")
builder.add_edge("generate_report_plan", "human_feedback")
builder.add_edge("build_section_with_web_research", "gather_completed_sections")
builder.add_conditional_edges("gather_completed_sections", initiate_final_section_writing, ["write_final_sections"])
builder.add_edge("write_final_sections", "compile_final_report")
builder.add_edge("compile_final_report", END)

graph = builder.compile()
```
Writer Agent #
Query Generation #
As mentioned in the steps above, this generates the query lists that help the search engine retrieve information. It is used in step 2 and in step a of each section's processing, though the prompts differ slightly.
First, the queries generated in step 2 to retrieve data relevant to the report plan. The `system` persona prompt and `user` prompt are as follows:
### system
You are performing research for a report.
<Report topic>
{topic}
</Report topic>
<Report organization>
{report_organization}
</Report organization>
<Task>
Your goal is to generate {number_of_queries} web search queries that will help gather information for planning the report sections.
The queries should:
1. Be related to the Report topic
2. Help satisfy the requirements specified in the report organization
Make the queries specific enough to find high-quality, relevant sources while covering the breadth needed for the report structure.
</Task>
### user
Generate search queries that will help with planning the sections of the report.
Next, when processing a specific section, material relevant to that section must be retrieved, which again requires generating a query list for the search engine. The `system` persona prompt and `user` prompt are as follows:
### system
You are an expert technical writer crafting targeted web search queries that will gather comprehensive information for writing a technical report section.
<Report topic>
{topic}
</Report topic>
<Section topic>
{section_topic}
</Section topic>
<Task>
Your goal is to generate {number_of_queries} search queries that will help gather comprehensive information above the section topic.
The queries should:
1. Be related to the topic
2. Examine different aspects of the topic
Make the queries specific enough to find high-quality, relevant sources.
</Task>
### user
Generate search queries on the provided topic.
Generating Sections That Require Research #
This produces the report text for each section that needs research. The inputs are mainly the report's overall topic, the section name, the section's sub-topic, and so on, plus some constraints on word count, style, and the like.
### system
You are an expert technical writer crafting one section of a technical report.
<Report topic>
{topic}
</Report topic>
<Section name>
{section_name}
</Section name>
<Section topic>
{section_topic}
</Section topic>
<Existing section content (if populated)>
{section_content}
</Existing section content>
<Source material>
{context}
</Source material>
<Guidelines for writing>
1. If the existing section content is not populated, write a new section from scratch.
2. If the existing section content is populated, write a new section that synthesizes the existing section content with the Source material.
</Guidelines for writing>
<Length and style>
- Strict 150-200 word limit
- No marketing language
- Technical focus
- Write in simple, clear language
- Start with your most important insight in **bold**
- Use short paragraphs (2-3 sentences max)
- Use ## for section title (Markdown format)
- Only use ONE structural element IF it helps clarify your point:
* Either a focused table comparing 2-3 key items (using Markdown table syntax)
* Or a short list (3-5 items) using proper Markdown list syntax:
- Use `*` or `-` for unordered lists
- Use `1.` for ordered lists
- Ensure proper indentation and spacing
- End with ### Sources that references the below source material formatted as:
* List each source with title, date, and URL
* Format: `- Title : URL`
</Length and style>
<Quality checks>
- Exactly 150-200 words (excluding title and sources)
- Careful use of only ONE structural element (table or list) and only if it helps clarify your point
- One specific example / case study
- Starts with bold insight
- No preamble prior to creating the section content
- Sources cited at end
</Quality checks>
### user
Generate a report section based on the provided sources.
Generating Sections That Don't Require Research #
This runs after all research-backed sections are finished, generating the content of the sections that don't require research; in theory this includes the report's conclusion.
### system
You are an expert technical writer crafting a section that synthesizes information from the rest of the report.
<Report topic>
{topic}
</Report topic>
<Section name>
{section_name}
</Section name>
<Section topic>
{section_topic}
</Section topic>
<Available report content>
{context}
</Available report content>
<Task>
1. Section-Specific Approach:
For Introduction:
- Use # for report title (Markdown format)
- 50-100 word limit
- Write in simple and clear language
- Focus on the core motivation for the report in 1-2 paragraphs
- Use a clear narrative arc to introduce the report
- Include NO structural elements (no lists or tables)
- No sources section needed
For Conclusion/Summary:
- Use ## for section title (Markdown format)
- 100-150 word limit
- For comparative reports:
* Must include a focused comparison table using Markdown table syntax
* Table should distill insights from the report
* Keep table entries clear and concise
- For non-comparative reports:
* Only use ONE structural element IF it helps distill the points made in the report:
* Either a focused table comparing items present in the report (using Markdown table syntax)
* Or a short list using proper Markdown list syntax:
- Use `*` or `-` for unordered lists
- Use `1.` for ordered lists
- Ensure proper indentation and spacing
- End with specific next steps or implications
- No sources section needed
3. Writing Approach:
- Use concrete details over general statements
- Make every word count
- Focus on your single most important point
</Task>
<Quality Checks>
- For introduction: 50-100 word limit, # for report title, no structural elements, no sources section
- For conclusion: 100-150 word limit, ## for section title, only ONE structural element at most, no sources section
- Markdown format
- Do not include word count or any preamble in your response
</Quality Checks>
### user
Generate a report section based on the provided sources.
Planner Agent #
Generating the Report Plan #
The planner here is the `planner_model` from step 2 above.
From the execution steps we know that `planner_model` takes the retrieved material and generates the report's overall sections, sub-topics, and table of contents.
Now the prompts. The concrete inputs to the `system` persona prompt are:
- `topic`: the topic given by the user
- `report_organization`: the report organization, optional
- `context`: the context retrieved from the search engine
- `feedback`: the user's concrete feedback from step 3 after rejecting a generated plan; absent on the first run
I want a plan for a report that is concise and focused.
<Report topic>
The topic of the report is:
{topic}
</Report topic>
<Report organization>
The report should follow this organization:
{report_organization}
</Report organization>
<Context>
Here is context to use to plan the sections of the report:
{context}
</Context>
<Task>
Generate a list of sections for the report. Your plan should be tight and focused with NO overlapping sections or unnecessary filler.
For example, a good report structure might look like:
1/ intro
2/ overview of topic A
3/ overview of topic B
4/ comparison between A and B
5/ conclusion
Each section should have the fields:
- Name - Name for this section of the report.
- Description - Brief overview of the main topics covered in this section.
- Research - Whether to perform web research for this section of the report.
- Content - The content of the section, which you will leave blank for now.
Integration guidelines:
- Include examples and implementation details within main topic sections, not as separate sections
- Ensure each section has a distinct purpose with no content overlap
- Combine related concepts rather than separating them
Before submitting, review your structure to ensure it has no redundant sections and follows a logical flow.
</Task>
<Feedback>
Here is feedback on the report structure from review (if any):
{feedback}
</Feedback>
The output format is specified in the `user` prompt: each section's name, description, plan, whether it needs research, and its content.
Generate the sections of the report. Your response must include a 'sections' field containing a list of sections.
Each section must have: name, description, plan, research, and content fields.
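The fields the planner must emit map naturally to a small schema. LangChain's actual implementation uses Pydantic models; the dataclass below is just an illustrative mirror of the fields named in the prompt:

```python
from dataclasses import dataclass

@dataclass
class Section:
    name: str         # name for this section of the report
    description: str  # brief overview of the main topics covered
    plan: str         # how this section will be researched and written
    research: bool    # whether to perform web research for this section
    content: str = "" # left blank until the writer fills it in

intro = Section(
    name="intro",
    description="Why this report exists",
    plan="no research needed",
    research=False,
)
```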
Judging Section Content: Reflection #
This is the core of each `section_builder`: after the writer agent finishes a section, it judges whether that section needs to be rewritten, i.e. the reflection step in the flow chart. The prompt includes the content of the section just generated.
The model is asked to return whether the section passes; on failure it must also return follow-up queries that give a rough direction for regenerating the section. The prompt is:
### system
Review a report section relative to the specified topic:
<Report topic>
{topic}
</Report topic>
<section topic>
{section_topic}
</section topic>
<section content>
{section}
</section content>
<task>
Evaluate whether the section content adequately addresses the section topic.
If the section content does not adequately address the section topic, generate {number_of_follow_up_queries} follow-up search queries to gather missing information.
</task>
<format>
grade: Literal["pass","fail"] = Field(
description="Evaluation result indicating whether the response meets requirements ('pass') or needs revision ('fail')."
)
follow_up_queries: List[SearchQuery] = Field(
description="List of follow-up search queries.",
)
</format>
### user
Grade the report and consider follow-up questions for missing information.
If the grade is 'pass', return empty strings for all follow-up queries.
If the grade is 'fail', provide specific search queries to gather missing information.
HuggingFace smolagents' Implementation #
Overview #
HuggingFace ships a basic DeepResearch implementation in the examples of its smolagents library.
The implementation is based on the classic ReAct prompting framework, i.e. Reason + Act: give the agent a fixed toolset and let it decide, from the problem, which tools to invoke. The example wraps two agents:
- `text_webbrowser_agent`: a ToolCallingAgent (MultiStepAgent) that triggers browser tools via function calls; it wraps Google search, visiting a specific URL, page down/up, in-page search, archive search, and other browser-related tools, plus a tool for extracting text from files
- `manager_agent`: a CodeAgent (MultiStepAgent) instance wrapping the CodeAct execution framework smolagents already implements; the key argument passed in is `text_webbrowser_agent`, so this agent drives the other agent as one of its tools
MultiStepAgent #
HuggingFace's implementation is built on MultiStepAgent. CodeAgent(MultiStepAgent) and ToolCallingAgent(MultiStepAgent) are both subclasses of MultiStepAgent, and their flows are largely similar.
So let's first look at MultiStepAgent's execution flow:
The call path is: `MultiStepAgent.run` → `MultiStepAgent._run` → `MultiStepAgent._execute_step` → `MultiStepAgent.planning_step` → `CodeAgent/ToolCallingAgent.step` → `MultiStepAgent._run` → … until a `final_answer` is found or the number of iterations reaches `max_steps`.
```python
# https://github.com/huggingface/smolagents/blob/main/src/smolagents/agents.py
class MultiStepAgent:
    def run(
        self,
        task: str,
        stream: bool = False,
        reset: bool = True,
        images: Optional[List[str]] = None,
        additional_args: Optional[Dict] = None,
        max_steps: Optional[int] = None,
    ):
        """
        Run the agent for the given task.

        Args:
            task (`str`): Task to perform.
            stream (`bool`): Whether to run in a streaming way.
            reset (`bool`): Whether to reset the conversation or keep it going from previous run.
            images (`list[str]`, *optional*): Paths to image(s).
            additional_args (`dict`, *optional*): Any other variables that you want to pass to the agent run, for instance images or dataframes. Give them clear names!
            max_steps (`int`, *optional*): Maximum number of steps the agent can take to solve the task. if not provided, will use the agent's default value.
        """
        ...
        if stream:
            # The steps are returned as they are executed through a generator to iterate on.
            return self._run(task=self.task, max_steps=max_steps, images=images)
        # Outputs are returned only at the end. We only look at the last step.
        return deque(self._run(task=self.task, max_steps=max_steps, images=images), maxlen=1)[0]

    def _run(
        self, task: str, max_steps: int, images: List[str] | None = None
    ) -> Generator[ActionStep | AgentType, None, None]:
        final_answer = None
        self.step_number = 1
        while final_answer is None and self.step_number <= max_steps:
            step_start_time = time.time()
            memory_step = self._create_memory_step(step_start_time, images)
            try:
                final_answer = self._execute_step(task, memory_step)
            except AgentError as e:
                memory_step.error = e
            finally:
                self._finalize_step(memory_step, step_start_time)
                yield memory_step
                self.step_number += 1
        if final_answer is None and self.step_number == max_steps + 1:
            final_answer = self._handle_max_steps_reached(task, images, step_start_time)
            yield memory_step
        yield handle_agent_output_types(final_answer)

    def _execute_step(self, task: str, memory_step: ActionStep) -> Union[None, Any]:
        if self.planning_interval is not None and self.step_number % self.planning_interval == 1:
            self.planning_step(task, is_first_step=(self.step_number == 1), step=self.step_number)
        self.logger.log_rule(f"Step {self.step_number}", level=LogLevel.INFO)
        final_answer = self.step(memory_step)
        if final_answer is not None and self.final_answer_checks:
            self._validate_final_answer(final_answer)
        return final_answer
```
CodeAgent + ToolCallingAgent #
Now that we have a rough picture of MultiStepAgent, let's see how this DeepResearch example creates its CodeAgent instance.
The example defines `text_webbrowser_agent` and `manager_agent`, setting `max_steps` to 20 for `text_webbrowser_agent` and to 12 for `manager_agent`.
Essentially it wraps a CodeAgent around o1, and the CodeAgent in turn wraps a ToolCallingAgent, driving the second agent from the first.
The ToolCallingAgent mainly wraps browser, search, and file-parsing operations.
Both agents run in the Reason + Act style.
```python
# https://github.com/huggingface/smolagents/blob/main/examples/open_deep_research/run_gaia.py
model_params = {
    "model_id": model_id,
    "custom_role_conversions": custom_role_conversions,
    "max_completion_tokens": 8192,
}
if model_id == "o1":
    model_params["reasoning_effort"] = "high"
model = LiteLLMModel(**model_params)

# web tools
WEB_TOOLS = [
    SearchInformationTool(browser),
    VisitTool(browser),
    PageUpTool(browser),
    PageDownTool(browser),
    FinderTool(browser),
    FindNextTool(browser),
    ArchiveSearchTool(browser),
    # inspect file as text tool
    TextInspectorTool(model, text_limit),
]

# tool calling agent for web tools
text_webbrowser_agent = ToolCallingAgent(
    model=model,
    tools=WEB_TOOLS,
    max_steps=20,
    verbosity_level=2,
    planning_interval=4,
    name="search_agent",
    description="""A team member that will search the internet to answer your question.
Ask him for all your questions that require browsing the web.
Provide him as much context as possible, in particular if you need to search on a specific timeframe!
And don't hesitate to provide him with a complex search task, like finding a difference between two webpages.
Your request must be a real sentence, not a google search! Like "Find me this information (...)" rather than a few keywords.
""",
    provide_run_summary=True,
)
text_webbrowser_agent.prompt_templates["managed_agent"]["task"] += """You can navigate to .txt online files.
If a non-html page is in another format, especially .pdf or a Youtube video, use tool 'inspect_file_as_text' to inspect it.
Additionally, if after some searching you find out that you need more information to answer the question, you can use `final_answer` with your request for clarification as argument to request for more information."""

# ReAct code agent instance
manager_agent = CodeAgent(
    model=model,
    tools=[visualizer, ti_tool],
    max_steps=12,
    verbosity_level=2,
    additional_authorized_imports=AUTHORIZED_IMPORTS,
    planning_interval=4,
    managed_agents=[text_webbrowser_agent],
)
```
Finally, the prompt, which is fairly plain here:
You have one question to answer. It is paramount that you provide a correct answer.
Give it all you can: I know for a fact that you have access to all the relevant tools to solve it and find the correct answer (the answer does exist). Failure or 'I cannot answer' or 'None found' will not be tolerated, success will be rewarded.
Run verification steps if that's needed, you must make sure you find the correct answer!
Here is the task:
{question}
To solve the task above, you will have to use these attached files:
{files}
Zilliz Cloud's Implementation #
Overview #
Zilliz implements a DeepSearch, with search swapped from a search engine to knowledge-base retrieval.
The flow chart is roughly as follows and is fairly clear overall, so I won't restate it. Three points are worth noting:
- Search here becomes approximate-nearest-neighbor queries against a vector store
- What reflection examines is not model-generated content but the content retrieved from the vector store
- While querying the vector store with the query list, it checks whether the knowledge-base content matches the original query and the query list; the code calls this Rerank, though it is not reranking in the usual sense
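Putting those three points together, the DeepSearch loop could be sketched as follows. Every function here is a hypothetical stand-in for the real LLM and Milvus calls, and the single-round reflection exists only to keep the sketch runnable:

```python
def decompose(original_query):
    """Hypothetical: an LLM splits the query into sub-questions."""
    return [f"{original_query} aspect {i}" for i in (1, 2)]

def vector_search(query):
    """Hypothetical: approximate-nearest-neighbor lookup in the vector store."""
    return [f"chunk about {query}"]

def rerank_keep(queries, chunk):
    """Hypothetical 'rerank': the LLM answers YES/NO on whether the chunk
    helps answer any query; here approximated by substring match."""
    return any(q.lower() in chunk.lower() for q in queries)

def reflect(question, sub_queries, chunks):
    """Hypothetical reflection: return follow-up queries, or [] when done.
    We pretend the first round suffices."""
    return []

def deep_search(question):
    sub_queries = decompose(question)
    chunks = []
    while True:
        for q in sub_queries:
            chunks += [c for c in vector_search(q)
                       if rerank_keep([question] + sub_queries, c)]
        sub_queries = reflect(question, sub_queries, chunks)
        if not sub_queries:
            break
    return chunks

chunks = deep_search("vector databases")
```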
Generating the Query List #
The query generation here only produces the first round's query list; follow-up query lists for rounds that fail are produced by the reflection step.
So the input is simple: just the original query, `original_query`.
To answer this question more comprehensively, please break down the original question into up to four sub-questions. Return as list of str.
If this is a very simple question and no decomposition is necessary, then keep the only one original question in the python code list.
Original Question: {original_query}
<EXAMPLE>
Example input:
"Explain deep learning"
Example output:
[
"What is deep learning?",
"What is the difference between deep learning and machine learning?",
"What is the history of deep learning?"
]
</EXAMPLE>
Provide your response in a python code list of str format:
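Because the prompt asks for "a python code list of str", the reply needs to be parsed as a Python literal rather than JSON. A defensive sketch; falling back to the original query on a bad parse is my assumption, not necessarily what the project does:

```python
import ast

def parse_sub_queries(raw: str, original_query: str) -> list[str]:
    """Parse a model reply like '["q1", "q2"]'; fall back to the original query."""
    try:
        parsed = ast.literal_eval(raw.strip())
        if isinstance(parsed, list) and all(isinstance(q, str) for q in parsed):
            return parsed
    except (ValueError, SyntaxError):
        pass
    # Assumed fallback: treat the whole original question as the only sub-query.
    return [original_query]

queries = parse_sub_queries(
    '["What is deep learning?", "What is the history of deep learning?"]',
    "Explain deep learning",
)
```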
Rerank Judgment #
This step mainly judges how relevant a retrieval result is to the queries. The input `query` is the original query plus the query list, and `retrieved_chunk` is the concrete vector-search result.
Based on the query questions and the retrieved chunk, to determine whether the chunk is helpful in answering any of the query question, you can only return "YES" or "NO", without any other information.
Query Questions: {query}
Retrieved Chunk: {retrieved_chunk}
Is the chunk helpful in answering the any of the questions?
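The YES/NO reply then acts as a filter over the retrieved chunks. A sketch with `llm` standing in for the real model call (the prompt is abbreviated, and the toy model below is purely illustrative):

```python
def rerank_filter(queries, chunks, llm):
    """Keep only the chunks the model judges helpful ("YES") for any query."""
    kept = []
    for chunk in chunks:
        prompt = (
            "Based on the query questions and the retrieved chunk, determine whether "
            'the chunk is helpful in answering any of the questions; you can only return "YES" or "NO".\n'
            f"Query Questions: {queries}\n"
            f"Retrieved Chunk: {chunk}"
        )
        if llm(prompt).strip().upper().startswith("YES"):
            kept.append(chunk)
    return kept

# Toy model: says YES only when the prompt's chunk text mentions "vector db".
toy_llm = lambda p: "YES" if "vector db" in p.lower() else "NO"
kept = rerank_filter(
    ["What is Milvus?"],
    ["Milvus is a vector DB", "Unrelated text about cooking"],
    toy_llm,
)
```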
Reflection Judgment #
Unlike the implementations we saw earlier, this step judges whether the content retrieved from the knowledge base is adequate.
If it is, the final report is generated from that content; if not, a new query list is generated and the vector store is queried again.
The inputs are the original query `question`, the query list `mini_questions`, and the vector-store results `mini_chunk_str`.
As for output: if the check passes, an empty list is returned; otherwise a new round of queries is returned for the next retrieval pass.
Determine whether additional search queries are needed based on the original query, previous sub queries, and all retrieved document chunks. If further research is required, provide a Python list of up to 3 search queries. If no further research is required, return an empty list.
If the original query is to write a report, then you prefer to generate some further queries, instead return an empty list.
Original Query: {question}
Previous Sub Queries: {mini_questions}
Related Chunks:
{mini_chunk_str}
Respond exclusively in valid List of str format without any other text.
Producing the Conclusion #
Finally, the conclusion is generated. The inputs are the original query `question`, the query list `mini_questions`, and the vector-store results `mini_chunk_str`.
You are a AI content analysis expert, good at summarizing content. Please summarize a specific and detailed answer or report based on the previous queries and the retrieved document chunks.
Original Query: {question}
Previous Sub Queries: {mini_questions}
Related Chunks:
{mini_chunk_str}
Summary #
Looking across these implementations, they are broadly similar: the core is the plan-or-query-list-generation plus reflection template.
- Dify builds on its existing UI features and amounts to a small DeepSearch: simple, usable, and with reflection, but it lacks plan generation, and the final report is just one model's closing summary
- LangChain's implementation is, in my view, the most complete: plan generation, section-by-section writing, and reflection after each section. It is a genuinely full DeepResearch and can produce a very long report; the downside is that it is slow and expensive
- HuggingFace's smolagents example is built on MultiStepAgent: it has planning steps and executes them step by step, but it lacks reflection
- Zilliz Cloud's implementation is a DeepSearch: the flow is much simpler than DeepResearch, it has reflection, and retrieval is swapped for vector search, but it lacks section-by-section generation
Overall, I find LangChain's implementation the most comprehensive, while Zilliz Cloud's and HuggingFace's are also quite interesting.
Finally, there are implementations like JinaAI's and deep-research that I haven't examined closely; take a look if you're interested.
References #
- Open-source DeepResearch – Freeing our search agents
- DeepResearch: Building a Research Automation App with Dify - Dify Blog
- A Practical Guide to Implementing DeepSearch/DeepResearch
- Try Deep Research and our new experimental model in Gemini, your AI assistant
- smolagents/examples/open_deep_research at main · huggingface/smolagents
- 🤗 smolagents: a barebones library for agents. Agents write python code to call tools and orchestrate other agents.
- Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.
- An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models.
- Keep searching, reading webpages, reasoning until it finds the answer (or exceeding the token budget)