
AutoGen from Beginner to Advanced, Part 2: A Detailed Look at AutoGen's Common Built-in Agents
In the previous article I covered the basics of using AutoGen (AutoGen from Beginner to Advanced, Part 1: How to Build Your First Agent from Scratch). In this article, let's look at the agents built into AutoGen.
Which Agents Are Built In
- UserProxyAgent: an agent that takes user input and returns it as its response.
- AssistantAgent: a general-purpose agent that uses a large language model (LLM) for text generation and reasoning, and whose capabilities can be extended with tools.
- CodeExecutorAgent: an agent that can execute code.
- OpenAIAssistantAgent: an agent backed by an OpenAI Assistant that can use custom tools.
- MultimodalWebSurfer: a multimodal agent that can search the web and visit web pages for information.
- FileSurfer: an agent that can search and browse local files for information.
- VideoSurfer: an agent that can watch videos for information.
Common Methods
All agents inherit from ChatAgent; each then implements its own way of handling messages.
class ChatAgent(ABC, TaskRunner, ComponentBase[BaseModel]):
    """Protocol for a chat agent."""

    component_type = "agent"

    @property
    @abstractmethod
    def name(self) -> str:
        """The name of the agent. This is used by team to uniquely identify
        the agent. It should be unique within the team."""
        ...

    @property
    @abstractmethod
    def description(self) -> str:
        """The description of the agent. This is used by team to
        make decisions about which agents to use. The description should
        describe the agent's capabilities and how to interact with it."""
        ...

    @property
    @abstractmethod
    def produced_message_types(self) -> Sequence[type[BaseChatMessage]]:
        """The types of messages that the agent produces in the
        :attr:`Response.chat_message` field. They must be :class:`BaseChatMessage` types."""
        ...

    @abstractmethod
    async def on_messages(self, messages: Sequence[BaseChatMessage], cancellation_token: CancellationToken) -> Response:
        """Handles incoming messages and returns a response."""
        ...

    @abstractmethod
    def on_messages_stream(
        self, messages: Sequence[BaseChatMessage], cancellation_token: CancellationToken
    ) -> AsyncGenerator[BaseAgentEvent | BaseChatMessage | Response, None]:
        """Handles incoming messages and returns a stream of inner messages,
        with the final item being the response."""
        ...

    @abstractmethod
    async def on_reset(self, cancellation_token: CancellationToken) -> None:
        """Resets the agent to its initialization state."""
        ...

    @abstractmethod
    async def on_pause(self, cancellation_token: CancellationToken) -> None:
        """Called when the agent is paused. The agent may be running in :meth:`on_messages` or
        :meth:`on_messages_stream` when this method is called."""
        ...

    @abstractmethod
    async def on_resume(self, cancellation_token: CancellationToken) -> None:
        """Called when the agent is resumed. The agent may be running in :meth:`on_messages` or
        :meth:`on_messages_stream` when this method is called."""
        ...

    @abstractmethod
    async def save_state(self) -> Mapping[str, Any]:
        """Save agent state for later restoration."""
        ...

    @abstractmethod
    async def load_state(self, state: Mapping[str, Any]) -> None:
        """Restore agent from saved state."""
        ...

    @abstractmethod
    async def close(self) -> None:
        """Release any resources held by the agent."""
        ...
- name: the unique name of the agent.
- description: the description text of the agent.
- on_messages(): send the agent a sequence of ChatMessage and get back a Response. Note that agents are expected to be stateful, so this method should be called with new messages only, not the complete history.
- on_messages_stream(): same as on_messages(), but returns an iterator of AgentEvent or ChatMessage items, with the final item being a Response.
- on_reset(): resets the agent to its initial state.
- run() and run_stream(): convenience methods that call on_messages() and on_messages_stream() respectively.
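The statefulness contract around on_messages() can be illustrated with a minimal plain-Python sketch (EchoAgent and its string messages are hypothetical stand-ins, not AutoGen classes): the agent keeps its own history, so callers pass only the new messages.

```python
class EchoAgent:
    """A toy stateful agent: call on_messages() with NEW messages only;
    the agent appends them to its own internal history."""

    def __init__(self, name: str):
        self.name = name
        self._history: list[str] = []

    def on_messages(self, messages: list[str]) -> str:
        # The agent owns the conversation history: callers must not resend it.
        self._history.extend(messages)
        return f"seen {len(self._history)} message(s)"

    def on_reset(self) -> None:
        # Returns the agent to its initialization state.
        self._history.clear()

agent = EchoAgent("assistant")
print(agent.on_messages(["hi"]))           # history now holds 1 message
print(agent.on_messages(["how are you"]))  # pass only the new message
agent.on_reset()
print(agent.on_messages(["fresh start"]))  # history starts over
```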
How an Agent Works
The diagram below shows how an agent works:
Tool call behavior:
- If the model returns no tool calls, the response is immediately returned as a TextMessage in chat_message.
- When the model returns tool calls, they will be executed right away:
  - When reflect_on_tool_use is False (the default), the tool call results are returned as a ToolCallSummaryMessage in chat_message. tool_call_summary_format can be used to customize the tool call summary.
  - When reflect_on_tool_use is True, another model inference is made using the tool calls and their results, and the text response is returned as a TextMessage in chat_message.
- If the model returns multiple tool calls, they are executed concurrently. To disable parallel tool calls you need to configure the model client; for example, set parallel_tool_calls=False for OpenAIChatCompletionClient and AzureOpenAIChatCompletionClient.
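The branching above can be sketched in plain Python (the model output format, tool registry, and message labels here are mocks for illustration, not AutoGen APIs):

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

def handle_model_output(output, tools, reflect_on_tool_use=False, model_reflect=None):
    """Mimics the documented branching: no tool calls -> text message;
    tool calls -> execute; optionally reflect with a second inference."""
    if isinstance(output, str):
        # No tool calls: return the text immediately as a TextMessage.
        return ("TextMessage", output)
    # Tool calls: execute each one (AutoGen runs multiple calls concurrently).
    results = [tools[call.name](**call.args) for call in output]
    if reflect_on_tool_use and model_reflect is not None:
        # Second inference over the tool results yields a TextMessage.
        return ("TextMessage", model_reflect(results))
    # Default: raw results returned as a ToolCallSummaryMessage.
    return ("ToolCallSummaryMessage", "\n".join(results))

tools = {"web_search": lambda query: f"results for {query!r}"}
print(handle_model_output("hello", tools))
print(handle_model_output([ToolCall("web_search", {"query": "AutoGen"})], tools))
```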
AssistantAgent
Tools
The core function of AssistantAgent is to receive messages, process them with an LLM, and generate responses. Setting system_message defines the agent's role and behavior; model_client specifies which LLM to use; the tools parameter adds tool functions that extend the agent's capabilities; and setting reflect_on_tool_use=True makes the agent reflect on the tool results and reply in natural language.
FunctionTool
AssistantAgent automatically converts Python functions into FunctionTool instances that the agent can use, generating the tool schema automatically from the function signature and docstring.
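Conceptually, that schema generation can be sketched with the standard library alone (tool_schema below is a simplified illustration, not FunctionTool's actual implementation):

```python
import inspect
from typing import get_type_hints

def tool_schema(func) -> dict:
    """Build a simple JSON-schema-like description from a function's
    signature and docstring, similar in spirit to what FunctionTool does."""
    hints = get_type_hints(func)
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    params = {
        name: {"type": type_map.get(hints.get(name), "string")}
        for name in inspect.signature(func).parameters
    }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": params,
    }

def web_search(query: str) -> str:
    """Find information on the web"""
    return "..."

print(tool_schema(web_search))
```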
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Define a tool that searches the web for information.
async def web_search(query: str) -> str:
    """Find information on the web"""
    return "AutoGen is a programming framework for building multi-agent applications."

# Create an agent that uses the OpenAI GPT-4o model.
# api_key and base_url are assumed to be set elsewhere.
model_client = OpenAIChatCompletionClient(model="gpt-4o-mini", api_key=api_key, base_url=base_url)
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
)
query = "Find information on AutoGen"

async def assistant_run() -> None:
    response = await agent.on_messages(
        [TextMessage(content=query, source="user")],
        cancellation_token=CancellationToken(),
    )
    print(response.inner_messages)
    print(response.chat_message)

asyncio.run(assistant_run())
Calling on_messages() returns a Response whose chat_message attribute holds the agent's final response, and whose inner_messages attribute holds the list of internal messages recording the agent's "thought process" on the way to the final response:
[ToolCallRequestEvent(source='assistant', models_usage=RequestUsage(prompt_tokens=61, completion_tokens=16), metadata={}, content=[FunctionCall(id='call_Vcv22g0vCDaMd7mtdSpbB1zf', arguments='{"query":"AutoGen"}', name='web_search')], type='ToolCallRequestEvent'), ToolCallExecutionEvent(source='assistant', models_usage=None, metadata={}, content=[FunctionExecutionResult(content='AutoGen is a programming framework for building multi-agent applications.', name='web_search', call_id='call_Vcv22g0vCDaMd7mtdSpbB1zf', is_error=False)], type='ToolCallExecutionEvent')]
source='assistant' models_usage=None metadata={} content='AutoGen is a programming framework for building multi-agent applications.' type='ToolCallSummaryMessage'
When we ask a question that does not require web search, such as "who are you?", inner_messages comes back empty and chat_message returns:
source='assistant' models_usage=RequestUsage(prompt_tokens=59, completion_tokens=42) metadata={} content="I am an AI language model designed to assist with a wide range of questions and tasks. I'm here to provide information, answer queries, and help with problem-solving! How can I assist you today?" type='TextMessage'
Note that on_messages() updates the agent's internal state: it appends the messages to the agent's history.
By default, when AssistantAgent executes a tool, it returns the tool's output as a string in a ToolCallSummaryMessage in the response. If your tool does not return a well-formed natural-language string, you can add a reflection step in which the model summarizes the tool's output by setting the reflect_on_tool_use=True parameter in the AssistantAgent constructor.
MCP
AssistantAgent can also use tools served by a Model Context Protocol (MCP) server, via mcp_server_tools().
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.tools.mcp import StdioServerParams, mcp_server_tools

async def main() -> None:
    # Get the fetch tool from mcp-server-fetch.
    fetch_mcp_server = StdioServerParams(command="uvx", args=["mcp-server-fetch"])
    tools = await mcp_server_tools(fetch_mcp_server)
    # Create an agent that can use the fetch tool.
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    agent = AssistantAgent(name="fetcher", model_client=model_client, tools=tools, reflect_on_tool_use=True)  # type: ignore
    # Let the agent fetch the content of a URL and summarize it.
    result = await agent.run(task="Summarize the content of https://en.wikipedia.org/wiki/Seattle")
    print(result.messages[-1].content)

asyncio.run(main())
Multimodal Input
AssistantAgent can handle multimodal input when given a MultiModalMessage.
from io import BytesIO

import PIL
import requests
from autogen_agentchat.messages import MultiModalMessage
from autogen_core import Image

# Create a multi-modal message with random image and text.
pil_image = PIL.Image.open(BytesIO(requests.get("https://picsum.photos/300/200").content))
img = Image(pil_image)
multi_modal_message = MultiModalMessage(content=["Can you describe the content of this image?", img], source="user")
img  # displays the image when run in a notebook
Testing shows that if the image is too large, OpenAI returns a "request entity too large" error:
openai.APIStatusError: <html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx/1.18.0 (Ubuntu)</center>
</body>
</html>
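One reason large images trip this limit is that image bytes are base64-encoded into the JSON request body, inflating them by about a third. A stdlib-only sketch of checking the encoded size before sending (the 20 MB threshold is an assumption for illustration; check your provider's actual limit):

```python
import base64

def encoded_size(image_bytes: bytes) -> int:
    """Size of the image once base64-encoded into a JSON request body."""
    return len(base64.b64encode(image_bytes))

raw = b"\x00" * 3_000_000  # stand-in for ~3 MB of image data
enc = encoded_size(raw)
print(enc, enc / len(raw))  # base64 grows data by a factor of ~4/3

LIMIT = 20 * 1024 * 1024  # hypothetical server limit, not a documented value
if enc >= LIMIT:
    print("downscale the image before sending")  # e.g. resize with PIL
```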
Streaming
You can stream the tokens generated by the model client by setting model_client_stream=True. This makes the agent yield ModelClientStreamingChunkEvent messages in on_messages_stream() and run_stream().
model_client = OpenAIChatCompletionClient(model="gpt-4o")
streaming_assistant = AssistantAgent(
    name="assistant",
    model_client=model_client,
    system_message="You are a helpful assistant.",
    model_client_stream=True,  # Enable streaming tokens.
)
# Use an async function and asyncio.run() in a script.
async for message in streaming_assistant.on_messages_stream(  # type: ignore
    [TextMessage(content="Name two cities in South America", source="user")],
    cancellation_token=CancellationToken(),
):
    print(message)
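The shape of that stream, chunk events followed by one final response item, can be mimicked with a plain async generator (fake_stream is a mock for illustration, not an AutoGen API):

```python
import asyncio
from typing import AsyncGenerator

async def fake_stream(text: str) -> AsyncGenerator[tuple[str, str], None]:
    """Mimics on_messages_stream() with streaming enabled: yields
    chunk events first, then a single final response item."""
    for word in text.split():
        yield ("ModelClientStreamingChunkEvent", word)  # partial tokens
    yield ("Response", text)  # the last item is the full response

async def main() -> None:
    async for kind, payload in fake_stream("Buenos Aires and Lima"):
        print(kind, payload)

asyncio.run(main())
```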
Structured Output
Structured output lets the model return JSON-formatted text that conforms to a predefined schema supplied by the application. Unlike JSON mode, the schema can be provided as a Pydantic BaseModel class, which can also be used to validate the output.
Structured output only works with models that support it, and it also requires the model client to support structured output.
from typing import Literal

from pydantic import BaseModel

from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient

# The response format for the agent as a Pydantic base model.
class AgentResponse(BaseModel):
    thoughts: str
    response: Literal["happy", "sad", "neutral"]

# Create an agent that uses the OpenAI GPT-4o model with the custom response format.
model_client = OpenAIChatCompletionClient(
    model="gpt-4o",
    response_format=AgentResponse,  # type: ignore
)
agent = AssistantAgent(
    "assistant",
    model_client=model_client,
    system_message="Categorize the input as happy, sad, or neutral following the JSON format.",
)
await Console(agent.run_stream(task="I am happy."))
Output:
---------- TextMessage (user) ----------
I am happy.
---------- TextMessage (assistant) ----------
{"thoughts":"The user explicitly states that they are happy, giving a clear indication of their emotional state.","response":"happy"}
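Independent of any library, validating such structured output on the client side is straightforward; a stdlib-only sketch (no Pydantic) of checking the same schema might look like:

```python
import json

ALLOWED = {"happy", "sad", "neutral"}

def parse_agent_response(raw: str) -> dict:
    """Parse and validate the JSON the model was asked to emit."""
    data = json.loads(raw)
    if not isinstance(data.get("thoughts"), str):
        raise ValueError("missing 'thoughts' string")
    if data.get("response") not in ALLOWED:
        raise ValueError(f"'response' must be one of {sorted(ALLOWED)}")
    return data

raw = '{"thoughts":"The user explicitly states that they are happy.","response":"happy"}'
print(parse_agent_response(raw)["response"])
```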
Model Context
AssistantAgent has a model_context parameter that takes a ChatCompletionContext object. This lets the agent use different model contexts, such as BufferedChatCompletionContext, to limit the context sent to the model.
By default, AssistantAgent uses UnboundedChatCompletionContext, which sends the full conversation history to the model. To limit the context to the last n messages, use BufferedChatCompletionContext.
from autogen_core.model_context import BufferedChatCompletionContext

# Create an agent that uses only the last 5 messages in the context to generate responses.
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
    model_context=BufferedChatCompletionContext(buffer_size=5),  # Only use the last 5 messages in the context.
)
UserProxyAgent
This agent mainly receives user input and forwards it to other agents; it can be seen as the bridge between the user and a multi-agent system. Its core job is to receive user input and turn it into messages sent to other agents. The input_func parameter lets you customize the input function, for example using input() to read from the console.
from autogen_agentchat.agents import AssistantAgent, UserProxyAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console

async def assistant_run_stream() -> None:
    assistant = AssistantAgent("assistant", model_client=model_client)
    user_proxy = UserProxyAgent("user_proxy", input_func=input)
    termination = TextMentionTermination("APPROVE")
    team = RoundRobinGroupChat([assistant, user_proxy], termination_condition=termination)
    stream = team.run_stream(task="写一首关于明月的七言绝句")
    await Console(stream)
The results are as follows:
---------- TextMessage (user) ----------
写一首关于明月的七言绝句
---------- TextMessage (assistant) ----------
明月高悬夜色空,清辉洒落万家同。
庭前花影随风舞,独坐窗前思古风。
TERMINATE
Enter your response: APPROVE
---------- TextMessage (user_proxy) ----------
APPROVE
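Since input_func is just a callable that returns a string, it can be swapped for anything: a GUI prompt, a message queue, or, for automated tests, a canned sequence of replies. A small sketch (scripted_input is a hypothetical helper, not part of AutoGen):

```python
from typing import Callable, Iterator

def scripted_input(replies: list[str]) -> Callable[[str], str]:
    """Return an input_func-compatible callable that plays back
    predefined replies instead of reading from the console."""
    it: Iterator[str] = iter(replies)

    def _input(prompt: str = "") -> str:
        return next(it)

    return _input

fake_input = scripted_input(["Looks good.", "APPROVE"])
print(fake_input("Enter your response: "))  # first canned reply
print(fake_input("Enter your response: "))  # second canned reply
```

One could then pass this to UserProxyAgent("user_proxy", input_func=fake_input) to run the team non-interactively.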
MultimodalWebSurfer
MultimodalWebSurfer is a multimodal agent that plays the role of a web surfer: it can search the web and visit web pages.
Installation:
pip install "autogen-ext[web-surfer]"
playwright install
It launches a Chromium browser and uses Playwright to interact with it, performing a variety of actions. The browser is launched on the first call to the agent and reused on subsequent calls.
It must be used with a multimodal model client that supports function/tool calling; currently the ideal choice is GPT-4o.
import asyncio

from autogen_agentchat.ui import Console
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.agents.web_surfer import MultimodalWebSurfer

async def main() -> None:
    # Define an agent
    web_surfer_agent = MultimodalWebSurfer(
        name="MultimodalWebSurfer",
        model_client=OpenAIChatCompletionClient(model="gpt-4o-2024-08-06"),
    )
    # Define a team
    agent_team = RoundRobinGroupChat([web_surfer_agent], max_turns=3)
    # Run the team and stream messages to the console
    stream = agent_team.run_stream(task="Navigate to the AutoGen readme on GitHub.")
    await Console(stream)
    # Close the browser controlled by the agent
    await web_surfer_agent.close()

asyncio.run(main())
The results are as follows:
---------- TextMessage (user) ----------
Navigate to the AutoGen readme on GitHub.
---------- MultiModalMessage (MultimodalWebSurfer) ----------
I typed 'AutoGen readme site:github.com' into the browser search bar.
The web browser is open to the page [AutoGen readme site:github.com - 搜索](https://cn.bing.com/search?q=AutoGen+readme+site%3Agithub.com&FORM=QBLH&rdr=1&rdrig=19A3F496D9D246FD8DEF5EEA2303513E).
The viewport shows 36% of the webpage, and is positioned at the top of the page
The following text is visible in the viewport:
跳至内容
国内版
国际版
6
手机版网页
图片
视频
学术
词典
地图
更多
工具
约 495 个结果Github
https://github.com › ... › autogen › blob › …
翻译此结果
autogen/README.md at main · …2024年8月26日 · AutoGen is a framework for creating multi-agent AI applications that can act autonomously or work alongside humans. AutoGen requires Python 3.10 or later. The current stable version is v0.4. If you are upgrading from …
Github
https://github.com › NanGePlus
GitHub - NanGePlus/AutoGenV04Test: AutoGen …2025年1月13日 · 主要内容:AutoGen与DeepSeek R1模型集成(Ollama方式本地部署deepseek-r1:14b大模型)、AutoGen与MCP服务器集成、AutoGen与HTTP API工具集成 https://www.bilibili.com/video/BV1weKFeGEMX/
Github
https://github.com › ... › autogen › blob › main › README.md
autogen/README.md at main · liteli1987gmail/autogen · …AutoGen是一个由Microsoft开源的框架,专为构建和优化大型语言模型(LLM)工作流程而设计。 它提供了多智能体会话框架、应用程序构建工具以及推理性能优化的支持。
你可能喜欢的搜索
speechgenAddgeneAutoglymrunway genGithub
https://github.com
翻译此结果
GitHub - ag2ai/ag2: AG2 (formerly AutoGen): The Open …
The following metadata was extracted from the webpage:
{
"meta_tags": {
"referrer": "origin-when-cross-origin",
"SystemEntropyOriginTrialToken": "A5is4nwJJVnhaJpUr1URgj4vvAXSiHoK0VBbM9fawMskbDUj9WUREpa3JzGAo6xd1Cp2voQEG1h6NQ71AsMznU8AAABxeyJvcmlnaW4iOiJodHRwczovL3d3dy5iaW5nLmNvbTo0NDMiLCJmZWF0dXJlIjoiTXNVc2VyQWdlbnRMYXVuY2hOYXZUeXBlIiwiZXhwaXJ5IjoxNzUzNzQ3MjAwLCJpc1N1YmRvbWFpbiI6dHJ1ZX0=",
"ConfidenceOriginTrialToken": "Aqw360MHzRcmtEVv55zzdIWcTk2BBYHcdBAOysNJZP4qkN8M+5vUq36ITHFVst8LiX36KBZJXB8xvyBgdK2z5Q0AAAB6eyJvcmlnaW4iOiJodHRwczovL2JpbmcuY29tOjQ0MyIsImZlYXR1cmUiOiJQZXJmb3JtYW5jZU5hdmlnYXRpb25UaW1pbmdDb25maWRlbmNlIiwiZXhwaXJ5IjoxNzYwNDAwMDAwLCJpc1N1YmRvbWFpbiI6dHJ1ZX0=",
"og:description": "\u901a\u8fc7\u5fc5\u5e94\u7684\u667a\u80fd\u641c\u7d22\uff0c\u53ef\u4ee5\u66f4\u8f7b\u677e\u5730\u5feb\u901f\u67e5\u627e\u6240\u9700\u5185\u5bb9\u5e76\u83b7\u5f97\u5956\u52b1\u3002",
"og:site_name": "\u5fc5\u5e94",
"og:title": "AutoGen readme site:github.com - \u5fc5\u5e94",
"og:url": "https://cn.bing.com/search?q=AutoGen+readme+site%3Agithub.com&FORM=QBLH&rdr=1&rdrig=19A3F496D9D246FD8DEF5EEA2303513E",
"fb:app_id": "3732605936979161",
"og:image": "http://www.bing.com/sa/simg/facebook_sharing_5.png",
"og:type": "website",
"og:image:width": "600",
"og:image:height": "315"
}
}
Here is a screenshot of the page.
<image>
---------- MultiModalMessage (MultimodalWebSurfer) ----------
I clicked 'autogen/README.md at main · …'.
The web browser is open to the page [AutoGen readme site:github.com - 搜索](https://cn.bing.com/search?q=AutoGen+readme+site%3Agithub.com&FORM=QBLH&rdr=1&rdrig=19A3F496D9D246FD8DEF5EEA2303513E).
The viewport shows 36% of the webpage, and is positioned at the top of the page
The following text is visible in the viewport:
跳至内容
国内版
国际版
6
手机版网页
图片
视频
学术
词典
地图
更多
工具
约 495 个结果Github
https://github.com › ... › autogen › blob › …
翻译此结果
autogen/README.md at main · …2024年8月26日 · AutoGen is a framework for creating multi-agent AI applications that can act autonomously or work alongside humans. AutoGen requires Python 3.10 or later. The current stable version is v0.4. If you are upgrading from …
Github
https://github.com › NanGePlus
GitHub - NanGePlus/AutoGenV04Test: AutoGen …2025年1月13日 · 主要内容:AutoGen与DeepSeek R1模型集成(Ollama方式本地部署deepseek-r1:14b大模型)、AutoGen与MCP服务器集成、AutoGen与HTTP API工具集成 https://www.bilibili.com/video/BV1weKFeGEMX/
Github
https://github.com › ... › autogen › blob › main › README.md
autogen/README.md at main · liteli1987gmail/autogen · …AutoGen是一个由Microsoft开源的框架,专为构建和优化大型语言模型(LLM)工作流程而设计。 它提供了多智能体会话框架、应用程序构建工具以及推理性能优化的支持。
你可能喜欢的搜索
speechgenAddgeneAutoglymrunway genGithub
https://github.com
翻译此结果
GitHub - ag2ai/ag2: AG2 (formerly AutoGen): The Open …
Here is a screenshot of the page.
<image>
CodeExecutorAgent
An agent that extracts code snippets found in incoming messages, executes them, and returns the output. It is typically used in a team alongside another agent that generates the code snippets to execute.
The code executor only handles code properly formatted in markdown code blocks using triple backticks, for example:
```python
print("Hello World")
```
# or
```sh
echo "Hello World"
```
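The extraction step can be sketched with a regular expression over the message text (a simplified illustration of the format CodeExecutorAgent looks for, not its actual implementation; the fence string is built programmatically only to avoid nesting backticks here):

```python
import re

FENCE = "`" * 3  # triple backtick
CODE_BLOCK = re.compile(FENCE + r"(\w+)\n(.*?)" + FENCE, re.DOTALL)

def extract_code_blocks(markdown: str) -> list[tuple[str, str]]:
    """Pull (language, code) pairs out of fenced markdown blocks."""
    return CODE_BLOCK.findall(markdown)

message = f"Run this:\n{FENCE}python\nprint('Hello World')\n{FENCE}\n"
print(extract_code_blocks(message))  # [('python', "print('Hello World')\n")]
```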
It is recommended that CodeExecutorAgent execute code inside a Docker container, which ensures that model-generated code runs in an isolated environment.
Newer versions offer PythonCodeExecutionTool as an alternative to this agent. The tool allows Python code to be executed within a single agent rather than being sent to a separate agent for execution. However, the agent's model must generate a correctly escaped code string as the tool's argument.
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
from autogen_ext.tools.code_execution import PythonCodeExecutionTool

async def assistant_run_stream() -> None:
    tool = PythonCodeExecutionTool(LocalCommandLineCodeExecutor(work_dir="coding"))
    agent = AssistantAgent(
        "assistant", model_client, tools=[tool], reflect_on_tool_use=True
    )
    await Console(
        agent.run_stream(
            task="Write a classic binary search implementation and save it to a file"
        )
    )
After the code runs, we can see a new file named binary_search.py in the local coding directory, with the following content:
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
Summary
In this article I demonstrated how to use AutoGen's common built-in agents, which should give us an initial understanding of how the framework fits together. In a follow-up article I will walk through the source-code execution flow of each agent in detail. If you are interested, like, bookmark, and follow for updates.
This article is reproduced from AI 博物院; author: longyunfeigu
