
How a Multi-Stage Workflow Powers an Advanced RAG Agent: Intelligent Conversation and Precise Answers
In today's digital era, intelligent question-answering systems have permeated every corner of our lives, from online customer service to voice assistants, all working to give us convenient access to information. Yet traditional QA systems often fall short in complex conversational scenarios. In this post, we explore how to build an advanced RAG (Retrieval-Augmented Generation) agent that can handle complex conversations: rewriting user questions, classifying them, and validating document relevance, so it can deliver smarter, more precise answers.
1. Why We Need an Advanced RAG Agent
Traditional RAG systems hold up well on simple questions, but in complex conversational scenarios, say, a user asking several related questions in a row, or a question spanning multiple domains, they readily produce inaccurate answers or fail to answer at all. The root cause is that they lack conversation memory and intelligent query processing. For example, after asking about a product's features, a user may follow up with "So how much does it cost?"; a traditional system cannot resolve the context of that question and gives an inaccurate answer.
To solve these problems, we need to build a smarter, advanced RAG agent. It improves conversation quality and accuracy through several key capabilities:
- Intelligent question rewriting: turn follow-up questions into standalone queries that the system can understand and process.
- Intelligent topic detection: ensure queries fall within our knowledge domain and avoid answering off-topic content.
- Document quality assessment: before generating an answer, verify that the retrieved content is accurate and relevant.
- Adaptive query enhancement: when the initial query fails, iteratively refine the search strategy to raise retrieval success rates.
- Persistent conversation memory: keep context coherent across multiple exchanges so the dialogue flows naturally.
Next, let's implement this advanced RAG agent step by step through a concrete scenario: building a technical-support knowledge base.
2. System Architecture
Our advanced RAG agent uses a multi-stage workflow built from the following core components:
- Query Enhancer: rewrites the question using conversation history so it suits vector search.
- Topic Validator: decides whether the query is relevant to our knowledge domain.
- Content Retriever: retrieves documents related to the question from the knowledge base.
- Relevance Assessor: evaluates the quality and relevance of the retrieved documents.
- Response Generator: produces a context-aware answer from the conversation history and relevant documents.
- Query Optimizer: refines the search query when needed to improve retrieval.
This architecture lets the system handle complex conversations while maintaining answer quality and relevance, giving users more accurate and useful information.
3. Environment Setup and Knowledge Base Construction
(1) Environment Setup
To stand up our development environment quickly, we use uv, a fast Python package manager. The setup steps are as follows:
1. Create and activate a virtual environment:
uv venv rag-env
source rag-env/bin/activate
This creates and activates a virtual environment named rag-env, giving the rest of the development an isolated environment.
2. Install the required packages:
uv pip install \
langchain \
langgraph \
langchain-google-genai \
langchain-community \
python-dotenv \
jupyterlab \
ipykernel
This installs the core dependencies for building the RAG agent, including langchain and langgraph. Note that the imports used later also need langchain-huggingface, chromadb, and sentence-transformers (install them the same way) for the embedding model and the Chroma vector store.
3. Register the virtual environment as a Jupyter kernel:
python -m ipykernel install --user --name=rag-env --display-name "RAG Agent (uv)"
After this step, you can select RAG Agent (uv) as the kernel in Jupyter Notebook or JupyterLab, which makes development and debugging easier.
4. Add your LLM API key: create a .env file in the project root and add your Gemini API key:
GOOGLE_API_KEY=your_google_gemini_api_key_here
With this in place, the system can use Gemini's advanced reasoning to process complex queries.
5. Load the dependencies:
from dotenv import load_dotenv
load_dotenv()
# Core LangChain components
from langchain.schema import Document
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate
# Graph and state management
from typing import TypedDict, List
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage, SystemMessage
from pydantic import BaseModel, Field
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
These imports bring in the modules and classes that the rest of the system builds on.
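Every node function below reads and writes a shared ConversationState. The original post never shows its definition, so here is a minimal sketch inferred from the fields the nodes actually use:
# Minimal sketch of the shared graph state; inferred from the fields the
# node functions read and write (the definition is not shown in the post).
class ConversationState(TypedDict):
    current_query: HumanMessage               # the user's latest question
    conversation_history: List[BaseMessage]   # full multi-turn history
    enhanced_query: str                       # standalone, search-ready rewrite
    topic_relevance: str                      # "RELEVANT" or "IRRELEVANT"
    retrieved_documents: List[Document]       # docs that survived retrieval/grading
    should_generate: bool                     # True once relevant docs exist
    optimization_attempts: int                # guard against endless query rewrites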
(2) Knowledge Base Construction
To showcase the system's capabilities, we build a technical-support knowledge base for a fictional company, "TechFlow Solutions". Here is the construction code:
# Initialize the embedding model
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Create a comprehensive technical-support knowledge base
knowledge_documents = [
    Document(
        page_content="TechFlow Solutions offers three main service tiers: Basic Support ($29/month) includes email support and basic troubleshooting, Professional Support ($79/month) includes priority phone support and advanced diagnostics, Enterprise Support ($199/month) includes 24/7 dedicated support and custom integrations.",
        metadata={"source": "pricing_guide.pdf", "category": "pricing"},
    ),
    Document(
        page_content="Our cloud infrastructure services include: Virtual Private Servers starting at $15/month, Managed Databases from $45/month, Content Delivery Network at $0.08/GB, and Load Balancing services at $25/month. All services include 99.9% uptime guarantee.",
        metadata={"source": "infrastructure_pricing.pdf", "category": "services"},
    ),
    Document(
        page_content="TechFlow Solutions was founded in 2018 by Maria Rodriguez, a former Google engineer with 15 years of experience in cloud architecture. The company has grown from 3 employees to over 150 team members across 12 countries, specializing in enterprise cloud solutions.",
        metadata={"source": "company_history.pdf", "category": "company"},
    ),
    Document(
        page_content="Our technical support team operates 24/7 for Enterprise customers, business hours (9 AM - 6 PM EST) for Professional customers, and email-only support for Basic customers. Average response times: Enterprise (15 minutes), Professional (2 hours), Basic (24 hours).",
        metadata={"source": "support_procedures.pdf", "category": "support"},
    ),
]

# Build the vector store
vector_store = Chroma.from_documents(knowledge_documents, embedding_model)
document_retriever = vector_store.as_retriever(search_kwargs={"k": 2})
This knowledge base covers pricing, services, company information, and support procedures, so it can serve a range of technical-support queries. Attaching rich metadata to each document also lets us organize and filter documents better, improving retrieval efficiency and accuracy.
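Before wiring the retriever into the graph, you can query it directly as a quick sanity check (a hypothetical snippet, not from the original post):
# Hypothetical smoke test: ask the retriever directly.
docs = document_retriever.invoke("How much does Professional Support cost?")
for doc in docs:
    print(doc.metadata["source"], "->", doc.page_content[:60])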
4. Core Component Implementation
(1) Query Enhancer: Intelligent Question Rewriting
The query enhancer rewrites the user's question against the conversation history into a standalone query suited to vector search. Here is its implementation:
def enhance_user_query(state: ConversationState):
    """
    Rewrite the user's question using conversation history to create a
    standalone query suitable for vector search.
    """
    print(f"Enhancing query: {state['current_query'].content}")
    # Reset state for the new query
    state["retrieved_documents"] = []
    state["topic_relevance"] = ""
    state["enhanced_query"] = ""
    state["should_generate"] = False
    state["optimization_attempts"] = 0
    # Ensure conversation history exists
    if "conversation_history" not in state or state["conversation_history"] is None:
        state["conversation_history"] = []
    # Add the current question to the history if it is not already there
    if state["current_query"] not in state["conversation_history"]:
        state["conversation_history"].append(state["current_query"])
    # Check whether there is prior conversational context
    if len(state["conversation_history"]) > 1:
        # Separate the context from the current question
        previous_messages = state["conversation_history"][:-1]
        current_question = state["current_query"].content
        # Build a context-aware prompt
        context_messages = [
            SystemMessage(
                content="""You are an expert query reformulator. Transform the user's question into a standalone,
search-optimized query that incorporates relevant context from the conversation history.
Guidelines:
- Make the question self-contained and clear
- Preserve the user's intent while adding necessary context
- Optimize for vector database retrieval
- Keep the reformulated query concise but comprehensive"""
            )
        ]
        context_messages.extend(previous_messages)
        context_messages.append(HumanMessage(content=f"Current question: {current_question}"))
        # Generate the enhanced query
        enhancement_prompt = ChatPromptTemplate.from_messages(context_messages)
        llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp", temperature=0.1)
        formatted_prompt = enhancement_prompt.format()
        response = llm.invoke(formatted_prompt)
        enhanced_question = response.content.strip()
        print(f"Enhanced query: {enhanced_question}")
        state["enhanced_query"] = enhanced_question
    else:
        # First question in the conversation - use it as-is
        state["enhanced_query"] = state["current_query"].content
        print(f"First query - using original: {state['enhanced_query']}")
    return state
By combining the conversation history with the current question, this component produces a clearer, standalone query that retrieves better from the vector database while preserving the user's original intent.
(2) Topic Validator: Intelligent Domain Classification
The topic validator decides whether the user's question falls within our knowledge domain. Here is its implementation:
class TopicRelevance(BaseModel):
    """Structured output for topic classification"""
    classification: str = Field(
        description="Is the question about TechFlow Solutions' services, pricing, company info, etc.? Answer 'RELEVANT' or 'IRRELEVANT'"
    )
    confidence: str = Field(
        description="Confidence level: 'HIGH', 'MEDIUM', or 'LOW'"
    )

def validate_topic_relevance(state: ConversationState):
    """
    Determine whether the user's question falls within our knowledge domain.
    Uses the enhanced query for better classification accuracy.
    """
    print("Validating topic relevance...")
    classification_prompt = SystemMessage(
        content="""You are a topic classifier for TechFlow Solutions support system.
RELEVANT topics include:
- TechFlow Solutions services (cloud infrastructure, migration, DevOps)
- Pricing for any TechFlow Solutions products or services
- Company information (history, team, locations)
- Support procedures and response times
- Security and compliance features
- Technical specifications and capabilities
IRRELEVANT topics include:
- General technology questions not specific to TechFlow
- Other companies' products or services
- Personal questions unrelated to business
- Weather, news, or general knowledge queries
Classify based on the enhanced query, which includes conversation context."""
    )
    user_question = HumanMessage(
        content=f"Enhanced query to classify: {state['enhanced_query']}"
    )
    # Build the classification chain
    classification_chain = ChatPromptTemplate.from_messages([classification_prompt, user_question])
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp", temperature=0)
    structured_llm = llm.with_structured_output(TopicRelevance)
    classifier = classification_chain | structured_llm
    result = classifier.invoke({})
    state["topic_relevance"] = result.classification.strip()
    print(f"Topic classification: {state['topic_relevance']} (Confidence: {result.confidence})")
    return state
By spelling out exactly which questions the system can handle and classifying against the enhanced query, this component achieves higher classification accuracy. It also returns a confidence score, giving us more insight into each classification.
(3) Content Retriever: Intelligent Document Retrieval
The content retriever fetches question-related documents from the knowledge base. Here is its implementation:
def fetch_relevant_content(state: ConversationState):
    """
    Retrieve documents from the knowledge base using the enhanced query.
    """
    print("Fetching relevant documents...")
    # Retrieve with the enhanced query
    retrieved_docs = document_retriever.invoke(state["enhanced_query"])
    print(f"Retrieved {len(retrieved_docs)} documents")
    for i, doc in enumerate(retrieved_docs):
        print(f"  Document {i+1}: {doc.page_content[:50]}...")
    state["retrieved_documents"] = retrieved_docs
    return state
Using the vector store, this component quickly retrieves documents relevant to the enhanced query, laying the groundwork for response generation.
(4) Relevance Assessor: Document Quality Control
The relevance assessor evaluates the quality and relevance of each retrieved document. Here is its implementation:
class DocumentRelevance(BaseModel):
    """Structured output for document relevance assessment"""
    relevance: str = Field(
        description="Is this document relevant for answering the question? Answer 'RELEVANT' or 'IRRELEVANT'"
    )
    reasoning: str = Field(
        description="Brief explanation of why the document is or is not relevant"
    )

def assess_document_relevance(state: ConversationState):
    """
    Evaluate each retrieved document to determine whether it is relevant
    to answering the user's question.
    """
    print("Assessing document relevance...")
    assessment_prompt = SystemMessage(
        content="""You are a document relevance assessor. Evaluate whether each document
contains information that can help answer the user's question.
A document is RELEVANT if it contains:
- Direct answers to the question
- Supporting information that contributes to a complete answer
- Context that helps understand the topic
A document is IRRELEVANT if it:
- Contains no information related to the question
- Discusses completely different topics
- Provides no value for answering the question
Be strict but fair in your assessment."""
    )
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp", temperature=0)
    structured_llm = llm.with_structured_output(DocumentRelevance)
    relevant_documents = []
    for i, doc in enumerate(state["retrieved_documents"]):
        assessment_query = HumanMessage(
            content=f"""Question: {state['enhanced_query']}
Document to assess:
{doc.page_content}
Is this document relevant for answering the question?"""
        )
        assessment_chain = ChatPromptTemplate.from_messages([assessment_prompt, assessment_query])
        assessor = assessment_chain | structured_llm
        result = assessor.invoke({})
        print(f"Document {i+1}: {result.relevance} - {result.reasoning}")
        if result.relevance.strip().upper() == "RELEVANT":
            relevant_documents.append(doc)
    # Keep only the relevant documents in state
    state["retrieved_documents"] = relevant_documents
    state["should_generate"] = len(relevant_documents) > 0
    print(f"Final relevant documents: {len(relevant_documents)}")
    return state
By grading every document strictly and filtering out the irrelevant ones, this component safeguards answer quality and accuracy. It also records the reasoning behind each verdict, making the assessments easier to understand and audit.
(5) Response Generator: Context-Aware Answer Creation
The response generator produces a context-aware answer from the conversation history and relevant documents. Here is its implementation:
def generate_contextual_response(state: ConversationState):
    """
    Generate the final response from the conversation history and relevant documents.
    """
    print("Generating contextual response...")
    if "conversation_history" not in state or state["conversation_history"] is None:
        raise ValueError("Conversation history is required for response generation")
    # Pull out the pieces needed for response generation
    conversation_context = state["conversation_history"]
    relevant_docs = state["retrieved_documents"]
    enhanced_question = state["enhanced_query"]
    # Build a comprehensive response template
    response_template = """You are a knowledgeable TechFlow Solutions support agent. Generate a helpful,
accurate response based on the conversation history and retrieved documents.
Guidelines:
- Use information from the provided documents to answer the question
- Maintain conversation context and refer to previous exchanges when relevant
- Be conversational and helpful in tone
- If the documents don't fully answer the question, acknowledge limitations
- Provide specific details when available (prices, timeframes, etc.)
Conversation History:
{conversation_history}
Retrieved Knowledge:
{document_context}
Current Question: {current_question}
Generate a helpful response:"""
    response_prompt = ChatPromptTemplate.from_template(response_template)
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp", temperature=0.3)
    # Build the response generation chain
    response_chain = response_prompt | llm
    # Generate the response
    response = response_chain.invoke({
        "conversation_history": conversation_context,
        "document_context": relevant_docs,
        "current_question": enhanced_question
    })
    generated_response = response.content.strip()
    # Append the response to the conversation history
    state["conversation_history"].append(AIMessage(content=generated_response))
    print(f"Generated response: {generated_response[:100]}...")
    return state
By weaving together the conversation history and the relevant documents, this component generates an answer that is both accurate and in context, keeping the dialogue natural and fluent.
(6) Query Optimizer: Adaptive Search Improvement
The query optimizer refines the search query when the initial retrieval fails. Here is its implementation:
def optimize_search_query(state: ConversationState):
    """
    Refine the search query when the initial retrieval fails.
    Includes loop prevention to avoid endless optimization cycles.
    """
    print("Optimizing search query...")
    current_attempts = state.get("optimization_attempts", 0)
    # Prevent infinite optimization loops
    if current_attempts >= 2:
        print("⚠ Maximum optimization attempts reached")
        return state
    current_query = state["enhanced_query"]
    optimization_prompt = SystemMessage(
        content="""You are a search query optimizer. The current query didn't retrieve relevant documents.
Create an improved version that:
- Uses different keywords or synonyms
- Adjusts the query structure for better matching
- Maintains the original intent while improving searchability
- Considers alternative ways to express the same concept
Provide only the optimized query, no explanation."""
    )
    optimization_request = HumanMessage(
        content=f"Current query that needs optimization: {current_query}"
    )
    optimization_chain = ChatPromptTemplate.from_messages([optimization_prompt, optimization_request])
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp", temperature=0.2)
    formatted_prompt = optimization_chain.format()
    response = llm.invoke(formatted_prompt)
    optimized_query = response.content.strip()
    # Update state
    state["enhanced_query"] = optimized_query
    state["optimization_attempts"] = current_attempts + 1
    print(f"Optimized query (attempt {current_attempts + 1}): {optimized_query}")
    return state
By rewriting the search query, this component raises the retrieval success rate, while the attempt cap prevents infinite loops.
5. Workflow Orchestration and Intelligent Routing
To make the system work as a whole, we need to connect the components in order and make intelligent routing decisions along the way. Here is the orchestration code:
def route_by_topic(state: ConversationState):
    """Route based on the topic relevance classification"""
    print("Routing based on topic relevance...")
    relevance = state.get("topic_relevance", "").strip().upper()
    if relevance == "RELEVANT":
        print("  → Proceeding to content retrieval")
        return "fetch_content"
    else:
        print("  → Routing to off-topic handler")
        return "handle_off_topic"

def route_by_document_quality(state: ConversationState):
    """Route based on the document relevance assessment"""
    print("Routing based on document quality...")
    optimization_attempts = state.get("optimization_attempts", 0)
    if state.get("should_generate", False):
        print("  → Generating response with relevant documents")
        return "generate_response"
    elif optimization_attempts >= 2:
        print("  → Maximum optimization attempts reached")
        return "handle_no_results"
    else:
        print("  → Optimizing query for better results")
        return "optimize_query"

# Helper functions for edge cases
def handle_off_topic_queries(state: ConversationState):
    """Handle questions outside our knowledge domain"""
    print("Handling off-topic query...")
    if "conversation_history" not in state or state["conversation_history"] is None:
        state["conversation_history"] = []
    off_topic_response = """I focus on helping with questions about TechFlow Solutions' services, pricing, and company information.
Your question appears to be outside my area of expertise.
I can help you with:
- Our cloud infrastructure services and pricing
- Support procedures and response times
- Company information and team details
- Security and compliance features
Do you have a specific question about TechFlow Solutions I can help with?"""
    state["conversation_history"].append(AIMessage(content=off_topic_response))
    return state

def handle_no_relevant_results(state: ConversationState):
    """Handle the case where no relevant documents are found even after optimization"""
    print("No relevant results found after optimization...")
    if "conversation_history" not in state or state["conversation_history"] is None:
        state["conversation_history"] = []
    no_results_response = """I'm sorry, I couldn't find specific information in the current knowledge base to answer your question.
This may be because:
- The information isn't available in our documentation
- Your question may need further clarification
- You may need to contact our support team directly
For immediate assistance, you can reach our support team at support@techflow.com or call 1-800-TECHFLOW."""
    state["conversation_history"].append(AIMessage(content=no_results_response))
    return state
With these routing functions, the system steers each question into the right processing path based on topic relevance and document quality, handling every case efficiently and accurately.
6. Assembling the Complete Workflow
Finally, we assemble all the components and routing logic into one workflow. Here is the assembly code:
# Initialize conversation memory
conversation_memory = MemorySaver()

# Create the workflow graph
workflow = StateGraph(ConversationState)

# Add all processing nodes
workflow.add_node("enhance_query", enhance_user_query)
workflow.add_node("validate_topic", validate_topic_relevance)
workflow.add_node("handle_off_topic", handle_off_topic_queries)
workflow.add_node("fetch_content", fetch_relevant_content)
workflow.add_node("assess_relevance", assess_document_relevance)
workflow.add_node("generate_response", generate_contextual_response)
workflow.add_node("optimize_query", optimize_search_query)
workflow.add_node("handle_no_results", handle_no_relevant_results)

# Define the workflow connections
workflow.add_edge("enhance_query", "validate_topic")

# Conditional routing on topic relevance
workflow.add_conditional_edges(
    "validate_topic",
    route_by_topic,
    {
        "fetch_content": "fetch_content",
        "handle_off_topic": "handle_off_topic",
    },
)

# Content processing flow
workflow.add_edge("fetch_content", "assess_relevance")

# Conditional routing on document quality
workflow.add_conditional_edges(
    "assess_relevance",
    route_by_document_quality,
    {
        "generate_response": "generate_response",
        "optimize_query": "optimize_query",
        "handle_no_results": "handle_no_results",
    },
)

# Optimization loop
workflow.add_edge("optimize_query", "fetch_content")

# Terminal nodes
workflow.add_edge("generate_response", END)
workflow.add_edge("handle_no_results", END)
workflow.add_edge("handle_off_topic", END)

# Set the entry point
workflow.set_entry_point("enhance_query")

# Compile the workflow
advanced_rag_agent = workflow.compile(checkpointer=conversation_memory)
With this complete workflow, our advanced RAG agent can handle a wide range of complex conversational scenarios and provide users with accurate, useful information.
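To double-check the wiring, a compiled LangGraph graph can render its own structure; for instance (assuming a recent langgraph version), as Mermaid text:
# Print the graph structure as Mermaid text to verify the node wiring.
print(advanced_rag_agent.get_graph().draw_mermaid())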
7. Testing Our Advanced RAG Agent
Now let's put the system through a few test scenarios. Here are the test code and results:
Test 1: An Off-Topic Question
print("🧪 Testing Advanced RAG Agent\n")
# 测试1:超出主题范围的问题
print("=== Test 1: Off-Topic Query ===")
test_input = {"current_query": HumanMessage(cnotallow="What's the weather like today?")}
result = advanced_rag_agent.invoke(
input=test_input,
cnotallow={"configurable": {"thread_id": "test_session_1"}}
)
print(f"Response: {result['conversation_history'][-1].content}\n")
Output:
Testing Advanced RAG Agent
=== Test 1: Off-Topic Query ===
Enhancing query: What's the weather like today?
First query - using original: What's the weather like today?
Validating topic relevance...
Topic classification: IRRELEVANT (Confidence: HIGH)
Routing based on topic relevance...
→ Routing to off-topic handler
Handling off-topic query...
Response: I focus on helping with questions about TechFlow Solutions' services, pricing, and company information.
Your question appears to be outside my area of expertise.
I can help you with:
- Our cloud infrastructure services and pricing
- Support procedures and response times
- Company information and team details
- Security and compliance features
Do you have a specific question about TechFlow Solutions I can help with?
Test 2: An On-Topic Pricing Question
# Test 2: an on-topic question about pricing
print("=== Test 2: Service Pricing Query ===")
test_input = {"current_query": HumanMessage(content="What are your support service pricing options?")}
result = advanced_rag_agent.invoke(
    input=test_input,
    config={"configurable": {"thread_id": "test_session_2"}}
)
print(f"Response: {result['conversation_history'][-1].content}\n")
Output:
=== Test 2: Service Pricing Query ===
Enhancing query: What are your support service pricing options?
📝 First query - using original: What are your support service pricing options?
🎯 Validating topic relevance...
🏷️ Topic classification: RELEVANT (Confidence: HIGH)
🚦 Routing based on topic relevance...
→ Proceeding to content retrieval
📚 Fetching relevant documents...
📄 Retrieved 2 documents
Document 1: TechFlow Solutions offers three main service tiers...
Document 2: Our cloud infrastructure services include: Virtual...
🔍 Assessing document relevance...
📋 Document 1: RELEVANT - The document directly answers the question by listing the names, features, and prices of the support service tiers offered by TechFlow Solutions.
📋 Document 2: IRRELEVANT - The document describes pricing options for cloud infrastructure services, not support services. Therefore, it's not relevant to the question about support service pricing.
✅ Final relevant documents: 1
🚦 Routing based on document quality...
→ Generating response with relevant documents
💬 Generating contextual response...
📝 Generated response: We have three support service tiers available. Basic Support is $29 per month and includes email sup...
Response: We have three support service tiers available. Basic Support is $29 per month and includes email support and basic troubleshooting. Professional Support is $79 per month, providing priority phone support and advanced diagnostics. Finally, Enterprise Support, at $199 per month, includes 24/7 dedicated support and custom integrations.
These two tests show that our advanced RAG agent handles different kinds of user questions accurately: whether a question is out of scope or a complex in-scope query, it responds appropriately and usefully.
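The tests above stop at single turns, but the conversation memory is easy to exercise as well. A hypothetical follow-up in the same thread_id shows the query enhancer resolving a pronoun against the history (the exact rewrite will vary from run to run):
# Test 3 (hypothetical): a follow-up question in the same session, relying on
# the MemorySaver checkpoint to supply the conversation history.
session = {"configurable": {"thread_id": "test_session_3"}}
advanced_rag_agent.invoke(
    input={"current_query": HumanMessage(content="What does Enterprise Support include?")},
    config=session,
)
result = advanced_rag_agent.invoke(
    input={"current_query": HumanMessage(content="And how much does it cost per month?")},
    config=session,
)
# The enhancer should rewrite the follow-up into something like
# "How much does TechFlow Solutions' Enterprise Support cost per month?"
print(result["conversation_history"][-1].content)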
8. Conclusion
We have built an advanced RAG agent capable of handling complex conversations. By making several AI techniques work in concert, the system achieves smarter, context-aware, and more reliable conversational AI. The key innovations include:
- Context-aware question rewriting: keeps the dialogue natural and fluent.
- Multi-layer quality control: classification and grading safeguard answer quality.
- Iterative retrieval refinement: raises the retrieval success rate.
- Robust workflow management: comes with solid error handling.
This architecture provides a sound foundation for production-grade RAG applications that must handle complex, multi-turn conversations while maintaining high quality and relevance.
This article is reproduced from Halo咯咯; author: 基咯咯.
