2025's Top RAG Rerankers: Cut the "Information Noise" and Make AI Answers More Accurate!

If you've spent any time in the world of RAG (Retrieval-Augmented Generation), you've probably run into this puzzle: you've already retrieved "mountains" of relevant documents, so why are the LLM's answers still disappointing? Behind this lies an easily overlooked but crucial step: the "noise" problem in first-pass retrieval!

Why Does First-Pass Retrieval Keep Falling Short?


You might think RAG's first step, finding documents relevant to the user's query, sounds simple enough. Common approaches like keyword search and vector similarity matching can indeed pull back a pile of documents quickly. But that's exactly where the problem lies: they're a little too good at finding things. They return a big haul, yet only a handful of documents are genuinely useful, often mixed with plenty of irrelevant junk.

Why does this happen?

  1. Embedding models don't fully "get" you: the embedding models we rely on can understand semantics, but when faced with particularly fine-grained or specialized questions, they often fall short, failing to capture subtle nuances, so the retrieval results stay coarse.
  2. The trap of short queries and jargon: vector search is great, but it gets confused by short queries or highly specialized terminology. Search for "latest treatment options for myocardial infarction" and it may hand you a stack of general articles about heart disease rather than cutting-edge clinical research.
  3. The LLM's "memory" is limited: large language models are powerful, but their context window is finite! Dump a huge pile of documents on them, even loosely related ones, and they suffer "indigestion": attention gets diluted and answer quality drops. It's like handing an expert a mountain of unfiltered material; the expert may be distracted by the noise and unable to zero in on what matters.

So these noisy retrievals stuff the LLM's "brain" with cluttered information. They not only dilute its focus but can also send it off into fantasy, producing what we call hallucinations. We need a "cleanup crew" to reshuffle what the first-pass retrieval brings back!

The Hero Arrives: Rerankers Take the Stage!


Ladies and gentlemen, it's time to introduce today's star: the reranker!

A reranker is, as the name suggests, a tool that re-sorts search results. Like a sharp-eyed "information detective", it steps in after the first-pass retrieval has pulled in a pile of documents, applies more sophisticated analysis of how each document relates to the user's query, and lifts the most relevant ones straight to the top.

In the RAG pipeline, the reranker plays the role of quality gatekeeper. It scrutinizes the first batch of retrieved results and prioritizes documents by how well they match the query and how informative they are. The goal is blunt and simple: push the most valuable information right to the top!

You can think of a reranker as a professional proofreader: it double-checks the initial search results and, with a deeper understanding of language, finds the best possible fit between documents and the question.
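To make that "deeper understanding" concrete, here is a minimal sketch of what a reranker does under the hood, assuming the open-source sentence-transformers library and a public MS MARCO cross-encoder checkpoint (both are illustrative choices; commercial rerankers hide the same idea behind an API):

from sentence_transformers import CrossEncoder

# A cross-encoder reads the query and document *together*, so it can judge
# fine-grained relevance instead of comparing two independent vectors.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "latest treatment options for myocardial infarction"
docs = [
    "A general overview of heart disease for patients.",
    "2024 clinical trial results for new therapies after acute myocardial infarction.",
    "How to keep your heart healthy with diet and exercise.",
]

# One relevance score per (query, document) pair.
scores = model.predict([(query, d) for d in docs])

# Sort documents by score, most relevant first.
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")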

How Do Rerankers Transform RAG?


Adding a reranker is not just icing on the cake; it can produce a qualitative leap in RAG performance!

  1. Accuracy soars: a reranker goes beyond keyword matching and digs into the semantic relationship between the user's question and each document. This deeper understanding helps it identify the most useful information and ensures the context handed to the LLM is highly precise.
  2. Answers hit the mark: when the LLM receives a smaller, tighter, higher-quality set of documents, it naturally produces more precise and direct answers. Give a chef top-shelf ingredients and you get a better dish. The reranker computes a score reflecting each document's semantic closeness to the query, yielding a better final ordering. Even without exact keyword matches, it can surface relevant gems.
  3. Goodbye to hallucinations: as noted above, a big cause of LLM hallucination is feeding it impure information. Documents filtered and vetted by a reranker give the LLM a much firmer foundation, greatly lowering the odds of it confidently making things up and making the final output more trustworthy.

So the standard RAG pipeline is "retrieve" then "generate". An enhanced pipeline inserts a "rerank" step in between:

  • Retrieve: pull in an initial batch of candidate documents.
  • Rerank: use a reranking model to reorder those documents by relevance to the query.
  • Generate: feed only the top-ranked, most relevant documents to the LLM to generate the answer.

This two-stage approach lets the initial retrieval cast a wide net (optimizing for recall) while the reranker cherry-picks from that net (optimizing for precision). This division of labor significantly improves both the efficiency and the quality of the whole RAG pipeline, giving the LLM the best possible input.
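Putting the two stages together, here's a minimal sketch. It assumes sentence-transformers provides both the bi-encoder retriever and the cross-encoder reranker (any vector store plus reranker pairing follows the same shape), and the three-document corpus is a toy stand-in:

from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Stage 1 model: fast bi-encoder for broad recall.
retriever = SentenceTransformer("all-MiniLM-L6-v2")
# Stage 2 model: slower cross-encoder for precise reranking.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

corpus = [
    "The president nominated Judge Ketanji Brown Jackson to the Supreme Court.",
    "One way to fight inflation is to lower costs, not wages.",
    "Studio Ghibli was founded in 1985 by Miyazaki, Takahata, Tokuma and Suzuki.",
]
corpus_emb = retriever.encode(corpus, convert_to_tensor=True)

def retrieve_then_rerank(query, recall_k=20, final_k=3):
    # Retrieve: cast a wide net by cosine similarity (optimize recall).
    q_emb = retriever.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, corpus_emb, top_k=recall_k)[0]
    candidates = [corpus[h["corpus_id"]] for h in hits]
    # Rerank: score each (query, candidate) pair jointly (optimize precision).
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(scores, candidates), reverse=True)
    # Generate: only the top final_k documents go to the LLM as context.
    return [doc for _, doc in ranked[:final_k]]

print(retrieve_then_rerank("Who founded Studio Ghibli?", final_k=1))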

Which Reranker Models Deserve Your Attention in 2025?


Since rerankers are this useful, which models on the market are worth a try? In 2025, a number of top reranking models have emerged, each with its own strengths:

| Reranker model | Type | Source | Strengths | Weaknesses | Best use cases |
| --- | --- | --- | --- | --- | --- |
| Cohere | Cross-encoder (API) | Commercial | High accuracy, multilingual, easy to use, fast (Nimble variant) | API usage fees, closed source | General RAG, enterprise search, multilingual apps, priority on ease of use |
| bge-reranker | Cross-encoder | Open source | High accuracy, open source, runs on mid-range hardware | Requires self-deployment | General RAG, open-source preference, tight budgets, happy to self-host |
| Voyage | Cross-encoder (API) | Commercial | Top-tier relevance/accuracy | API usage fees, possibly higher latency (top model) | Very high accuracy needs (finance, legal), relevance-critical applications |
| Jina | Cross-encoder / ColBERT variant | Mixed | Balanced performance, cost-effective, long-document support (Jina-ColBERT) | May not reach the very highest accuracy | General RAG, long-document processing, balancing cost and performance |
| FlashRank | Lightweight cross-encoder | Open source | Extremely fast, low resource use, easy to integrate | Lower accuracy than large models | Speed-critical applications, resource-constrained environments |
| ColBERT | Multi-vector (late interaction) | Open source | Efficient large-scale retrieval, efficient on big datasets | Compute/storage-intensive indexing | Very large document collections, efficiency at scale |
| MixedBread (mxbai-rerank-v2) | Cross-encoder | Open source | Claimed SOTA performance, fast inference, multilingual, long context, versatile | Requires self-deployment, relatively new | High-performance RAG, multilingual, long docs/code/JSON, open-source preference, LLM tool selection |

Next, let's look at a few representative models in detail:

1. Cohere Rerank

Cohere Rerank is a powerful reranking model from Cohere, built on advanced neural networks, most likely a Transformer-based cross-encoder. It works by processing the query and document together, which lets it judge their relevance precisely. It is a closed-source commercial model served via API.

  • Key features: its most eye-catching trait is support for more than 100 languages, which makes it right at home in international applications. As a hosted service, it is very easy to integrate. Cohere also offers a "Rerank 3 Nimble" variant that keeps accuracy high while significantly speeding up production workloads.
  • Performance: Cohere Rerank delivers consistently high accuracy regardless of which embedding model produced the initial retrieval. The Nimble variant cuts response times substantially. Pricing, of course, is per API call.
  • Pros: simple API integration, strong and reliable performance, excellent multilingual support, plus a speed-optimized Nimble variant.
  • Cons: closed-source commercial service, pay-per-use, no way to modify the model yourself.
  • Best for: general RAG applications, enterprise search platforms, customer-service chatbots, and scenarios that need broad language support without the hassle of managing model infrastructure.

Example code:

First, install the Cohere library:

%pip install --upgrade --quiet  cohere

Then set up Cohere and the ContextualCompressionRetriever:

from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain_community.llms import Cohere
from langchain.chains import RetrievalQA

llm = Cohere(temperature=0)
compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
   base_compressor=compressor, base_retriever=retriever
)
chain = RetrievalQA.from_chain_type(
   llm=llm, retriever=compression_retriever
)
# assumes `retriever` is already defined
# chain.invoke({'query': 'What did the president say about Ketanji Brown Jackson'})

Sample output:

{'query': 'What did the president say about Ketanji Brown Jackson',
'result': " The president speaks highly of Ketanji Brown Jackson, stating that she
 is one of the nation's top legal minds, and will continue the legacy of excellence
 of Justice Breyer. The president also mentions that he worked with her family and
 that she comes from a family of public school educators and police officers. Since
 her nomination, she has received support from various groups, including the
 Fraternal Order of Police and judges from both major political parties. \n\nWould
 you like me to extract another sentence from the provided text? "}
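If you'd rather skip the LangChain wrapper, the same endpoint is available through Cohere's own Python SDK. A minimal sketch, assuming the cohere package and an API key in the COHERE_API_KEY environment variable (method and field names follow Cohere's public docs):

import os
import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

query = "What did the president say about Ketanji Brown Jackson"
docs = [
    "The president nominated Judge Ketanji Brown Jackson to the Supreme Court.",
    "The Justice Department required body cameras and banned chokeholds.",
]

# Ask the hosted reranker to score each document against the query.
response = co.rerank(model="rerank-english-v3.0", query=query,
                     documents=docs, top_n=2)
for r in response.results:
    # Each result carries the original document index and a relevance score.
    print(r.index, f"{r.relevance_score:.3f}", docs[r.index])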

2. bge-reranker (Base/Large)

The bge-reranker family comes from the Beijing Academy of Artificial Intelligence (BAAI) and is open source (Apache 2.0 licensed). The models are Transformer-based, most likely cross-encoders, purpose-built for reranking. The family comes in several sizes, such as Base and Large.

  • Key features: being open source, it gives users the freedom to deploy and modify. For example, the bge-reranker-v2-m3 model has fewer than 600 million parameters and runs efficiently on ordinary hardware, including consumer GPUs.
  • Performance: these models perform remarkably well; the Large variant in particular often comes close to top commercial models. They post strong Mean Reciprocal Rank (MRR) scores. The main cost is the compute required for self-hosting.
  • Pros: no licensing fees (open source), high accuracy, flexible self-hosting, good performance even on modest hardware.
  • Cons: users must manage deployment, infrastructure, and updates themselves; performance depends on the hosting hardware.
  • Best for: general RAG tasks, research projects, teams that favor open-source tools, budget-sensitive applications, and developers comfortable with a self-hosted stack.

Example code:

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

# assumes `retriever` is already defined
model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
compressor = CrossEncoderReranker(model=model, top_n=3)
compression_retriever = ContextualCompressionRetriever(
   base_compressor=compressor, base_retriever=retriever
)

# compressed_docs = compression_retriever.invoke("What is the plan for the economy?")
# pretty_print_docs(compressed_docs)

Sample output:

Document 1:
More infrastructure and innovation in America.
More goods moving faster and cheaper in America.
More jobs where you can earn a good living in America.
And instead of relying on foreign supply chains, let’s make it in America.
Economists call it “increasing the productive capacity of our economy.”
I call it building a better America.
My plan to fight inflation will lower your costs and lower the deficit.

----------------------------------------------------------------------------------------------------

Document 2:

Second – cut energy costs for families an average of $500 a year by combatting
climate change.

Let’s provide investments and tax credits to weatherize your homes and businesses to
be energy efficient and you get a tax credit; double America’s clean energy
production in solar, wind, and so much more;  lower the price of electric vehicles,
saving you another $80 a month because you’ll never have to pay at the gas pump
again.

----------------------------------------------------------------------------------------------------

Document 3:

Look at cars.
Last year, there weren’t enough semiconductors to make all the cars that people
wanted to buy.
And guess what, prices of automobiles went up.
So—we have a choice.
One way to fight inflation is to drive down wages and make Americans poorer.
I have a better plan to fight inflation.
Lower your costs, not your wages.
Make more cars and semiconductors in America.
More infrastructure and innovation in America.
More goods moving faster and cheaper in America.

3. Voyage Rerank

Voyage AI offers proprietary neural models (voyage-rerank-2, voyage-rerank-2-lite) accessed via API. They are most likely advanced cross-encoders carefully tuned to maximize relevance scoring.

  • Key features: their key differentiator is top-tier relevance scores on benchmarks. Voyage ships a simple Python client library for easy integration. The lite variant balances quality against speed and cost.
  • Performance: voyage-rerank-2 typically leads benchmarks in raw relevance accuracy, while the lite model holds its own against other strong contenders. The high-accuracy rerank-2 model may carry slightly higher latency than some rivals. Cost is tied to API usage.
  • Pros: state-of-the-art relevance, arguably the most accurate option available today; easy to use via the Python client.
  • Cons: a proprietary API service with the costs that entails; the most accurate model can be a bit slower than others.
  • Best for: applications where maximizing relevance is non-negotiable, such as financial analysis, legal document review, and other critical Q&A scenarios where accuracy matters more than small speed differences.

Example code:

First, install the Voyage libraries:

%pip install --upgrade --quiet  voyageai
%pip install --upgrade --quiet  langchain-voyageai

Then set up the components:

import os
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain.retrievers import ContextualCompressionRetriever
from langchain_openai import OpenAI
from langchain_voyageai import VoyageAIRerank
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_voyageai import VoyageAIEmbeddings

# assumes the State of the Union text file is at the correct path
# documents = TextLoader("../../how_to/state_of_the_union.txt").load()
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
# texts = text_splitter.split_documents(documents)
# retriever = FAISS.from_documents(
#    texts, VoyageAIEmbeddings(model="voyage-law-2")
# ).as_retriever(search_kwargs={"k": 20})

# llm = OpenAI(temperature=0)
# compressor = VoyageAIRerank(
# model="rerank-lite-1", voyageai_api_key=os.environ["VOYAGE_API_KEY"], top_k=3
# )
# compression_retriever = ContextualCompressionRetriever(
# base_compressor=compressor, base_retriever=retriever
# )
# compressed_docs = compression_retriever.invoke("What did the president say about Ketanji Jackson Brown")
# pretty_print_docs(compressed_docs)

Sample output:

Document 1:

One of the most serious constitutional responsibilities a President has is
nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji
Brown Jackson. One of our nation’s top legal minds, who will continue Justice
Breyer’s legacy of excellence.

----------------------------------------------------------------------------------------------------

Document 2:

So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law
enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and
restricted no-knock warrants for its officers.

----------------------------------------------------------------------------------------------------

Document 3:

I spoke with their families and told them that we are forever in debt for their
sacrifice, and we will carry on their mission to restore the trust and safety every
community deserves.

I’ve worked on these issues a long time.

I know what works: Investing in crime prevention and community police officers
who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and
safety.

So let’s not abandon our streets. Or choose between safety and equal justice.

4. Jina Reranker

Jina offers reranking solutions including Jina Reranker v2 and Jina-ColBERT. Jina Reranker v2 is most likely a cross-encoder model, while Jina-ColBERT implements the ColBERT architecture (explained below) on Jina's base models.

  • Key features: Jina offers cost-effective options with solid performance. A standout is Jina-ColBERT's ability to handle very long documents, with context lengths up to 8,000 tokens, which greatly reduces the need for aggressive chunking of long texts. Jina's ecosystem also includes open-source components.
  • Performance: Jina Reranker v2 strikes a good balance between speed, cost, and relevance. Jina-ColBERT excels on long source documents. Pricing is generally competitive.
  • Pros: balanced performance, cost-effective, excellent long-document handling via Jina-ColBERT, flexibility from the open-source pieces.
  • Cons: the standard Jina reranker may not hit the absolute peak accuracy of specialist models like Voyage.
  • Best for: general RAG systems, applications processing long documents (technical manuals, research papers, books), and projects needing a good balance of cost and performance.

Example code:

from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import JinaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# assumes the State of the Union text file is at the correct path
# documents = TextLoader(
#    "../../how_to/state_of_the_union.txt",
# ).load()
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
# texts = text_splitter.split_documents(documents)

# embedding = JinaEmbeddings(model_name="jina-embeddings-v2-base-en")
# retriever = FAISS.from_documents(texts, embedding).as_retriever(search_kwargs={"k": 20})

# query = "What did the president say about Ketanji Brown Jackson"
# docs = retriever.get_relevant_documents(query)

# Doing Reranking with Jina
from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.document_compressors import JinaRerank

# compressor = JinaRerank()
# compression_retriever = ContextualCompressionRetriever(
#    base_compressor=compressor, base_retriever=retriever
# )
# compressed_docs = compression_retriever.get_relevant_documents(
#    "What did the president say about Ketanji Jackson Brown"
# )
# pretty_print_docs(compressed_docs)

Sample output:

Document 1:

So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law
enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and
restricted no-knock warrants for its officers.

----------------------------------------------------------------------------------------------------

Document 2:

I spoke with their families and told them that we are forever in debt for their
sacrifice, and we will carry on their mission to restore the trust and safety every
community deserves.
I’ve worked on these issues a long time.
I know what works: Investing in crime prevention and community police officers
who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and
safety.
So let’s not abandon our streets. Or choose between safety and equal justice.
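
Jina's reranker is also reachable over plain HTTPS with no SDK at all. A minimal sketch; the endpoint, model name, and response shape follow Jina's public docs and should be treated as assumptions to verify:

import os
import requests

payload = {
    "model": "jina-reranker-v2-base-multilingual",
    "query": "What did the president say about Ketanji Brown Jackson",
    "documents": [
        "The president nominated Judge Ketanji Brown Jackson.",
        "Let's come together to protect our communities.",
    ],
    "top_n": 2,
}
resp = requests.post(
    "https://api.jina.ai/v1/rerank",
    headers={"Authorization": f"Bearer {os.environ['JINA_API_KEY']}"},
    json=payload,
)
resp.raise_for_status()
for r in resp.json()["results"]:
    # Each result holds the original index and a relevance score.
    print(r["index"], round(r["relevance_score"], 3))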

5. ColBERT

ColBERT (Contextualized Late Interaction over BERT) is a multi-vector model. Instead of representing an entire document with a single vector, it creates multiple context-aware vectors, one per token (or phrase). It uses a "late interaction" mechanism: query vectors are compared against the many document vectors only after encoding, which means document vectors can be precomputed and indexed.
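The late-interaction scoring itself, often called MaxSim, is surprisingly simple. A toy sketch, with random vectors standing in for the contextualized token embeddings a real ColBERT model would produce:

import numpy as np

rng = np.random.default_rng(0)
query_vecs = rng.normal(size=(4, 128))   # 4 query token vectors
doc_vecs = rng.normal(size=(50, 128))    # 50 document token vectors (precomputed and indexed)

# Normalize so dot products become cosine similarities.
query_vecs /= np.linalg.norm(query_vecs, axis=1, keepdims=True)
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

sim = query_vecs @ doc_vecs.T       # (4, 50) token-to-token similarity matrix
score = sim.max(axis=1).sum()       # MaxSim: best document token per query token, summed
print(score)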

  • Key features: the architecture enables very efficient retrieval from large collections once documents have been indexed. The multi-vector approach allows fine-grained comparison between query terms and document content. It is an open-source approach.
  • Performance: ColBERT strikes a strong balance between retrieval efficiency and effectiveness, especially at scale. After the initial indexing step, retrieval latency is low. The main cost is the compute for indexing and self-hosting.
  • Pros: efficient on large document collections, scalable retrieval, open-source flexibility.
  • Cons: the initial indexing process can be compute-intensive and storage-hungry.
  • Best for: large-scale RAG applications, systems that need fast retrieval over millions or billions of documents, and scenarios where upfront precomputation time is acceptable.

Example code:

Install the RAGatouille library to use the ColBERT reranker:

pip install -U ragatouille

Now set up the ColBERT reranker:

from ragatouille import RAGPretrainedModel
from langchain.retrievers import ContextualCompressionRetriever

# assumes `retriever` is already defined
# RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
# compression_retriever = ContextualCompressionRetriever(
#     base_compressor=RAG.as_langchain_document_compressor(), base_retriever=retriever
# )
# compressed_docs = compression_retriever.invoke(
#     "What animation studio did Miyazaki found"
# )
# print(compressed_docs[0])

Sample output:

Document(page_content='In June 1985, Miyazaki, Takahata, Tokuma and Suzuki founded
the animation production company Studio Ghibli, with funding from Tokuma Shoten.
Studio Ghibli\'s first film, Laputa: Castle in the Sky (1986), employed the same
production crew of Nausicaä. Miyazaki\'s designs for the film\'s setting were
inspired by Greek architecture and "European urbanistic templates". Some of the
architecture in the film was also inspired by a Welsh mining town; Miyazaki
witnessed the mining strike upon his first', metadata={'relevance_score':
26.5194149017334})

6. FlashRank

FlashRank is designed as a very lightweight, very fast reranking library, typically powered by smaller, optimized Transformer models (often distilled or pruned versions of larger ones). Its goal is a meaningful relevance boost over plain similarity search at minimal computational cost. It behaves like a cross-encoder but applies techniques that speed up processing, and it is usually distributed as an open-source Python library.

  • Key features: speed and efficiency are the headline traits. It is designed to be easy to integrate and light on resources (CPU or a modest GPU). It usually takes just a few lines of code.
  • Performance: while it cannot match the peak accuracy of the largest cross-encoders such as Cohere or Voyage, FlashRank aims for a clear improvement over no reranking or basic bi-encoder reranking. Its speed suits real-time or high-throughput scenarios. Cost is minimal (just the compute for self-hosting).
  • Pros: extremely fast inference, low compute requirements, easy integration, open source.
  • Cons: accuracy may trail larger, more sophisticated reranking models; model selection may be more limited than in broader frameworks.
  • Best for: applications needing fast reranking on constrained hardware (CPUs or edge devices), latency-critical high-volume search systems, and projects that want a simple, better-than-nothing reranking step with minimal complexity.

Example code:

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain_openai import ChatOpenAI

# assumes `retriever` is already defined
# llm = ChatOpenAI(temperature=0)
# compressor = FlashrankRerank()
# compression_retriever = ContextualCompressionRetriever(
#    base_compressor=compressor, base_retriever=retriever
# )
# compressed_docs = compression_retriever.invoke(
#    "What did the president say about Ketanji Jackson Brown"
# )
# print([doc.metadata["id"] for doc in compressed_docs])
# pretty_print_docs(compressed_docs)

This snippet uses FlashrankRerank inside a ContextualCompressionRetriever to improve the relevance of retrieved documents. It reorders the documents fetched by the base retriever (represented by retriever) according to their relevance to the query "What did the president say about Ketanji Jackson Brown", then prints the document IDs and the compressed, reranked documents.

Sample output:

[0, 5, 3]

Document 1:

One of the most serious constitutional responsibilities a President has is
nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji
Brown Jackson. One of our nation’s top legal minds, who will continue Justice
Breyer’s legacy of excellence.
----------------------------------------------------------------------------------------------------

Document 2:

He met the Ukrainian people.
From President Zelenskyy to every Ukrainian, their fearlessness, their courage,
their determination, inspires the world.
Groups of citizens blocking tanks with their bodies. Everyone from students to
retirees teachers turned soldiers defending their homeland.
In this struggle as President Zelenskyy said in his speech to the European
Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United
States is here tonight.
----------------------------------------------------------------------------------------------------

Document 3:

And tonight, I’m announcing that the Justice Department will name a chief prosecutor
for pandemic fraud.
By the end of this year, the deficit will be down to less than half what it was
before I took office.
The only president ever to cut the deficit by more than one trillion dollars in a
single year.
Lowering your costs also means demanding more competition.
I’m a capitalist, but capitalism without competition isn’t capitalism
It’s exploitation—and it drives up prices.

The output shows the retrieved chunks reordered by relevance.
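FlashRank can also be used directly, without LangChain. A minimal sketch, assuming the flashrank package (pip install flashrank); the Ranker/RerankRequest API and the passage dict fields follow the library's README and are worth double-checking:

from flashrank import Ranker, RerankRequest

# The default Ranker() loads a tiny CPU-friendly model.
ranker = Ranker()

passages = [
    {"id": 1, "text": "The president nominated Judge Ketanji Brown Jackson."},
    {"id": 2, "text": "He met the Ukrainian people."},
]
request = RerankRequest(
    query="What did the president say about Ketanji Brown Jackson",
    passages=passages,
)
# Results come back sorted by descending relevance score.
for p in ranker.rerank(request):
    print(p["id"], round(p["score"], 3))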

7. MixedBread

Offered by Mixedbread AI, this family includes mxbai-rerank-base-v2 (0.5B parameters) and mxbai-rerank-large-v2 (1.5B parameters). They are open-source (Apache 2.0 licensed) cross-encoders built on the Qwen-2.5 architecture. A key differentiator is their training process, which adds a three-stage reinforcement learning recipe (GRPO, contrastive learning, preference learning) on top of the initial training.

  • Key features: claims state-of-the-art performance on benchmarks such as BEIR. Supports more than 100 languages. Handles long contexts up to 8k tokens (and is compatible up to 32k). Designed to work well across data types, including text, code, and JSON, as well as LLM tool selection. Available via Hugging Face and a Python library.
  • Performance: Mixedbread's published benchmarks show these models beating other top open- and closed-source competitors (such as Cohere and Voyage) on BEIR, with the large model scoring 57.49 and the base model 55.57. They also show a clear speed advantage in latency tests, with the 1.5B-parameter model running far faster than other large open-source rerankers. The cost is the compute needed for self-hosting.
  • Pros: strong benchmark performance (claimed SOTA), open-source license, fast inference for its accuracy level, broad language support, very long context window, suitability for diverse data types (code, JSON).
  • Cons: requires self-hosting and infrastructure management; as relatively new models, long-term performance and community validation are still accumulating.
  • Best for: general RAG demanding top performance, multilingual applications, systems handling code, JSON, or long documents, LLM tool/function-call selection, and teams that prefer high-performance open-source models.

Example code:

!pip install mxbai_rerank

from mxbai_rerank import MxbaiRerankV2

# Load the model; here we use the base-sized model
model = MxbaiRerankV2("mixedbread-ai/mxbai-rerank-base-v2")

# Example query and documents
query = "Who wrote To Kill a Mockingbird?"

documents = [
    "To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan.",
]

# Calculate the scores
results = model.rank(query, documents)
print(results)

Sample output:

[RankResult(index=0, score=9.847987174987793, document='To Kill a Mockingbird is a
novel by Harper Lee published in 1960. It was immediately successful, winning the
Pulitzer Prize, and has become a classic of modern American literature.'),

RankResult(index=2, score=8.258672714233398, document='Harper Lee, an American
novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in
Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.'),

RankResult(index=3, score=3.579845428466797, document='Jane Austen was an English
novelist known primarily for her six major novels, which interpret, critique and
comment upon the British landed gentry at the end of the 18th century.'),

RankResult(index=4, score=2.716982841491699, document='The Harry Potter series,
which consists of seven fantasy novels written by British author J.K. Rowling, is
among the most popular and critically acclaimed books of the modern era.'),

RankResult(index=1, score=2.233165740966797, document='The novel Moby-Dick was
written by Herman Melville and first published in 1851. It is considered a
masterpiece of American literature and deals with complex themes of obsession,
revenge, and the conflict between good and evil.'),

RankResult(index=5, score=1.8150043487548828, document='The Great Gatsby, a novel
written by American author F. Scott Fitzgerald, was published in 1925. The story is
set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit
of Daisy Buchanan.')]


How Do You Know If Your Reranker Is Pulling Its Weight?

Once you've added a reranker, you'll want to know whether it's actually earning its keep, right? Evaluating a reranker matters, and these "hardcore" metrics are the usual tools (a toy computation follows the list):

  1. Accuracy@k: how often a relevant document shows up in the top k results.
  2. Precision@k: the proportion of the top k results that are relevant.
  3. Recall@k: the fraction of all relevant documents that appear in the top k results.
  4. Normalized Discounted Cumulative Gain (NDCG): measures ranking quality, accounting for both relevance and position; relevant documents ranked higher contribute more to the score. It is normalized to between 0 and 1 for easy comparison.
  5. Mean Reciprocal Rank (MRR): looks at the rank of the first relevant document, averaged across queries. Very useful when you need one good result fast.
  6. F1-score: the harmonic mean of precision and recall, giving a balanced view.
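Here is the toy computation promised above, assuming binary relevance labels and hypothetical document IDs:

import math

ranking = ["d3", "d1", "d7", "d2", "d9"]   # reranker's output order
relevant = {"d1", "d2", "d5"}              # gold relevant documents
k = 3

top_k = ranking[:k]
hits = [d for d in top_k if d in relevant]

precision_at_k = len(hits) / k              # share of top-k that is relevant
recall_at_k = len(hits) / len(relevant)     # share of relevant docs found in top-k
f1 = (2 * precision_at_k * recall_at_k / (precision_at_k + recall_at_k)
      if hits else 0.0)

# MRR: reciprocal rank of the first relevant document (0 if none appears).
mrr = next((1 / (i + 1) for i, d in enumerate(ranking) if d in relevant), 0.0)

# NDCG@k with binary gains: discount each hit by log2(rank + 1) and
# normalize by the ideal ordering (all relevant documents first).
dcg = sum(1 / math.log2(i + 2) for i, d in enumerate(top_k) if d in relevant)
idcg = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
ndcg = dcg / idcg

print(precision_at_k, recall_at_k, f1, mrr, round(ndcg, 3))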

How to Choose the Right Reranker for Your RAG

Picking the right reranker is a bit like dating: you have to weigh several factors together:

  • Relevance requirements: how accurate do results need to be? Is "good enough" fine, or does every detail have to be spot-on?
  • Latency: how fast must the reranker respond? Hard real-time (say, live customer support), or is some delay acceptable (say, offline data analysis)?
  • Cost: free and open source, or are you willing to pay for a commercial API?
  • Deployment complexity: can you deploy and maintain a model yourself, or would you rather have a fully managed service?
  • Data type and length: are your documents plain text, or complex data like code and JSON? How long is the average document?

In short, rerankers play an increasingly important role in RAG systems. They are your AI assistant's secret weapon, making your model's answers more accurate, reliable, and on-point. Still plagued by first-pass retrieval "noise"? Time to put a reranker to work!

This article is reposted from Halo咯咯. Author: 基咯咯

© Copyright belongs to the author. Please credit the source when reposting; unauthorized reuse may incur legal liability.