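Overview: This article shows how to implement RAG by integrating LlamaIndex with Metaphor. It combines the capabilities of LlamaIndex data agents with Metaphor, an LLM-native search tool, to enable knowledge workers that can answer any question over any data, no matter how recent or complex. LoadAndSearchToolSpec wraps any tool that may return a large amount of data and splits it into two tools: a load tool that dynamically stores the data in an index, and a search tool that queries that index. Metaphor is trained to predict links on the internet from the way people talk about things online.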

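State-of-the-art large language models (LLMs) such as ChatGPT, GPT-4, and Claude 2 have remarkable reasoning capabilities that unlock a wide range of use cases, from insight extraction to question answering to general workflow automation. However, their ability to retrieve contextually relevant information is limited. A retrieval-augmented generation (RAG) system can combine an LLM with external storage solutions over static knowledge sources.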

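RAG typically requires two core components:

  1. A general abstraction that lets the LLM intelligently perform tasks over data in both a "read" and a "write" fashion

  2. A good search engine suited to LLM use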

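The LlamaIndex data agent abstraction addresses the first component. A complete data agent consists of a reasoning loop and a set of tools. The tools can be search/retrieval interfaces or, more generally, any external API. Given a query, the agent runs its reasoning loop and dynamically works out which tools it needs to complete the task at hand.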
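To make the "reasoning loop plus tools" idea concrete, here is a minimal sketch in plain Python. It is not LlamaIndex's implementation: the tool functions, `choose_tool`, and `run_agent` are hypothetical names, and the rule-based `choose_tool` stands in for the decision a real LLM would make at each step.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools: plain functions standing in for search/retrieval APIs.
def search(query: str) -> str:
    return f"search results for '{query}'"

def current_date(_: str = "") -> str:
    return "2023-08-20"  # fixed value so the example is deterministic

TOOLS: Dict[str, Callable[[str], str]] = {
    "search": search,
    "current_date": current_date,
}

def choose_tool(query: str, history: List[Tuple[str, str]]) -> Optional[str]:
    """Stand-in for the LLM: pick the next tool to call, or None to stop."""
    used = {name for name, _ in history}
    if "last month" in query and "current_date" not in used:
        return "current_date"
    if "search" not in used:
        return "search"
    return None

def run_agent(query: str, max_steps: int = 5) -> List[Tuple[str, str]]:
    """Reasoning loop: repeatedly choose a tool, call it, and record the result."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        tool_name = choose_tool(query, history)
        if tool_name is None:
            break
        history.append((tool_name, TOOLS[tool_name](query)))
    return history
```

Calling `run_agent("news from the last month on superconductors")` makes the loop call `current_date` before `search`, whereas a query without a time reference goes straight to `search`. The set of tools used is decided per query rather than fixed in advance.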

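Data agents have access to a rich set of tools on LlamaHub, ranging from the Gmail API to SQL database APIs to basic tools such as Bing search. We have shown that they can carry out end-to-end tasks, from sending emails and scheduling meetings to automating custom support insight extraction. However, there has never been a tool designed specifically for LLM use.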

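This article introduces the LlamaIndex and Metaphor integration for RAG: it combines the capabilities of LlamaIndex data agents with Metaphor, an LLM-native search tool, to enable knowledge workers that can answer any question over any data, no matter how recent or complex. An example notebook is available at https://github.com/emptycrown/llamahub/blob/main/llama_hub/tools/notebooks/metaphor.ipynb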

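1. Introduction to Metaphor

The Metaphor API is designed to connect your LLM to the internet. It lets you run fully neural, highly semantic searches over the internet, and it can also return clean HTML content from the results.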

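Metaphor is trained to predict links on the internet based on the way people talk about things online. For example, someone might post about a great article they read like this: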

Found an amazing article I read about the history of Rome’s architecture: [LINK]

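Training a model to predict links from the way people talk about them produces a completely different way of searching the internet: you search as if you were about to share the link you want. This is a little unintuitive at first, but searching this way can return extremely high-quality results. With LlamaIndex you do not need to worry about this, because by default queries are converted into Metaphor prompts.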
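The "search as if you were sharing the link" idea can be illustrated with a small sketch. The function below is a hypothetical helper, not Metaphor's actual query conversion, which happens inside the service. It only shows the shape of the rewritten query: a sentence that ends where the link would go.

```python
def to_link_sharing_prompt(query: str) -> str:
    """Rephrase a plain query as the sentence someone might write just before a link.

    Illustrative only: the real conversion is performed by Metaphor itself.
    """
    topic = query.strip().rstrip("?.!")
    return f"Here is a great link about {topic}:"
```

For instance, `to_link_sharing_prompt("the history of Rome's architecture?")` yields `Here is a great link about the history of Rome's architecture:`. That is the same pattern as the example post above, with the link position left open for the model to predict.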

Why would you use Metaphor search rather than Bing or Google? There are three main reasons.

PS: To learn more, you can read the full Metaphor API blog post at https://platform.metaphor.systems/blog/buildingsearchfor-the-postchatgptworld

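2. How the LlamaIndex and Metaphor Integration Works

LlamaHub provides a Metaphor API tool spec with five tools that an agent can use. Four of them appear in the examples below: search, retrieve_documents, search_and_retrieve_documents, and current_date.

In the next section, let's see how a data agent uses these endpoints across different use cases.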

3. LlamaIndex and Metaphor Integration Examples

Let's look at how a LlamaIndex data agent works together with Metaphor.

3.1 Testing the Metaphor Tools

First, import MetaphorToolSpec:

# Set up Metaphor tool
from llama_hub.tools.metaphor.base import MetaphorToolSpec

metaphor_tool = MetaphorToolSpec(
    api_key='your-key',
)

# convert tool spec to a list of tools
metaphor_tool_list = metaphor_tool.to_tool_list()
for tool in metaphor_tool_list:
    print(tool.metadata.name)

Input:

metaphor_tool.search('machine learning transformers', num_results=3)

Output:

[{'title': 'On the potential of Transformers in Reinforcement Learning',
  'url': 'https://lorenzopieri.com/rl_transformers/',
  'id': 'ysJlYSgeGW3l4zyOBoSGcg'},
 {'title': 'Transformers: Attention in Disguise',
  'url': 'https://www.mihaileric.com/posts/transformers-attention-in-disguise/',
  'id': 'iEYMai5rS9k0hN5_BH0VZg'},
 {'title': 'Transformers in Computer Vision: Farewell Convolutions!',
  'url': 'https://towardsdatascience.com/transformers-in-computer-vision-farewell-convolutions-f083da6ef8ab?gi=a1d0a9a2896c',
  'id': 'kX1Z89DdjSvBrH1S1XLvwg'}]
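Each search result carries an `id` alongside its `title` and `url`. In the follow-up call shown later in this article, `retrieve_documents` is invoked with a list of such ids. The sketch below uses the sample output above as literal data (no API call is made) and a hypothetical helper, `result_ids`, to show how that list can be pulled out of the results.

```python
from typing import Dict, List

# Sample results copied from the output above; no network access is needed.
results: List[Dict[str, str]] = [
    {"title": "On the potential of Transformers in Reinforcement Learning",
     "url": "https://lorenzopieri.com/rl_transformers/",
     "id": "ysJlYSgeGW3l4zyOBoSGcg"},
    {"title": "Transformers: Attention in Disguise",
     "url": "https://www.mihaileric.com/posts/transformers-attention-in-disguise/",
     "id": "iEYMai5rS9k0hN5_BH0VZg"},
    {"title": "Transformers in Computer Vision: Farewell Convolutions!",
     "url": "https://towardsdatascience.com/transformers-in-computer-vision-farewell-convolutions-f083da6ef8ab?gi=a1d0a9a2896c",
     "id": "kX1Z89DdjSvBrH1S1XLvwg"},
]

def result_ids(search_results: List[Dict[str, str]]) -> List[str]:
    """Collect the document ids from a list of search results, preserving order."""
    return [item["id"] for item in search_results]

ids = result_ids(results)
```

When an agent is in charge, it performs this step itself by reading the ids out of the conversation history. That is why long result lists make the separate search and retrieve calls more expensive.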

3.2 Setting Up an OpenAI Agent with Metaphor

We can create an agent that has access to all of the tools above and start testing it:

from llama_index.agent import OpenAIAgent

# Give the agent the full list of Metaphor tools created above
agent = OpenAIAgent.from_tools(
    metaphor_tool_list,
    verbose=True,
)

Here is an example of a direct query:

print(agent.chat('What are the best restaurants in toronto?'))

Here is how the Metaphor tool was executed for this example:

=== Calling Function ===
Calling function: search with args: {
  "query": "best restaurants in Toronto"
}
[Metaphor Tool] Autoprompt string: Here's a link to the best restaurant in Toronto:
Got output: [{'title': 'Via Allegro Ristorante - Toronto Fine Dining Restaurant', 'url': 'https://viaallegroristorante.com/', 'id': 'EVlexzJh-lzkVr4tb2y_qw'},
 {'title': 'The Senator – Home', 'url': 'https://thesenator.com/', 'id': 'dA3HVr5P8E0Bs7nH2gH7ZQ'},
 {'title': 'Home - The Rushton', 'url': 'https://therushton.com/', 'id': '6Je-igG-i-ApqISC5XXmGQ'},
 {'title': 'Location', 'url': 'https://osteriagiulia.ca/', 'id': 'HjP5c54vqb3n3UNa3HevSA'},
 {'title': 'StockYards | Stockyards Toronto', 'url': 'https://www.thestockyards.ca/', 'id': 'Pffz-DQlOepqVgKQDmW5Ig'},
 {'title': 'Select A Restaurant', 'url': 'https://www.torontopho.com/', 'id': 'DiQ1hU1gmrIzpKnOaVvZmw'},
 {'title': 'Home | Kit Kat Italian Bar & Grill', 'url': 'http://www.kitkattoronto.com/', 'id': 'kdAcLioBgnwzuHyd0rWS1w'},
 {'title': 'La Fenice', 'url': 'https://www.lafenice.ca/', 'id': 'M-LHQZP6V40V81fqLFAQxQ'},
 {'title': 'Le Phénix', 'url': 'https://www.lephenixto.com/', 'id': 'spCTcFr0GHlFUTzyngfRVw'},
 {'title': 'ITALIAN, INSPIRED.', 'url': 'https://figotoronto.com/', 'id': 'OvBcTqEo1tCSywr4ATptCg'}]
========================
Here are some of the best restaurants in Toronto:

1. [Via Allegro Ristorante](https://viaallegroristorante.com/)
2. [The Senator](https://thesenator.com/)
3. [The Rushton](https://therushton.com/)
4. [Osteria Giulia](https://osteriagiulia.ca/)
5. [Stockyards](https://www.thestockyards.ca/)
6. [Toronto Pho](https://www.torontopho.com/)
7. [Kit Kat Italian Bar & Grill](http://www.kitkattoronto.com/)
8. [La Fenice](https://www.lafenice.ca/)
9. [Le Phénix](https://www.lephenixto.com/)
10. [Figo](https://figotoronto.com/)

You can visit their websites for more information. Enjoy your dining experience in Toronto!

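As you can see, the agent performed the search operation and returned a list of the best restaurants in Toronto.

We can continue with a follow-up question to make this a multi-turn conversation: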

print(agent.chat('tell me more about Osteria Giulia'))

=== Calling Function ===
Calling function: retrieve_documents with args: {
  "ids": ["HjP5c54vqb3n3UNa3HevSA"]
}
Got output: […]
========================
Osteria Giulia is a restaurant located at 134 Avenue Road in Toronto, Ontario. You can contact them at 416.964.8686 or via email at info@osteriagiulia.ca (for general inquiries only, no reservation requests via email).

The restaurant's operating hours are from Monday to Saturday, from 5:00pm to 11:00pm. On Sundays, the restaurant is available for private bookings.

Parking is available on Avenue Road and Davenport Road.

You can follow Osteria Giulia on Instagram [@osteriagiulia](https://www.instagram.com/osteriagiulia). They also have a sister restaurant called Giulietta, which you can visit at [giu.ca](https://giu.ca) or on Instagram [@giulietta972](https://www.instagram.com/giulietta972).

Please note that the information provided is based on the available document and may be subject to change. It is recommended to visit their official website or contact them directly for the most up-to-date information.

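3.3 Avoiding Context Window Issues (Advanced)

One problem with using retrieve is that the content can be very long. If that content is appended directly to the conversation history and dumped into the LLM's context window, we may run into context window limits.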

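LlamaIndex provides a tool abstraction to help with this. LoadAndSearchToolSpec wraps any tool that may return a large amount of data and splits it into two tools: a load tool that dynamically stores the data in an index, and a search tool that lets you query that index.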
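The load/search split can be sketched without any LlamaIndex dependency. The class below is a toy illustration under stated assumptions, not the library's implementation: `LoadAndSearchSketch`, `fake_fetch`, and the keyword-overlap scoring are all hypothetical, and a real setup would use a proper index and a real data-returning tool.

```python
from typing import Callable, List

class LoadAndSearchSketch:
    """Toy version of the load/search split around a tool that returns large data."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self._fetch = fetch          # the wrapped tool that may return a lot of text
        self._index: List[str] = []  # stand-in for a real index

    def load(self, query: str) -> str:
        """Run the wrapped tool, store its output, and return only a short status."""
        docs = self._fetch(query)
        self._index.extend(docs)
        return f"Content loaded! {len(docs)} documents indexed."

    def read(self, query: str, top_k: int = 1) -> List[str]:
        """Return the stored documents sharing the most words with the query."""
        terms = set(query.lower().split())
        ranked = sorted(
            self._index,
            key=lambda doc: len(terms & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]

def fake_fetch(query: str) -> List[str]:
    """Hypothetical wrapped tool returning two very long documents."""
    return [
        "LK-99 is a claimed room temperature superconductor " + "x " * 1000,
        "Toronto has many Italian restaurants " + "y " * 1000,
    ]

tool = LoadAndSearchSketch(fake_fetch)
status = tool.load("anything")
hits = tool.read("superconductor")
```

The point of the design is that `load` returns only a short status string, so the long documents never enter the conversation history. The agent sees large content only through targeted `read` calls.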

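On the Metaphor side, we define a search_and_retrieve_documents endpoint that combines search and retrieve. This lets the agent make a single query that retrieves a large number of documents; when combined with LoadAndSearchToolSpec, those documents are stored directly in an index. If the agent instead called search and retrieve separately, the search results would be written to the conversation history and then passed back into the prompt in order to call retrieve over all of the document IDs, which takes longer and consumes more tokens.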

Create the LoadAndSearchToolSpec:

from llama_index.tools.tool_spec.load_and_search.base import LoadAndSearchToolSpec

# The search_and_retrieve_documents tool is the third in the tool list, as seen above
wrapped_retrieve = LoadAndSearchToolSpec.from_defaults(
    metaphor_tool_list[2],
)

Here is a complete example:

# Just pass the wrapped tools and the current_date utility
agent = OpenAIAgent.from_tools(
    [*wrapped_retrieve.to_tool_list(), metaphor_tool_list[4]],
    verbose=True,
)
print(agent.chat('Can you summarize everything published in the last month regarding news on superconductors'))

Here is the detailed trace of the agent calling multiple tools:

=== Calling Function ===
Calling function: current_date with args: {}
Got output: 2023-08-20
========================
=== Calling Function ===
Calling function: search_and_retrieve_documents with args: {
  "query": "superconductors",
  "start_published_date": "2023-07-20",
  "end_published_date": "2023-08-20"
}
[Metaphor Tool] Autoprompt: "Here is an interesting article about superconductors:
Got output: Content loaded! You can now search the information using read_search_and_retrieve_documents
========================
=== Calling Function ===
Calling function: read_search_and_retrieve_documents with args: {
  "query": "superconductors"
}
Got output: Superconductors are materials that can perfectly conduct electricity. They are used in a variety of applications, such as particle accelerators, nuclear fusion devices, MRI machines, and maglev trains. However, so far, no superconductor has been proven to work at ambient pressures and temperatures. On July 22, scientists in South Korea published research claiming to have solved this problem with a material called LK-99, which has an electrical resistivity that drops to near zero at 30 degrees Celsius (86 degrees Fahrenheit).
========================
In the last month, there have been developments in the field of superconductors. Scientists in South Korea have published research on a material called LK-99, which has the ability to conduct electricity with near-zero resistance at a temperature of 30 degrees Celsius (86 degrees Fahrenheit). This breakthrough could potentially lead to the development of superconductors that work at ambient pressures and temperatures, opening up new possibilities for various applications such as particle accelerators, nuclear fusion devices, MRI machines, and maglev trains.

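The agent used the current_date tool to determine the current date, then applied Metaphor's published-date filters when calling search_and_retrieve_documents. That call loaded the documents into the index, and read_search_and_retrieve_documents was then used to read them.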
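In the trace above, the LLM itself turned "the last month" into the two date arguments. The date arithmetic behind that step can be sketched with the standard library alone. The helper names `one_month_before` and `last_month_filter` are hypothetical, and clamping to the last day of a shorter month is an assumption about how such a range might reasonably be defined.

```python
import calendar
from datetime import date
from typing import Dict

def one_month_before(day: date) -> date:
    """Return the same day of the previous month, clamped to that month's length."""
    if day.month > 1:
        year, month = day.year, day.month - 1
    else:
        year, month = day.year - 1, 12
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))

def last_month_filter(today: date) -> Dict[str, str]:
    """Build published-date filter arguments covering the month before `today`."""
    return {
        "start_published_date": one_month_before(today).isoformat(),
        "end_published_date": today.isoformat(),
    }

filters = last_month_filter(date(2023, 8, 20))
```

With 2023-08-20 as the current date, this produces the same `start_published_date` and `end_published_date` values that appear in the agent's search_and_retrieve_documents call.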

References

[1] https://blog.llamaindex.ai/llamaindex-metaphor-towards-automating-knowledgework-with-llms-5520a32efa2f

Original article: https://blog.csdn.net/wshzd/article/details/134790851
