LangChain website | LangChain documentation | langchain GitHub | langchain API docs | llm-universe

LangChain provides several classes and functions to help build and work with prompts.

The following sections introduce them one by one.

1. Prompt templates

Reference: "Prompt templates"

  LangChain provides predefined templates for generating language-model prompts. These templates can include instructions, few-shot examples, and background and questions specific to a given task. There are essentially two kinds of prompt templates available: string prompt templates (PromptTemplate) and chat prompt templates (ChatPromptTemplate). The former produces a simple prompt in string form, while the latter produces a more structured prompt for use with chat APIs.

1.1 langchain_core.prompts

As described in the langchain_core.prompts documentation, the class hierarchy is:

BasePromptTemplate --> PipelinePromptTemplate
                       StringPromptTemplate --> PromptTemplate
                                                FewShotPromptTemplate
                                                FewShotPromptWithTemplates
                       BaseChatPromptTemplate --> AutoGPTPrompt
                                                  ChatPromptTemplate --> AgentScratchPadChatPromptTemplate



BaseMessagePromptTemplate --> MessagesPlaceholder
                              BaseStringMessagePromptTemplate --> ChatMessagePromptTemplate
                                                                  HumanMessagePromptTemplate
                                                                  AIMessagePromptTemplate
                                                                  SystemMessagePromptTemplate

  Let's start with BasePromptTemplate, the base class for all prompt templates, used to generate prompts in a specific format. It inherits from the generic class RunnableSerializable (with generic parameters Dict and PromptValue) and from ABC (Abstract Base Class), and it has the following parameters:

Parameter | Type | Required | Description
input_types | Dict[str, Any] | optional | Dictionary of the expected types of the prompt template's variables. If not provided, all variables are assumed to be strings.
input_variables | List[str] | required | List of the variable names the prompt template expects.
output_parser | Optional[BaseOutputParser] = None | optional | Parser used to parse the output of the LLM called on the formatted prompt.
partial_variables | Mapping[str, Union[str, Callable[[], str]]] | optional | Mapping of partially pre-filled variables: names mapped to literal values, or to callables that produce a value.
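For instance, partial_variables lets you pre-fill some variables so that callers only supply the rest; a minimal sketch with illustrative names, using the concrete PromptTemplate subclass covered below:

from langchain.prompts import PromptTemplate

# "language" is pre-filled via partial_variables, so format() only needs "task"
prompt = PromptTemplate(
    template="Answer in {language}: {task}",
    input_variables=["task"],
    partial_variables={"language": "English"},
)
prompt.format(task="explain LCEL")
# 'Answer in English: explain LCEL'

A callable (for example a function returning today's date) can be used in place of a literal value.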

The class also provides the following methods:

  1. Async methods:
    The class defines several async methods, such as abatch, ainvoke, and astream, for executing tasks asynchronously. These have default implementations, but subclasses can override them for more efficient batching and async execution.

  2. Configuration-related methods:

  3. Input/output-related methods:

  4. Streaming-related methods:

  5. Other methods:

  6. Properties:

  7. Class methods:

  (Details of items 2-7 are omitted here.)

  Overall, this is a general-purpose prompt template class that provides methods for configuration, input/output handling, async execution, and more. To use it, subclass it and implement the necessary methods to customize the prompt behavior you need.
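As a quick illustration of the async interface mentioned above, here is a minimal sketch using the concrete PromptTemplate subclass (covered in 1.2):

import asyncio
from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Tell me a {adjective} joke")

async def main():
    # ainvoke is the async counterpart of invoke
    val = await prompt.ainvoke({"adjective": "funny"})
    print(val.to_string())  # Tell me a funny joke
    # abatch formats several inputs concurrently
    vals = await prompt.abatch([{"adjective": "funny"}, {"adjective": "silly"}])
    print([v.to_string() for v in vals])

asyncio.run(main())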

  BaseMessagePromptTemplate inherits from Serializable and ABC (Abstract Base Class). It is the base class for message prompt templates and is used to create new models and validate input data.

  1. Class method construct: creates a new model by parsing and validating the input data in the keyword arguments. Raises ValidationError if the input data cannot be parsed into a valid model. Takes _fields_set (an optional set of fields) and **values (the remaining values), and returns a new model instance.

  2. Class method copy: copies the model, optionally including, excluding, or updating fields; returns a new model instance.

  3. Class method dict: produces a dictionary representation of the model, optionally including or excluding specific fields.

  4. Abstract method format_messages: formats messages from keyword arguments; should return a list of BaseMessage objects.

  5. Other class methods and properties: omitted.
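A minimal sketch of the key abstract method, using the concrete HumanMessagePromptTemplate subclass (introduced in 1.3):

from langchain.prompts import HumanMessagePromptTemplate

msg_template = HumanMessagePromptTemplate.from_template("{question}")
msg_template.format_messages(question="What is LCEL?")
# [HumanMessage(content='What is LCEL?')]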

1.2 PromptTemplate

1.2.1 Overview

  PromptTemplate lets you create templates for string prompts. By default, PromptTemplate uses Python's str.format syntax for templating.

from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    "Tell me a {adjective} joke about {content}."
)
prompt_template.format(adjective="funny", content="chickens")
'Tell me a funny joke about chickens.'

A template supports any number of variables, including none:

from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template("Tell me a joke")
prompt_template.format()
'Tell me a joke'

  PromptTemplate uses Python f-string syntax as its default template format, but it currently also supports jinja2, selected via the template_format parameter (see "Template formats").

from langchain.prompts import PromptTemplate

jinja2_template = "Tell me a {{ adjective }} joke about {{ content }}"
prompt = PromptTemplate.from_template(jinja2_template, template_format="jinja2")

prompt.format(adjective="funny", content="chickens")
# Output: Tell me a funny joke about chickens.
1.2.2 LCEL

  PromptTemplate and ChatPromptTemplate implement the Runnable interface, the basic building block of the LangChain Expression Language (LCEL). This means they support invoke, ainvoke, stream, astream, batch, abatch, astream_log, and related calls.

  PromptTemplate takes a dictionary (of prompt variables) and returns a StringPromptValue; ChatPromptTemplate takes a dictionary and returns a ChatPromptValue. These value objects can be converted into different formats, which is convenient for downstream use and processing.

  According to the StringPromptValue documentation, StringPromptValue inherits from the base class PromptValue and represents the value of a string prompt. It has the following methods:

prompt_val = prompt_template.invoke({"adjective": "funny", "content": "chickens"})

prompt_val                # Output: StringPromptValue(text='Tell me a funny joke about chickens.')
prompt_val.to_string()    # Output: 'Tell me a funny joke about chickens.'
prompt_val.to_messages()  # Output: [HumanMessage(content='Tell me a funny joke about chickens.')]
prompt_val.copy(update={"text": "Hello! How are you?"})  # Output: StringPromptValue(text='Hello! How are you?')
prompt_val.dict()         # Output: {'text': 'Tell me a funny joke about chickens.'}
prompt_val.json()         # Output: '{"text": "Tell me a funny joke about chickens."}'
prompt_val.to_json()
{'lc': 1,
 'type': 'constructor',
 'id': ['langchain', 'prompts', 'base', 'StringPromptValue'],
 'kwargs': {'text': 'Tell me a funny joke about chickens.'}}
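The other Runnable methods mentioned above behave analogously; a minimal sketch:

prompt_template.batch([
    {"adjective": "funny", "content": "chickens"},
    {"adjective": "sad", "content": "cats"},
])
# [StringPromptValue(text='Tell me a funny joke about chickens.'),
#  StringPromptValue(text='Tell me a sad joke about cats.')]

for chunk in prompt_template.stream({"adjective": "funny", "content": "chickens"}):
    print(chunk.to_string())  # a prompt template "streams" its value as a single chunk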
1.2.3 Validate template

Reference: "Validate template"

  By default, PromptTemplate validates the template string by checking whether the declared input variables match the variables defined in the template. If they do not match, a ValueError is raised. You can disable this validation by setting validate_template=False, after which no error is raised.

template = "I am learning langchain because {reason}."

prompt_template = PromptTemplate(template=template,
                                 input_variables=["reason", "foo"]) 	# ValueError due to extra variables
prompt_template = PromptTemplate(template=template,
                                 input_variables=["reason", "foo"],
                                 validate_template=False) 				# No error

1.3 ChatPromptTemplate

  The following demonstration uses Baidu ERNIE Bot (Wenxin Yiyan). To call the ERNIE Bot API you first need API credentials. Go to the Baidu Wenxin Qianfan service platform and register; after logging in, choose "Application Access" and then "Create Application". Enter the basic information, keep the default configuration, and create the application.


  Once the application is created, click its "Details" page to see the AppID, API Key, and Secret Key. Then use the Baidu AI Cloud online debugging platform ("Sample Code Center") to quickly test the API and obtain an AccessToken (see the API documentation if anything is unclear). Finally, in the project folder, create a .env file with vim .env (Linux) or type nul > .env (Windows cmd) and write the following into it:

QIANFAN_AK="xxx"
QIANFAN_SK="xxx"
access_token="xxx"

Next we load these variables into the environment, so that they are picked up automatically later:

# Using OpenAI, Zhipu ChatGLM, or Baidu ERNIE requires installing openai, zhipuai, or qianfan respectively
import os
import openai,zhipuai,qianfan
from langchain.llms import ChatGLM
from langchain.chat_models import ChatOpenAI,QianfanChatEndpoint

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key =os.environ['OPENAI_API_KEY']
zhipuai.api_key =os.environ['ZHIPUAI_API_KEY']
qianfan.qianfan_ak=os.environ['QIANFAN_AK']
qianfan.qianfan_sk=os.environ['QIANFAN_SK']
1.3.1 Creating from (role, content) tuples

  The prompt passed to chat models is a list of chat messages. Each chat message carries content plus an extra parameter called role. For example, in the OpenAI Chat Completions API, a chat message can be associated with the AI assistant, a human, or the system role. Creating a chat prompt template looks like this:

from langchain.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful AI bot. Your name is {name}."),
        ("human", "Hello, how are you doing?"),
        ("ai", "I'm doing well, thanks!"),
        ("human", "{user_input}"),
    ]
)

messages = chat_template.format_messages(name="Bob", user_input="What is your name?")
messages
[SystemMessage(content='You are a helpful AI bot. Your name is Bob.'),
 HumanMessage(content='Hello, how are you doing?'),
 AIMessage(content="I'm doing well, thanks!"),
 HumanMessage(content='What is your name?')]
1.3.2 Creating from MessagePromptTemplate

  ChatPromptTemplate.from_messages accepts a variety of message representations. Besides the (type, content) 2-tuple notation used above, you can also pass in instances of MessagePromptTemplate or BaseMessage, which gives you a lot of flexibility in how you build chat prompts. The demo below uses Baidu Qianfan.

import os
import openai,qianfan

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key =os.environ['OPENAI_API_KEY']
qianfan.qianfan_ak=os.environ['QIANFAN_AK']
qianfan.qianfan_sk=os.environ['QIANFAN_SK']
  1. Using MessagePromptTemplate:
from langchain.chat_models import QianfanChatEndpoint
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate

template="You are a helpful assistant that translates {input_language} to {output_language}."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template="{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

chat = QianfanChatEndpoint()
chat(chat_prompt.format_prompt(input_language="English", output_language="French", text="I love programming.").to_messages())
AIMessage(content='编程是一项非常有趣和有挑战性的工作,我很羡慕你能够享受其中的乐趣。', additional_kwargs={'id': 'as-cxezsmtfga', 'object': 'chat.completion', 'created': 1701520678, 'result': '编程是一项非常有趣和有挑战性的工作,我很羡慕你能够享受其中的乐趣。', 'is_truncated': False, 'need_clear_history': False, 'usage': {'prompt_tokens': 4, 'completion_tokens': 18, 'total_tokens': 22}})
  2. Using BaseMessage instances:
from langchain.schema.messages import SystemMessage

chat_template = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content=(
                "You are a helpful assistant that re-writes the user's text to "
                "sound more upbeat."
            )
        ),
        HumanMessagePromptTemplate.from_template("{text}"),
    ]
)


chat(chat_template.format_messages(text="i dont like eating tasty things."))
AIMessage(content='很抱歉听到您不喜欢吃美味的食物。您有其他喜欢的食物类型吗?或许我们可以找到一些其他您喜欢吃的食物,您试试看是否能够喜欢呢?', additional_kwargs={'id': 'as-sdcbpxad11', 'object': 'chat.completion', 'created': 1701520841, 'result': '很抱歉听到您不喜欢吃美味的食物。您有其他喜欢的食物类型吗?或许我们可以找到一些其他您喜欢吃的食物,您试试看是否能够喜欢呢?', 'is_truncated': False, 'need_clear_history': False, 'usage': {'prompt_tokens': 8, 'completion_tokens': 34, 'total_tokens': 42}})
1.3.3 Custom MessagePromptTemplate

Reference: "Types of MessagePromptTemplate"

  Chat models are implemented on top of LLMs, but instead of a "text in, text out" API they use "chat messages" as the input/output interface; that is, a chat model operates on messages (List[BaseMessage]) rather than raw text. In LangChain, the message interface is defined by BaseMessage, which has two required attributes:

  • content: the content of the message, usually a string.
  • role: the category of the entity the message comes from, for example:
    • HumanMessage: a BaseMessage coming from a human/user.
    • AIMessage: a BaseMessage coming from an AI/assistant.
    • SystemMessage: a BaseMessage coming from the system.
    • FunctionMessage / ToolMessage: a BaseMessage containing the output of a function or tool call.
    • ChatMessage: allows a custom role when none of the above fits.
    • A message can also be a str (automatically converted to a HumanMessage) or a PromptValue (the value of a PromptTemplate).

  Correspondingly, LangChain provides different types of MessagePromptTemplate. The most commonly used are AIMessagePromptTemplate, SystemMessagePromptTemplate, and HumanMessagePromptTemplate, which create AI messages, system messages, and human messages respectively.
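A minimal sketch of each (the format method of these templates returns the corresponding message object):

from langchain.prompts import (
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

SystemMessagePromptTemplate.from_template("You are a {role}.").format(role="translator")
# SystemMessage(content='You are a translator.')
HumanMessagePromptTemplate.from_template("{text}").format(text="Hello!")
# HumanMessage(content='Hello!')
AIMessagePromptTemplate.from_template("{reply}").format(reply="Bonjour!")
# AIMessage(content='Bonjour!')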

1.3.3.1 Custom message role names

  To create chat messages with an arbitrary role, use ChatMessagePromptTemplate, which lets the user specify the role name:

from langchain.prompts import ChatMessagePromptTemplate

prompt = "May the {subject} be with you"

chat_message_prompt = ChatMessagePromptTemplate.from_template(role="Jedi", template=prompt)
chat_message_prompt.format(subject="force")
ChatMessage(content='May the force be with you', additional_kwargs={}, role='Jedi')
1.3.3.2 Custom messages

  LangChain also provides MessagesPlaceholder, which gives you full control over the messages rendered during formatting. This is useful when you are not sure which role a message prompt template should use, or when you want to insert a list of messages during formatting.

from langchain.prompts import MessagesPlaceholder
from langchain.schema.messages import AIMessage, HumanMessage

human_prompt = "Summarize our conversation so far in {word_count} words."
human_message_template = HumanMessagePromptTemplate.from_template(human_prompt)

chat_prompt = ChatPromptTemplate.from_messages([MessagesPlaceholder(variable_name="conversation"), human_message_template])
human_message = HumanMessage(content="What is the best way to learn programming?")
ai_message = AIMessage(content="""
1. Choose a programming language: Decide on a programming language that you want to learn.

2. Start with the basics: Familiarize yourself with the basic programming concepts such as variables, data types and control structures.

3. Practice, practice, practice: The best way to learn programming is through hands-on experience
""")

chat_prompt.format_prompt(conversation=[human_message, ai_message], word_count="10").to_messages()
[HumanMessage(content='What is the best way to learn programming?', additional_kwargs={}),
 AIMessage(content='1. Choose a programming language: Decide on a programming language that you want to learn. \n\n2. Start with the basics: Familiarize yourself with the basic programming concepts such as variables, data types and control structures.\n\n3. Practice, practice, practice: The best way to learn programming is through hands-on experience', additional_kwargs={}),
 HumanMessage(content='Summarize our conversation so far in 10 words.', additional_kwargs={})]
1.3.4 LCEL

  ChatPromptTemplate also supports LCEL, and the methods on ChatPromptValue are essentially the same as those on StringPromptValue:

chat_val = chat_template.invoke({"text": "i dont like eating tasty things."})
chat_val.to_messages()
[SystemMessage(content="You are a helpful assistant that re-writes the user's text to sound more upbeat."),
HumanMessage(content='i dont like eating tasty things.')]
chat_val.to_string()
 "System: You are a helpful assistant that re-writes the user's text to sound more upbeat.nHuman: i dont like eating tasty things."
1.3.5 Three ways to format chat_prompt output

Reference: "Format template output"

The output of the chat_prompt.format method can be obtained in three forms:

  • As a string: call chat_prompt.format() or chat_prompt.format_prompt().to_string() to get the formatted string directly.
output = chat_prompt.format(input_language="English", output_language="French", text="I love programming.")
output
'System: You are a helpful assistant that translates English to French.\nHuman: I love programming.'
# or alternatively
output_2 = chat_prompt.format_prompt(input_language="English", output_language="French", text="I love programming.").to_string()

assert output == output_2
  • As a list of messages: call chat_prompt.format_prompt(...).to_messages().

chat_prompt.format_prompt(input_language="English", output_language="French", text="I love programming.").to_messages()
[SystemMessage(content='You are a helpful assistant that translates English to French.', additional_kwargs={}),
 HumanMessage(content='I love programming.', additional_kwargs={})]
  • As a ChatPromptValue: chat_prompt.format_prompt(...) itself returns a ChatPromptValue object.

chat_prompt.format_prompt(input_language="English", output_language="French", text="I love programming.")
ChatPromptValue(messages=[SystemMessage(content='You are a helpful assistant that translates English to French.', additional_kwargs={}), HumanMessage(content='I love programming.', additional_kwargs={})])

2. Custom PromptTemplate

Reference: "Custom prompt template"

  LangChain provides a set of default prompt templates for generating prompts for a variety of tasks. Sometimes, however, the default templates cannot meet specific needs, for example when you want to create a template with particular dynamic instructions for your language model. This section shows how to create a custom prompt with PromptTemplate.

  To create a custom string prompt template, two requirements must be met:

  1. It has an input_variables attribute that exposes the input variables the template expects.
  2. It defines a format method that takes keyword arguments corresponding to those input variables and returns the formatted prompt.

  Example: create a custom prompt template that takes a function name as input and formats the prompt to include the function's source code. First, we need a function that returns the source code of a given function:

import inspect

def get_source_code(function_name):
    # Get the source code of the function
    return inspect.getsource(function_name)

inspect is a built-in Python module used to retrieve source code and other information about live objects.
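For example, the helper can be pointed at itself:

print(get_source_code(get_source_code))
# def get_source_code(function_name):
#     # Get the source code of the function
#     return inspect.getsource(function_name)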

  Next, we create a custom prompt template that takes the function name as input and formats the prompt to include the function's source code. We also import the BaseModel class and the validator decorator from the pydantic module, to build the data model and validate the input.

from langchain.prompts import StringPromptTemplate
from pydantic import BaseModel, validator

# Given a function name and its source code, generate an English-language explanation of the function
PROMPT = """
Given the function name and source code, generate an English language explanation of the function.
Function Name: {function_name}
Source Code:
{source_code}
Explanation:
"""

class FunctionExplainerPromptTemplate(StringPromptTemplate, BaseModel):
    """A custom prompt template that takes a function name as input and formats the prompt to include the function's source code."""

    @validator("input_variables")  # validate the input variables with pydantic's validator decorator
    def validate_input_variables(cls, v):
        """Ensure there is exactly one input variable, named function_name."""
        if len(v) != 1 or "function_name" not in v:
            raise ValueError("function_name must be the only input variable.")
        return v

    def format(self, **kwargs) -> str:
        # Get the function's source code
        source_code = get_source_code(kwargs["function_name"])

        # Build the prompt to send to the language model
        prompt = PROMPT.format(
            function_name=kwargs["function_name"].__name__, source_code=source_code
        )
        return prompt

    def _prompt_type(self):
        return "function-explainer"

Now we can use this prompt template:

fn_explainer = FunctionExplainerPromptTemplate(input_variables=["function_name"])

# Generate a prompt for the get_source_code function
prompt = fn_explainer.format(function_name=get_source_code)
prompt
Given the function name and source code, generate an English language explanation of the function.
Function Name: get_source_code
Source Code:
def get_source_code(function_name):
    # Get the source code of the function
    return inspect.getsource(function_name)

Explanation:

  Why does format take **kwargs? Suppose, for example, that we later need to add a new parameter, description, used to describe the function when generating the prompt. With **kwargs, the caller only needs to write:

prompt = fn_explainer.format(
  function_name=get_source_code,
  description="utility function"
)

and format's signature does not have to change. With a strict parameter signature, every newly added parameter would require modifying format, which creates much tighter coupling.

3. Real-time feature modeling with a Feature Store

Reference: "Connecting to a Feature Store"

3.1 What Is a Feature Store?

Reference: "What Is a Feature Store?"

3.1.1 Introduction to Feature Stores

  When actually putting machine-learning systems into production, teams face a number of data challenges: obtaining the right raw data, building features, combining features into training data, computing and serving features in production, and monitoring features in production. A feature store benefits an organization in several ways: it reduces duplicated data-engineering work, accelerates the machine-learning lifecycle, and promotes cross-team collaboration.

  The core role of a feature store is to provide an efficient, reliable, and centralized platform for storing and managing feature data.

3.1.2 Components of a Feature Store

  A modern feature store has five main components: Transformation, Storage, Serving, Monitoring, and the Feature Registry.

  1. Transformation: processes data to produce feature values. Supports batch, streaming, and on-demand transformations; reusing the same transformation code avoids training/serving skew.

  2. Storage: provides offline storage of historical data for model training, and online storage for low-latency feature serving. Usually integrates with a data lake or database and uses an entity-centric data model.

  3. Serving: serves feature data in real time through a high-performance API, ensuring that training and serving use a consistent view of the features to avoid skew. Also supports SDK access for model training.

  4. Monitoring: detects data-quality and consistency problems to keep the system running properly. Can aggregate and correlate multiple metrics to help pinpoint root causes.

  5. Registry: centrally manages feature definitions and metadata; configures and schedules feature transformation, storage, and serving; provides interfaces for integrating with other systems; and tracks lineage and dependencies to support auditing.

3.2 Feast

References: "Connecting to a Feature Store", feast GitHub, Feast documentation

  When putting LLM applications into production, personalizing the user experience is crucial. The core idea of a feature store, keeping data fresh and relevant, is especially applicable when using large language models (LLMs) in real production environments. LangChain provides a simple way to combine such data with LLMs.

  Feast is a popular open-source feature store framework; see the Feast documentation for usage details. Assuming you have already done the setup described in Feast's README, the following demonstrates how to use a custom prompt template class to inject feature data served by Feast into the prompt-generation logic, producing a prompt that incorporates real-time features.

from feast import FeatureStore

# You may need to update the path depending on where you stored it
feast_repo_path = "../../../../../my_feature_repo/feature_repo/"
# Initialize a connection to the Feast feature store.
store = FeatureStore(repo_path=feast_repo_path)

  FeatureStore is the client class Feast provides for accessing a feature store. At initialization, its repo_path parameter must point to a Feast feature repository that has been set up locally; the store instance can then be used to access the feature data it holds.

  Next we build a custom FeastPromptTemplate. This prompt template takes a driver id, looks up that driver's statistics, and formats those statistics into a prompt.

  Note that the input to this prompt template is just driver_id, since that is the only user-provided part (all other variables are looked up inside the prompt template).

from langchain.prompts import PromptTemplate, StringPromptTemplate

template = """Given the driver's up to date stats, write them a note relaying those stats to them.
If they have a conversation rate above .5, give them a compliment. Otherwise, make a silly joke about chickens at the end to make them feel better.

Here are the drivers stats:
Conversation rate: {conv_rate}
Acceptance rate: {acc_rate}
Average Daily Trips: {avg_daily_trips}

Your response:"""
prompt = PromptTemplate.from_template(template)

  This prompt asks the model to write the driver a note relaying their up-to-date statistics (conversation rate, acceptance rate, and average daily trips). If the conversation rate is above 0.5, the note should include a compliment; otherwise it should end with a silly joke about chickens to make them feel better.

class FeastPromptTemplate(StringPromptTemplate):
    # **kwargs means format accepts any number of keyword arguments.
    def format(self, **kwargs) -> str:
        driver_id = kwargs.pop("driver_id")
        feature_vector = store.get_online_features(
            features=[
                "driver_hourly_stats:conv_rate",
                "driver_hourly_stats:acc_rate",
                "driver_hourly_stats:avg_daily_trips",
            ],
            entity_rows=[{"driver_id": driver_id}],
        ).to_dict()
        kwargs["conv_rate"] = feature_vector["conv_rate"][0]
        kwargs["acc_rate"] = feature_vector["acc_rate"][0]
        kwargs["avg_daily_trips"] = feature_vector["avg_daily_trips"][0]
        return prompt.format(**kwargs)
prompt_template = FeastPromptTemplate(input_variables=["driver_id"])
print(prompt_template.format(driver_id=1001))
Given the driver's up to date stats, write them a note relaying those stats to them.
If they have a conversation rate above .5, give them a compliment. Otherwise, make a silly joke about chickens at the end to make them feel better.

Here are the drivers stats:
Conversation rate: 0.4745151400566101
Acceptance rate: 0.055561766028404236
Average Daily Trips: 936

Your response:

We can now create a chain that uses the feature store for personalization:

from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI

chain = LLMChain(llm=ChatOpenAI(), prompt=prompt_template)
chain.run(1001)
 "Hi there! I wanted to update you on your current stats. Your acceptance rate is 0.055561766028404236 and your average daily trips are 936. While your conversation rate is currently 0.4745151400566101, I have no doubt that with a little extra effort, you'll be able to exceed that .5 mark! Keep up the great work! And remember, even chickens can't always cross the road, but they still give it their best shot."

4. Creating prompt templates with few-shot examples

Reference: "Few-shot prompt templates"

  In this chapter we learn how to create prompt templates that use few-shot examples. A few-shot prompt template can be built from either a set of examples or an example selector object. The latter just adds one step: creating an example selector from the example set and filtering the examples via similarity search.

4.1 Using an example set

4.1.1 Creating the example set

First, create an example set; each example is a dictionary containing keys for the input variables and their corresponding values.

from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

examples = [
  {
    "question": "Who lived longer, Muhammad Ali or Alan Turing?",
    "answer":
"""
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
"""
  },
  {
    "question": "When was the founder of craigslist born?",
    "answer":
"""
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
"""
  },
  {
    "question": "Who was the maternal grandfather of George Washington?",
    "answer":
"""
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
"""
  },
  {
    "question": "Are both the directors of Jaws and Casino Royale from the same country?",
    "answer":
"""
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No
"""
  }
]
4.1.2 Creating an example formatter

  Define a formatter (a PromptTemplate object) that formats each example into a string.

example_prompt = PromptTemplate(input_variables=["question", "answer"], template="Question: {question}\n{answer}")

print(example_prompt.format(**examples[0]))
Question: Who lived longer, Muhammad Ali or Alan Turing?

Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
4.1.3 Creating the FewShotPromptTemplate

Create a FewShotPromptTemplate object, which takes the example set and the example formatter.

prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"]
)

print(prompt.format(input="Who was the father of Mary Ball Washington?"))
  • The suffix parameter specifies the format of the input question: suffix="Question: {input}" means that after the few-shot examples, a "Question: " prefix followed by the input variable {input} is appended.

      Question: Who lived longer, Muhammad Ali or Alan Turing?
      
      Are follow up questions needed here: Yes.
      Follow up: How old was Muhammad Ali when he died?
      Intermediate answer: Muhammad Ali was 74 years old when he died.
      Follow up: How old was Alan Turing when he died?
      Intermediate answer: Alan Turing was 41 years old when he died.
      So the final answer is: Muhammad Ali
      
      
      Question: When was the founder of craigslist born?
      
      Are follow up questions needed here: Yes.
      Follow up: Who was the founder of craigslist?
      Intermediate answer: Craigslist was founded by Craig Newmark.
      Follow up: When was Craig Newmark born?
      Intermediate answer: Craig Newmark was born on December 6, 1952.
      So the final answer is: December 6, 1952
      
      
      Question: Who was the maternal grandfather of George Washington?
      
      Are follow up questions needed here: Yes.
      Follow up: Who was the mother of George Washington?
      Intermediate answer: The mother of George Washington was Mary Ball Washington.
      Follow up: Who was the father of Mary Ball Washington?
      Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
      So the final answer is: Joseph Ball
      
      
      Question: Are both the directors of Jaws and Casino Royale from the same country?
      
      Are follow up questions needed here: Yes.
      Follow up: Who is the director of Jaws?
      Intermediate Answer: The director of Jaws is Steven Spielberg.
      Follow up: Where is Steven Spielberg from?
      Intermediate Answer: The United States.
      Follow up: Who is the director of Casino Royale?
      Intermediate Answer: The director of Casino Royale is Martin Campbell.
      Follow up: Where is Martin Campbell from?
      Intermediate Answer: New Zealand.
      So the final answer is: No
      
      
      Question: Who was the father of Mary Ball Washington?
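Besides suffix, FewShotPromptTemplate also accepts a prefix that is rendered before the examples (it defaults to an empty string); a minimal sketch reusing the objects above:

prompt_with_instructions = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Answer the following questions, reasoning step by step.",
    suffix="Question: {input}",
    input_variables=["input"]
)
print(prompt_with_instructions.format(input="Who was the father of Mary Ball Washington?"))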
    

4.2 Using an example selector

  In this section we use the SemanticSimilarityExampleSelector class, which selects few-shot examples based on their similarity to the input. It uses an embedding model to compute the similarity between the input and the few-shot examples, and a vectorstore to perform nearest-neighbor search.

  SemanticSimilarityExampleSelector inherits from BaseExampleSelector and BaseModel. Its details are as follows:

Parameter | Type | Description
example_keys | Optional[List[str]] | Optional list of keys used to filter the examples.
input_keys | Optional[List[str]] | Optional list of keys used to filter the input. If provided, the search is based on those input variables rather than all variables.
k | int | Number of examples to select; defaults to 4.
vectorstore | langchain_core.vectorstores.VectorStore | Required. The VectorStore containing information about the examples.

Method | Description
__init__(example_keys, input_keys, k, vectorstore) | Creates a new model by parsing and validating the keyword arguments; raises ValidationError if the input data cannot be parsed into a valid model.
add_example(example) | Adds a new example to the vectorstore.
from_examples(examples, embeddings, vectorstore_cls, k, input_keys, **vectorstore_cls_kwargs) | Creates a k-shot example selector from a list of examples and an embedding model; dynamically reranks examples by similarity to the query.
select_examples(input_variables) | Selects which examples to use based on semantic similarity; returns a list of the selected examples.
4.2.1 Creating an ExampleSelector from the example set

  We will reuse the example set and the formatter from the previous section. However, instead of feeding the examples directly into the FewShotPromptTemplate object, we first feed them into an ExampleSelector object for selection.

  The only method an example selector must define is select_examples, which takes the input variables and returns a list of selected examples. You can select by length, by MMR (maximal marginal relevance), by n-gram overlap, or by similarity.
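As an aside, the simplest of these is length-based selection, which caps how many examples are included by total formatted length; a minimal sketch reusing examples and example_prompt from section 4.1:

from langchain.prompts.example_selector import LengthBasedExampleSelector

# Include as many examples as fit within max_length (measured in words by default)
length_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    max_length=100,
)

Below we build the similarity-based selector: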

from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

# Create the semantic-similarity example selector
example_selector = SemanticSimilarityExampleSelector.from_examples(
    # The list of examples available for selection.
    examples,
    # The embedding class used to produce embeddings, which measure semantic similarity.
    OpenAIEmbeddings(),
    # The VectorStore class used to store the embeddings and run similarity search.
    Chroma,
    # The number of examples to select.
    k=1
)

# Select the example most similar to the input.
question = "Who was the father of Mary Ball Washington?"
selected_examples = example_selector.select_examples({"question": question})
print(f"Examples most similar to the input: {question}")
for example in selected_examples:
    print("\n")
    for k, v in example.items():
        print(f"{k}: {v}")  # print the example's key-value pairs (question and answer)
Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
Examples most similar to the input: Who was the father of Mary Ball Washington?


question: Who was the maternal grandfather of George Washington?
answer:
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball

Chroma is a vectorstore; see "Vector stores" for details.
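The selector is also mutable: add_example (listed in the table above) embeds a new example and adds it to the vectorstore; a minimal sketch:

example_selector.add_example(
    {
        "question": "Who directed Jurassic Park?",
        "answer": "Steven Spielberg",
    }
)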

4.2.2 Feeding the example selector into FewShotPromptTemplate

  Finally, just as in 4.1.3, create a FewShotPromptTemplate object; this time it takes the example selector together with the few-shot example formatter.

prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"]
)

print(prompt.format(input="Who was the father of Mary Ball Washington?"))
Question: Who was the maternal grandfather of George Washington?

Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball


Question: Who was the father of Mary Ball Washington?

5. Creating a ChatPromptTemplate with few-shot examples

  The goal of few-shot prompting is to dynamically select examples relevant to the input and format them into a prompt for the model. This can be done with FewShotChatMessagePromptTemplate.

5.1 Using an example set

Basic few-shot prompting uses a fixed set of prompt examples; this is the simplest approach and fairly reliable in production. Its basic components are:

  • examples: a list of examples (dictionaries) to include in the final prompt.
  • example_prompt: converts each example into one or more messages through its format_messages method. A common pattern is one human message and one AI message response, or a human message followed by a function-call message.

Let's demonstrate. First, define the examples:

from langchain.prompts import ChatPromptTemplate,FewShotChatMessagePromptTemplate

examples = [
    {"input": "2+2", "output": "4"},
    {"input": "2+3", "output": "5"},
]

Create the FewShotChatMessagePromptTemplate:

# This is a prompt template used to format each individual example.
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

print(few_shot_prompt.format())
Human: 2+2
AI: 4
Human: 2+3
AI: 5

Use the FewShotChatMessagePromptTemplate we just created:

from langchain.chat_models import ChatAnthropic

final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a wondrous wizard of math."),
        few_shot_prompt,
        ("human", "{input}"),
    ]
)

chain = final_prompt | ChatAnthropic(temperature=0.0)
chain.invoke({"input": "What's the square of a triangle?"})
AIMessage(content=' Triangles do not have a "square". A square refers to a shape with 4 equal sides and 4 right angles. Triangles have 3 sides and 3 angles.\n\nThe area of a triangle can be calculated using the formula:\n\nA = 1/2 * b * h\n\nWhere:\n\nA is the area \nb is the base (the length of one of the sides)\nh is the height (the length from the base to the opposite vertex)\n\nSo the area depends on the specific dimensions of the triangle. There is no single "square of a triangle". The area can vary greatly depending on the base and height measurements.', additional_kwargs={}, example=False)

  When building the final prompt, we first set the system role, then provide the few-shot examples, and finally pass in the user's input for the model to reason about. final_prompt thus supplies the model with context, examples, and input all at once, so that it can generate a well-targeted response.

5.2 Using an example selector

  Sometimes you may want to choose which examples are shown based on the input, achieving dynamic few-shot prompting. To do so, replace examples with an example_selector; the other components remain the same as above (i.e. the template contains example_selector and example_prompt).

  First, build the vectorstore, which stores embeddings of the inputs and outputs; dynamic example selection is then based on vector similarity.

from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma

examples = [
    {"input": "2+2", "output": "4"},
    {"input": "2+3", "output": "5"},
    {"input": "2+4", "output": "6"},
    {"input": "What did the cow say to the moon?", "output": "nothing at all"},
    {
        "input": "Write me a poem about the moon",
        "output": "One for the moon, and one for me, who are we to talk about the moon?",
    },
]

to_vectorize = [" ".join(example.values()) for example in examples]
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_texts(to_vectorize, embeddings, metadatas=examples)

Next, create the example_selector. Here we select the 2 examples most similar to the input.

example_selector = SemanticSimilarityExampleSelector(
    vectorstore=vectorstore,
    k=2,
)

# The prompt template will load examples by passing the input to the `select_examples` method
example_selector.select_examples({"input": "horse"})
 [{'input': 'What did the cow say to the moon?', 'output': 'nothing at all'},
     {'input': '2+4', 'output': '6'}]

Create the FewShotChatMessagePromptTemplate:

from langchain.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate


# Define the few-shot prompt.
few_shot_prompt = FewShotChatMessagePromptTemplate(
    # input_variables selects the values to pass to the example_selector
    input_variables=["input"],
    example_selector=example_selector,
    # Define how each example is formatted.
    # Here each example becomes 2 messages: one human and one AI.
    example_prompt=ChatPromptTemplate.from_messages(
        [("human", "{input}"), ("ai", "{output}")]
    ),
)

print(few_shot_prompt.format(input="What's 3+3?"))
Human: 2+3
AI: 5
Human: 2+2
AI: 4

Create the final prompt template:

final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a wondrous wizard of math."),
        few_shot_prompt,
        ("human", "{input}"),
    ]
)

from langchain.chat_models import ChatAnthropic

chain = final_prompt | ChatAnthropic(temperature=0.0)
chain.invoke({"input": "What's 3+3?"})
AIMessage(content=' 3 + 3 = 6', additional_kwargs={}, example=False)

Original article: https://blog.csdn.net/qq_56591814/article/details/134619031
